Urbackup Server on linux hangs during cleanup window

Hi,

I’m using Urbackup and I think that it’s a great backup system. Congratulations for your great work.

We have a problem with our backup server. Sometimes, during the cleanup window hours (at night 3-4 am), the Urbackup process hangs (the web interface and backups stop working), and nothing gets logged. When I restart the urbackup server daemon, the system works ok.

This server is using Ubuntu 16.04, the file system for the urbackup data directory is ZFS (ZFS on Linux), / is on ext4.

I’m sending a full debug log by email. This is a brief fragment of the log, at the restart:

2017-06-15 03:22:18: GT: Linked file “XXXXXXXXXXXXXXXXXXXXXX”
2017-06-15 03:22:18: Loading file “XXXXXXXXXXXXXXXXXXXXXX” (metadata only)
2017-06-15 03:22:18: Flushing FileClient…
2017-06-15 03:22:18: Loading file “XXXXXXXXXXXXXXXXXXXXXX” (metadata only)
2017-06-15 03:22:18: Loading file “XXXXXXXXXXXXXXXXXXXXXX” (metadata only)
2017-06-15 03:22:18: Loading file “XXXXXXXXXXXXXXXXXXXXXX” (metadata only)
2017-06-15 03:22:18: Loading file “XXXXXXXXXXXXXXXXXXXXXX” (metadata only)
2017-06-15 03:22:18: Loading file “XXXXXXXXXXXXXXXXXXXXXX” (metadata only)
2017-06-15 03:22:18: Loading file “XXXXXXXXXXXXXXXXXXXXXX” (metadata only)
2017-06-15 03:22:19: Downloading version file…
2017-06-15 03:22:19: Downloading version file…
2017-06-15 03:22:19: Downloading server version info…
2017-06-15 03:22:19: Downloading dataplan database…
2017-06-19 09:25:16: Starting HTTP-Server on port 55414
2017-06-19 09:25:16: HTTP: Server started up successfully!
2017-06-19 09:25:19: SQLite: recovered 14729 frames from WAL file /var/urbackup/backup_server.db-wal code: 283
2017-06-19 09:26:00: SQLite: recovered 234565 frames from WAL file /var/urbackup/backup_server_files.db-wal code: 283
2017-06-19 09:26:00: SQLite: recovered 33 frames from WAL file /var/urbackup/backup_server_link_journal.db-wal code: 283
2017-06-19 09:26:00: SQLite: recovered 30 frames from WAL file /var/urbackup/backup_server_links.db-wal code: 283
2017-06-19 09:26:00: SQLite: recovered 122 frames from WAL file /var/urbackup/backup_server_settings.db-wal code: 283
2017-06-19 09:26:01: SQLite: recovered 14729 frames from WAL file /var/urbackup/backup_server.db-wal code: 283
2017-06-19 09:26:01: SQLite: recovered 122 frames from WAL file /var/urbackup/backup_server_settings.db-wal code: 283
2017-06-19 09:26:09: SQLite: recovered 234565 frames from WAL file /var/urbackup/backup_server_files.db-wal code: 283
2017-06-19 09:26:09: SQLite: recovered 30 frames from WAL file /var/urbackup/backup_server_links.db-wal code: 283
2017-06-19 09:26:09: SQLite: recovered 33 frames from WAL file /var/urbackup/backup_server_link_journal.db-wal code: 283
2017-06-19 09:26:09: Started UrBackup…
2017-06-19 09:26:09: Testing if backup destination can handle subvolumes and snapshots…
2017-06-19 09:26:09: InternetService: Server started up successfully!
2017-06-19 09:26:09: UrBackup Server start up complete.
2017-06-19 09:26:09: Looking for old Sessions… 0 sessions
2017-06-19 09:26:09: Server started up successfully!

Please tell me if you need more information related to this problem.

Thank you very much, Best Regards,

Does dmesg show any errors?

Nothing at jun 15th, but the day before (jun 14 th), there was a segfault:

[lun jun 12 00:53:30 2017] fbackup main[24738]: segfault at 7eff25d93000 ip 00007eff2595f760 sp 00007efdc3fcd840 error 4 in libcrypto.so.1.0.0[7eff258f1000+21a000]
[mié jun 14 09:41:29 2017] urbackupsrv[31518]: segfault at 140c050 ip 000000000140c050 sp 00007f80a97f9d78 error 15

These are the two last lines of my dmesg, there are no more lines at the log.

Thank you, best regards

Yesterday, I moved the cleanup window from 3-4am to 7-8am. This morning has been working ok.

Can this issue be caused by server load? At 7am almost all backups ended.
I will check it tomorrow morning.

Thank you, best regards,

Unfortunately, there was no luck. This morning has happened again, at 7 am.

These are the last entries of the log:

2017-06-21 07:01:15: HT: Copying file: “XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX”
2017-06-21 07:01:15: Time taken for backing up client Cooperativa Vitivinicola Virgen de las Nieves: 4m 20s
2017-06-21 07:01:15: Backup succeeded
2017-06-21 07:01:15: Updating statistics…
2017-06-21 07:01:15: HT: Copying file: “XXXXXXXXXXXXXXXXXXXXX”
2017-06-21 07:01:15: Writing new file list…
2017-06-21 07:01:16: Updating image stats…
2017-06-21 07:01:16: Updating file statistics…
2017-06-21 07:01:16: All metadata was present
2017-06-21 07:01:16: Number of re-added file entries is 256
2017-06-21 07:01:16: Syncing file system…
2017-06-21 07:01:17: Done updating statistics.
2017-06-21 07:01:17: Downloading version file…
2017-06-21 07:01:17: Creating symbolic links. -1
2017-06-21 07:01:17: Creating symbolic links. -2
2017-06-21 07:01:17: Symbolic links created.
2017-06-21 07:01:17: Transferred 1.06027 MB - Average speed: 48.384 KBit/s
2017-06-21 07:01:17: Script does not exist urbackup/post_incr_filebackup
2017-06-21 07:01:17: Downloading version file…
2017-06-21 07:01:17: Downloading version file…
2017-06-21 07:01:17: Downloading server version info…
2017-06-21 07:01:18: Downloading dataplan database…

I think that it hangs at the “Downloading dataplan database…” step.

Thank you, best regards,