Hanging backup jobs

I’ve noticed that when my laptop (the backup client) goes into standby while a backup job is running, the job sometimes seems to hang forever after I resume the laptop a bit later (I’m not sure about the exact timing; it could be 10-90 minutes later).
Here’s what happened the last time:

  • A backup job was running when I put the laptop into standby.
  • Some time later (10-90 minutes?) I resumed the laptop.
  • At some point, maybe 10-30 minutes after resuming, I checked a few things on the urbackup server and noticed that the backup job from my laptop didn’t seem to be making any progress.
  • Checking the log on the server side: the last entries were from just before standby.
  • Checking urbackupclientctl: it showed that the client was connected and a backup was running, with no log entries.
  • I waited for about an hour, but nothing changed.
  • Then I stopped the backup on the server side.
  • I waited for about another hour, but nothing happened, neither on the server nor on the client side.
  • So I restarted urbackupclientbackend on the laptop.
  • Now urbackupclientctl showed that it was connected to the server, but that no backup was running.
  • Meanwhile the server continued the backup. There were new log entries showing it copying files. After a few minutes (not more than 10) it noticed that the client had disappeared (been restarted) at some point and printed the following:

17/09/19 11:17 DEBUG GT: Linked file “frachten.ibd”
17/09/19 11:17 DEBUG HT: Copying file to “/var/lib/urbackup/RMMbook/190917-1652/.hashes/mysql/fisch_web/frachtencodetabelle.ibd”
17/09/19 11:17 DEBUG GT: Linked file “frachtencodetabelle.ibd”
17/09/19 11:17 DEBUG HT: Copying file to “/var/lib/urbackup/RMMbook/190917-1652/.hashes/mysql/fisch_web/gapless_sequence.frm”
17/09/19 11:17 DEBUG GT: Linked file “gapless_sequence.frm”
17/09/19 11:17 DEBUG Copying incomplete file “tables_priv.MYI”
17/09/19 11:17 DEBUG Copying incomplete file “tables_priv.frm”
17/09/19 11:17 DEBUG Copying incomplete file “user.MYI”
17/09/19 11:17 DEBUG Copying incomplete file “user.frm”
17/09/19 11:17 INFO Waiting for file hashing and copying threads…
17/09/19 11:17 INFO Waiting for metadata download stream to finish
17/09/19 11:17 DEBUG Saved metadata of 232 files and directories. 100% done…
17/09/19 11:17 DEBUG Not all folder metadata could be applied. Metadata was inconsistent.
17/09/19 11:17 INFO Writing new file list…
17/09/19 11:17 DEBUG Some metadata was missing
17/09/19 11:17 INFO Number of copied file entries from last backup is 286823
17/09/19 11:17 DEBUG Client disconnected while backing up. Copying partial file…
17/09/19 11:17 DEBUG Syncing file system…
17/09/19 11:17 INFO Transferred 15.5765 MB - Average speed: 27.408 KBit/s
17/09/19 11:17 INFO (Before compression: 23.7818 MB ratio: 1.52678)
17/09/19 11:17 INFO 101.366 GB of files were already present on the server and did not need to be transferred
17/09/19 11:17 DEBUG Script does not exist urbackup/post_incr_filebackup
17/09/19 11:17 INFO Time taken for backing up client RMMbook: 1h 25m 11s
17/09/19 11:17 ERROR Backup failed
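The hang described above was spotted by noticing that the log had stopped growing. A minimal sketch of that check in Python follows. It only looks at the modification time of a log file. The path `/var/log/urbackup.log` and the 30-minute threshold are assumptions for illustration, not values taken from this thread, and the helper names are hypothetical.

```python
import os
import time

# Assumption: adjust to wherever your server (or client) writes its log file.
LOG_PATH = "/var/log/urbackup.log"


def seconds_since_last_write(path, now=None):
    """Return how many seconds ago the file at `path` was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def looks_stalled(path, max_idle_seconds=30 * 60, now=None):
    """True if nothing has been written to `path` for longer than the limit."""
    return seconds_since_last_write(path, now) > max_idle_seconds


if __name__ == "__main__":
    if os.path.exists(LOG_PATH):
        idle = seconds_since_last_write(LOG_PATH)
        print(f"{LOG_PATH}: last write {idle / 60:.0f} min ago, "
              f"stalled={looks_stalled(LOG_PATH)}")
```

This is only a rough signal: if other jobs write to the same file, a hanging job can hide behind their entries, and a quiet log does not by itself prove that a job is stuck.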

I’ve had similar issues before, but just ignored them.
There was one incident where I don’t know whether it is related:

  • A backup from a server was hanging for over 24 hours, just doing nothing (no new log entries on the server or the client).
  • I tried to stop the backup on the server side. After 24 hours it was still there.
  • I stopped urbackupclientbackend on the client side. After 24 hours the backup was still there.
  • I started urbackupclientbackend again and waited another 3 days (I forgot about it ;-)).
  • The backup job was still there. There was a virtual client on the same machine which meanwhile did all its backups without any problems. The main client, however, couldn’t start any new backups, since the hanging job was still there.
  • I stopped urbackupclientbackend on the client side.
  • Stopped the urbackup server.
  • Did a clean up (no idea if that is necessary).
  • Started the urbackup server again.
  • Started urbackupclientbackend again.
  • The backup job was finally gone and everything worked normally again.

I don’t know if the two behaviours above are related. It was the only time I can remember that I had to restart the urbackup server for a backup job to disappear. Usually restarting urbackupclientbackend is enough…

Both server and clients are running on Linux (Debian buster). They use the current urbackup versions: 2.3.8 for the server and 2.3.4 for the clients.
My laptop uses: “Beta: Calculate file hashes on client in parallel”
The other client machine, where the backup was stuck for 5 days, doesn’t use that. That one is a server, so it’s online 24/7.
I know it’s not the best error description ever, but I really have the feeling there’s a problem somewhere. I’d be happy to try to recreate the problem and provide logs, if it’s possible to get meaningful ones.

That morning I had a similar problem. The backups didn’t hang, but they aborted.

  • I woke my laptop from suspend mode.
  • The urbackup server tried several times to start an incremental backup on my laptop, but always failed with the same message (see below).
  • I restarted urbackupclientbackend.
  • Now the server could start the backup without any problems.
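Since restarting the client after resume was what helped in the steps above, that restart could be automated. The following is a sketch of a systemd sleep hook in Python, assuming the client runs as a systemd unit named `urbackupclientbackend` (an assumption; check your installation). According to the systemd-sleep documentation, executables placed in the `system-sleep` directory are called with `pre` or `post` as the first argument and the sleep type (for example `suspend`) as the second. The function names are hypothetical.

```python
#!/usr/bin/env python3
"""Sleep hook sketch: restart the UrBackup client after the machine resumes."""
import subprocess
import sys

# Assumption: the client runs as a systemd unit with this name.
SERVICE = "urbackupclientbackend"


def command_for(phase):
    """Return the command to run for a sleep phase, or None to do nothing."""
    if phase == "post":  # "post" means the machine has just resumed
        return ["systemctl", "restart", SERVICE]
    return None  # "pre" (about to sleep) or anything else: leave the client alone


def main(argv):
    # systemd-sleep passes the phase as the first argument.
    phase = argv[1] if len(argv) > 1 else ""
    cmd = command_for(phase)
    if cmd is not None:
        # check=False: a failed restart should not raise from the hook.
        subprocess.run(cmd, check=False)
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Note that this is a workaround, not a fix: as the first report in this thread shows, restarting the client while a backup is running makes that backup fail on the server side.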

Here’s the log from the server. The client didn’t write any log at all.

20/09/19 09:30 DEBUG Reflink copying is enabled
20/09/19 09:30 DEBUG Reflink copying is enabled
20/09/19 09:30 INFO Starting unscheduled incremental file backup…
20/09/19 09:30 DEBUG RMMbook: Doing backup with hashes…
20/09/19 09:30 DEBUG RMMbook: Doing backup with intra file diffs…
20/09/19 09:30 DEBUG RMMbook: Connecting for filelist…
20/09/19 09:30 DEBUG RMMbook: Waiting for filelist
20/09/19 09:30 DEBUG RMMbook: Connecting for filelist (async)…
20/09/19 09:31 INFO Waiting for parallel hash load stream to finish
20/09/19 09:33 ERROR Error during parallel hash load: TIMEOUT
20/09/19 09:35 ERROR Error starting parallel hash load
20/09/19 09:35 ERROR Backup had an early error. Deleting partial backup.
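To make the timing in the log above easier to read, here is a small Python sketch that parses a few of those lines and reports how long after the first entry each ERROR appeared. It assumes the timestamp is day/month/two-digit-year followed by hours and minutes; the function names are hypothetical.

```python
from datetime import datetime

# Lines copied from the server log above.
LOG = """\
20/09/19 09:30 INFO Starting unscheduled incremental file backup…
20/09/19 09:31 INFO Waiting for parallel hash load stream to finish
20/09/19 09:33 ERROR Error during parallel hash load: TIMEOUT
20/09/19 09:35 ERROR Error starting parallel hash load
20/09/19 09:35 ERROR Backup had an early error. Deleting partial backup.
"""


def parse_line(line):
    """Split 'DD/MM/YY HH:MM LEVEL message' into (datetime, level, message)."""
    date, clock, level, message = line.split(" ", 3)
    # Assumption: the date format is day/month/two-digit year.
    stamp = datetime.strptime(f"{date} {clock}", "%d/%m/%y %H:%M")
    return stamp, level, message


def errors_with_elapsed(text):
    """Return (minutes since the first entry, message) for every ERROR line."""
    entries = [parse_line(line) for line in text.splitlines() if line.strip()]
    start = entries[0][0]
    return [
        (int((stamp - start).total_seconds() // 60), message)
        for stamp, level, message in entries
        if level == "ERROR"
    ]


for minutes, message in errors_with_elapsed(LOG):
    print(f"+{minutes} min: {message}")
```

Because the log only has minute resolution, the elapsed values are approximate.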

If you use the parallel hash loading, could you please use 2.4.x? There are several fixes in this area in that version, and it should be nearly ready for release: https://forums.urbackup.org/c/testing

(If it still hangs, I’d need the full set of server and client debug logs, and even memory dumps of the processes; the simple issues should be fixed now.)