Backup stuck in postgresql wal backup

I’ve just received a warning that a server wasn’t backuped up for a while. So I went checking. As it seems it was stuck. It showed me that it needed 292487235 years 4 months 2 weeks 1 day 8 hours 10 minutes longer to finish the backup. While I marvel the exactitude it’s just a tiny bit longer than I’m ready to wait…

  • Urbackup server 2.14.3 on debian stable
  • urbackup client 2.4.9 on debian stable
  • I’ve checked the client logs: => Nothing, neither in urbackup.log nor syslog
  • Server logs: nothing, neither for urbackup nor syslog

The backup was stuck at:

|03/03/21 10:26 |DEBUG |Loading “urbackup/FILE_METADATA|B0eNBce3gVFDZbjr2QSD|58069”. Loaded 1.07701 MB at 0 Bit/s|
|—|---|—|
|03/03/21 10:27 |DEBUG |Loading “urbackup/FILE_METADATA|B0eNBce3gVFDZbjr2QSD|58069”. Loaded 1.07702 MB at 0 Bit/s|
|03/03/21 10:28 |DEBUG |Loading “urbackup/FILE_METADATA|B0eNBce3gVFDZbjr2QSD|58069”. Loaded 1.07702 MB at 0 Bit/s|
|03/03/21 10:29 |DEBUG |Loading “urbackup/FILE_METADATA|B0eNBce3gVFDZbjr2QSD|58069”. Loaded 1.07702 MB at 0 Bit/s|
|03/03/21 10:30 |DEBUG |Loading “urbackup/FILE_METADATA|B0eNBce3gVFDZbjr2QSD|58069”. Loaded 1.07702 MB at 0 Bit/s|
|03/03/21 10:31 |DEBUG |Loading “urbackup/FILE_METADATA|B0eNBce3gVFDZbjr2QSD|58069”. Loaded 1.07702 MB at 0 Bit/s|
|03/03/21 10:32 |DEBUG |Loading “urbackup/FILE_METADATA|B0eNBce3gVFDZbjr2QSD|58069”. Loaded 1.07702 MB at 0 Bit/s|
|03/03/21 10:33 |DEBUG |Loading “urbackup/FILE_METADATA|B0eNBce3gVFDZbjr2QSD|58069”. Loaded 1.07702 MB at 0 Bit/s|
|03/03/21 10:34 |DEBUG |Loading “urbackup/FILE_METADATA|B0eNBce3gVFDZbjr2QSD|58069”. Loaded 1.07702 MB at 0 Bit/s|
|03/03/21 10:35 |DEBUG |Loading “urbackup/FILE_METADATA|B0eNBce3gVFDZbjr2QSD|58069”. Loaded 1.07702 MB at 0 Bit/s|
|03/03/21 10:36 |DEBUG |Loading “urbackup/FILE_METADATA|B0eNBce3gVFDZbjr2QSD|58069”. Loaded 1.07702 MB at 0 Bit/s|
|03/03/21 10:37 |DEBUG |Loading “urbackup/FILE_METADATA|B0eNBce3gVFDZbjr2QSD|58069”. Loaded 1.07703 MB at 0 Bit/s|
|03/03/21 10:38 |DEBUG |Loading “urbackup/FILE_METADATA|B0eNBce3gVFDZbjr2QSD|58069”. Loaded 1.07703 MB at 0 Bit/s|
|03/03/21 10:39 |DEBUG |Loading “urbackup/FILE_METADATA|B0eNBce3gVFDZbjr2QSD|58069”. Loaded 1.07703 MB at 0 Bit/s|
|03/03/21 10:40 |DEBUG |Loading “urbackup/FILE_METADATA|B0eNBce3gVFDZbjr2QSD|58069”. Loaded 1.07703 MB at 0 Bit/s|
|03/03/21 10:41 |DEBUG |Loading “urbackup/FILE_METADATA|B0eNBce3gVFDZbjr2QSD|58069”. Loaded 1.07703 MB at 0 Bit/s|
|03/03/21 10:42 |DEBUG |Loading “urbackup/FILE_METADATA|B0eNBce3gVFDZbjr2QSD|58069”. Loaded 1.07703 MB at 0 Bit/s|
|03/03/21 10:43 |DEBUG |Loading “urbackup/FILE_METADATA|B0eNBce3gVFDZbjr2QSD|58069”. Loaded 1.07703 MB at 0 Bit/s|
|03/03/21 10:44 |DEBUG |Loading “urbackup/FILE_METADATA|B0eNBce3gVFDZbjr2QSD|58069”. Loaded 1.07703 MB at 0 Bit/s|
|03/03/21 10:45 |DEBUG |Loading “urbackup/FILE_METADATA|B0eNBce3gVFDZbjr2QSD|58069”. Loaded 1.07703 MB at 0 Bit/s|

So I’ve just restarted the urbackupclient service. Result:

03/03/21 10:50 ERROR Error getting file metadata. Errorcode: CANNOT_OPEN_FILE (3)
03/03/21 10:50 DEBUG Loading file patch for “postgresqldump.sql”
03/03/21 10:50 ERROR Error getting file patch for "SCRIPT
03/03/21 10:50 INFO Hint: Current root backup path does not exist anymore. Usually this is because the current file system snapshot it was backing up was deleted because it ran out of snapshot storage space. See https://www.urbackup.org/faq.html#base_dir_lost for details and for how to fix this issue
03/03/21 10:50 INFO Saving incomplete file. (2)
03/03/21 10:50 ERROR Client example.com went offline.
03/03/21 10:50 INFO Waiting for file hashing and copying threads…
03/03/21 10:51 DEBUG Saved metadata of 6922 files and directories. 99% done…
03/03/21 10:51 INFO Writing new file list…
03/03/21 10:51 INFO All metadata was present
03/03/21 10:51 INFO Number of re-added file entries is 80
03/03/21 10:51 DEBUG Client disconnected while backing up. Copying partial file…
03/03/21 10:51 DEBUG Syncing file system…
03/03/21 10:51 INFO Transferred 2.41793 GB - Average speed: 395.352 KBit/s
03/03/21 10:51 INFO (Before compression: 3.02097 GB ratio: 1.2494)
03/03/21 10:51 INFO 736.732 MB of files were already present on the server and did not need to be transferred
03/03/21 10:51 DEBUG Script does not exist urbackup/post_incr_filebackup
03/03/21 10:51 INFO Time taken for backing up client example.com: 14h 37m 28s
03/03/21 10:51 ERROR Backup failed

As it seems it was stuck in the postgres wal backup. I’m using the default scripts which come with urbackup.
There definitely is enough disk space left on both servers. It does not create a snapshots of the volumes on the client.
After restarting the client I’ve just started a new incremental backup and after a few minutes it finished and everything worked perfectly.

Maybe there should be some security, so backup kill themselves if they are stuck for too long and show ridiculous high finish estimations? (Besides finding the cause of this bug).

Hmm, either a bug, or the postgres dump hung? Next time have a look at if the postgresdump script is still active or enable client debug logging?

If it is a hanging postgresdump, the solution would be to add a timeout into that script, I guess.

I’m doing that backup since a year, and it’s been the first time that happened. So I can’t guarantee when I’d be able to see the behavior again…
When the backup was stuck, I didn’t realize it was stuck in the postgres backup, since the log didn’t indicate anything like that. And when I saw the log output after restarting the client it was obviously too late to do any checks. And I couldn’t find any anomalies in the postgres log.