Today I found a hung incremental file backup of an old debian linux virtual machine.
Looking into this hang I found two errors. My guess is that both are related to system load.
- initially lvcreate hung
- later “Error starting file metadata download thread”
The metadata download thread problem happens occasionally and recovers eventually with retries. The lvcreate hang is more of a problem as it required attention to clear.
Do you have any advice about these errors? I have included some of my notes below:
On the server:
root@curlew:~# urbackupsrv --version
UrBackup Server v2.4.12.0
Copyright (C) 2011-2019 Martin Raiber
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
root@curlew:~# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 10 (buster)
Release: 10
Codename: buster
root@curlew:~# uname -a
Linux curlew 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux
On the client:
root@avocet:~# urbackupclientbackend --version
UrBackup Client Backend v2.4.9.0
Copyright (C) 2011-2019 Martin Raiber
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
root@avocet:~# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 8.11 (jessie)
Release: 8.11
Codename: jessie
root@avocet:~# uname -a
Linux avocet 3.16.0-10-amd64 #1 SMP Debian 3.16.81-1 (2020-01-17) x86_64 GNU/Linux
The web gui log for the machine showed:
14/02/20 02:50 INFO Waiting for file hashing and copying threads...
14/02/20 02:50 INFO Writing new file list...
14/02/20 02:50 INFO All metadata was present
14/02/20 02:50 DEBUG Syncing file system...
14/02/20 02:51 DEBUG Creating symbolic links. -1
14/02/20 02:51 DEBUG Creating symbolic links. -2
14/02/20 02:51 DEBUG Symbolic links created.
14/02/20 02:51 INFO Transferred 64.6736 MB - Average speed: 9.74898 MBit/s
14/02/20 02:51 DEBUG Script does not exist urbackup/post_incr_filebackup
14/02/20 02:51 INFO Time taken for backing up client avocet: 2m 10s
14/02/20 02:51 INFO Backup succeeded
14/02/20 06:56 DEBUG avocet: Connecting for filelist...
14/02/20 06:56 DEBUG avocet: Waiting for filelist
14/02/20 06:56 DEBUG avocet: Connecting for filelist (async)...
On the machine being backed up I found
root@avocet:~# lvs
^C Interrupted...
Giving up waiting for lock.
/run/lock/lvm/V_vg0:aux: flock failed: Interrupted system call
Can't get lock for vg0
Skipping volume group vg0
root@avocet:~# lsof | grep V_vg0
dmeventd 2282 root 8uW REG 0,18 0 2171436 /run/lock/lvm/V_vg0:aux
dmeventd 2282 root 9u REG 0,18 0 2164498 /run/lock/lvm/V_vg0
dmeventd 2282 25459 root 8uW REG 0,18 0 2171436 /run/lock/lvm/V_vg0:aux
dmeventd 2282 25459 root 9u REG 0,18 0 2164498 /run/lock/lvm/V_vg0
dmeventd 2282 25465 root 8uW REG 0,18 0 2171436 /run/lock/lvm/V_vg0:aux
dmeventd 2282 25465 root 9u REG 0,18 0 2164498 /run/lock/lvm/V_vg0
lvcreate 25423 root 5uW REG 0,18 0 2164498 /run/lock/lvm/V_vg0
root@avocet:~# ps -p 25423 -f
UID PID PPID C STIME TTY TIME CMD
root 25423 25402 0 06:56 ? 00:00:01 lvcreate -l50%FREE -s -n urbackup_snap_8a15686e1ff0af5c33d27d36b0aba7ad7bb3d2e178bb26d2 /dev/mapper/vg0-root
root@avocet:~# ps -p 25402 -f
UID PID PPID C STIME TTY TIME CMD
root 25402 25401 0 06:56 ? 00:00:01 /bin/sh /usr/local/share/urbackup/lvm_create_filesystem_snapshot 8a15686e1ff0af5c33d27d36b0aba7ad7bb3d2e178bb26d2 / root /
root@avocet:~# ps -p 25401 -f
UID PID PPID C STIME TTY TIME CMD
root 25401 558 0 06:56 ? 00:00:00 sh -c /usr/local/share/urbackup/lvm_create_filesystem_snapshot 8a15686e1ff0af5c33d27d36b0aba7ad7bb3d2e178bb26d2 "/" "root" "/" 2>&1
root@avocet:~# ps -p 558 -f
UID PID PPID C STIME TTY TIME CMD
root 558 1 2 Feb11 ? 01:57:41 /usr/local/sbin/urbackupclientbackend --config /etc/default/urbackupclient --no-consoletime
I terminated the stuck lvcreate:
root@avocet:~# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
root vg0 owi-aos--- 13.97g
swap vg0 -wi-ao---- 952.00m
urbackup_snap_8a15686e1ff0af5c33d27d36b0aba7ad7bb3d2e178bb26d2 vg0 swi-a-s--- 2.55g root 69.03
So the snapshot seems to have been created before the hang. The backup was still waiting so I stopped it via the Activities tab of the web gui. The log then showed:
14/02/20 11:07 WARNING Sever admin stopped backup during indexing
14/02/20 11:07 ERROR Backup had an early error. Deleting partial backup.
14/02/20 11:07 WARNING Exponential backoff: Waiting at least 40m before next file backup
When the backup started again later it got past the snapshot but failed later:
14/02/20 11:47 INFO Starting scheduled incremental file backup...
14/02/20 11:47 DEBUG avocet: Doing backup with hashes...
14/02/20 11:47 DEBUG avocet: Doing backup with intra file diffs...
14/02/20 11:47 DEBUG avocet: Connecting for filelist...
14/02/20 11:47 DEBUG avocet: Waiting for filelist
14/02/20 11:47 DEBUG avocet: Connecting for filelist (async)...
14/02/20 11:50 INFO Logical volume "urbackup_snap_e9dee62ea3cf7cc6d38d974bfe143b6fdb133e97179a6571" created
14/02/20 11:50 INFO Indexing of "root" done. 11626 filesystem lookups 0 db lookups and 0 db updates
14/02/20 11:50 INFO Backing up "shares_0" without snapshot.
14/02/20 11:50 INFO Indexing of "shares_0" done. 33604 filesystem lookups 0 db lookups and 0 db updates
14/02/20 11:50 DEBUG avocet: Doing backup with hashed transfer...
14/02/20 11:50 INFO avocet: Loading file list...
14/02/20 11:50 DEBUG avocet Starting incremental backup...
14/02/20 11:50 INFO avocet: Calculating file tree differences...
14/02/20 11:50 INFO Waiting for metadata download stream to finish
14/02/20 11:50 ERROR Error starting file metadata download thread
14/02/20 11:50 ERROR Backup had an early error. Deleting partial backup.
14/02/20 11:50 WARNING Exponential backoff: Waiting at least 1h 20m before next file backup
The client now showed:
root@avocet:~# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
root vg0 owi-aos--- 13.97g
swap vg0 -wi-ao---- 952.00m
urbackup_snap_8a15686e1ff0af5c33d27d36b0aba7ad7bb3d2e178bb26d2 vg0 swi-a-s--- 2.55g root 72.44
urbackup_snap_e9dee62ea3cf7cc6d38d974bfe143b6fdb133e97179a6571 vg0 swi-aos--- 1.27g root 2.43