Backup hangs at "Connecting for filelist (async)..."

Server: 2.4.13 running on Debian 11.0, with ext4 for root/var and btrfs for backup volume
Client: 2.4.11 running on Debian 10.10, with ext4

Backups on other clients (Linux Mint, Raspbian) are running fine

Problem popped up suddenly a few days ago.
I cannot relate it to any update, either on the server or on the client.

I have rebooted both server and client.

On the server GUI, the activity log shows:

08/26/21 10:37  	DEBUG  	Reflink copying is enabled
08/26/21 10:37  	DEBUG  	Reflink copying is enabled
08/26/21 10:37  	INFO  	Starting scheduled incremental file backup...
08/26/21 10:37  	DEBUG  	gandalf: Connecting for filelist...
08/26/21 10:37  	DEBUG  	gandalf: Waiting for filelist
08/26/21 10:37  	DEBUG  	gandalf: Connecting for filelist (async)... 

I would be grateful for any clue about how to fix it.

urbackupclient.log.gz (16.1 KB)

2021-08-26 10:37:32: Creating shadowcopy of "var" in indexDirs()

This seems to be the last message w.r.t. indexing. Maybe check (e.g. in htop, in process tree view) whether the snapshot creation script hangs. Are you using a snapshot script?

I am using the default snapshots of urbackup:

root@gandalf:~# cat /usr/local/etc/urbackup/snapshot.cfg 
#This is a key=value config file for determining the scripts/programs to create snapshots

create_filesystem_snapshot=/usr/local/share/urbackup/lvm_create_filesystem_snapshot
remove_filesystem_snapshot=/usr/local/share/urbackup/lvm_remove_filesystem_snapshot
root@gandalf:~# 

And there is a script trying to create a snapshot of /var, which has been running for several hours:

root@gandalf:~# ps -ef | grep u[r]backup
root     19668     1  0 10:36 ?        00:01:17 /usr/local/sbin/urbackupclientbackend --config /etc/default/urbackupclient --no-consoletime
root     19798 19668  0 10:37 ?        00:00:00 sh -c /usr/local/share/urbackup/lvm_create_filesystem_snapshot 29e632cac36a654181982568d951892c1cf4f7e1b797c94e "/var" "var" "/var" 2>&1
root     19799 19798  0 10:37 ?        00:00:00 /bin/sh /usr/local/share/urbackup/lvm_create_filesystem_snapshot 29e632cac36a654181982568d951892c1cf4f7e1b797c94e /var var /var
root     19809 19799  0 10:37 ?        00:00:00 /bin/sh /usr/local/share/urbackup/lvm_create_filesystem_snapshot 29e632cac36a654181982568d951892c1cf4f7e1b797c94e /var var /var
root@gandalf:~#
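
To narrow down where a stuck script like this is waiting, it can help to list processes in uninterruptible sleep (state D), which is often how a command blocked on I/O shows up. This is only a minimal sketch, assuming the procps version of ps; filter_blocked and show_blocked are hypothetical helper names, not part of UrBackup.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of UrBackup.

# filter_blocked: read "ps" output on stdin and keep the header line plus
# every row whose third column (STAT) starts with D (uninterruptible sleep).
filter_blocked() {
    awk 'NR == 1 || $3 ~ /^D/'
}

# show_blocked: list all processes with their state, elapsed time and kernel
# wait channel, then keep only the blocked ones.
show_blocked() {
    ps -eo pid,ppid,stat,etime,wchan:32,args | filter_blocked
}
```

Running show_blocked as root on the client prints only the header when nothing is blocked. Any row that does appear is a candidate for the command the snapshot script is waiting on.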

I can manually create (and remove) a snapshot of /var:

root@gandalf:~# lvcreate -l50%FREE -s -n urbackup_test rootvg/varvol
  Reducing COW size 209.75 GiB down to maximum usable size 20.08 GiB.
  Logical volume "urbackup_test" created.
root@gandalf:~# lvremove rootvg/urbackup_test
Do you really want to remove active logical volume rootvg/urbackup_test? [y/n]: y
  Logical volume "urbackup_test" successfully removed
root@gandalf:~#

I can try to stop the client and manually run the snapshot script in debug mode to see what’s happening.

Shall I?

Problem fixed.
It was a hanging NFS mount which was blocking the “df” command.
I detected it by running the lvm_create_filesystem_snapshot script in debug mode.
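
For anyone hitting the same symptom, here is a rough sketch of that kind of check. It assumes "debug mode" means shell tracing with sh -x, and that coreutils timeout and stat are available; trace_script, check_mount and check_nfs_mounts are hypothetical helper names, not part of UrBackup.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of UrBackup.

# trace_script: run a shell script with command tracing (sh -x), so the last
# line printed before a stall shows which command is blocking.
trace_script() {
    sh -x "$@" 2>&1
}

# check_mount: probe one mount point with a hard time limit, so a dead
# server shows up as a failure instead of blocking the shell.
# $1 = mount point, $2 = timeout in seconds (default 5).
check_mount() {
    if timeout -s KILL "${2:-5}" stat -f "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "hung or unreachable: $1"
        return 1
    fi
}

# check_nfs_mounts: probe every NFS mount listed in /proc/mounts
# (field 2 = mount point, field 3 = filesystem type).
# Mount points containing spaces are not handled in this sketch.
check_nfs_mounts() {
    awk '$3 ~ /^nfs/ { print $2 }' /proc/mounts |
    while read -r mp; do
        check_mount "$mp" 5 || true
    done
}
```

With the client stopped, trace_script can be given the script path and the same arguments that appear in the ps output above; note that this creates a real snapshot, which then has to be removed. If the trace stops at df, check_nfs_mounts should point at the mount that no longer answers.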

Thanks for the hint and for the good software :)