I’m running out of things to try here and getting desperate.
I’ve been running Ubuntu Server 20.04 LTS + UrBackup 2.5.33 on bare metal for a very long time, and it has been rock solid. Recently, due to some hard drive failures in my RAID (no actual system failure), I figured it was time to upgrade. I heavily use virtual machines, so I figured I’d leverage my VM infrastructure: I created a VM, passed through a pair of 24TB HDDs dedicated to the VM, and installed Ubuntu 22.04 LTS + UrBackup 2.5.33. It’s a virgin Ubuntu installation from the .iso and a virgin UrBackup database, with the exception of the security keys copied from the old UrBackup server. The old server is still running and doing backups, so I’ve been able to refer back to its config as needed. Both servers use ext4 for the system and btrfs for data, with a mix of Linux and Windows clients doing both image and file backups.
It seemed to work well in a VM. So I loaded UrBackup up with a handful of clients to stress test it a bit before going all in…maybe around 7-10 clients, a combination of image and file backups. Many completed fine; some took longer and/or started later, and so were in progress when urbackupsrv crashed.
Fast forward through multiple attempts to isolate/diagnose the problem:
- It takes a few hours to crash, and it survives longer when idling, so it’s probably related to activity. The latest crash happened while doing only image backups; others have been mixed and maybe file-only.
- Restoring from the automated backup of an uncorrupted UrBackup database works (until the next crash).
- 'tail -f /var/log/urbackup': no messages related to the crash.
- 'journalctl -f' during the crash:
Jun 18 12:11:16 l9n-backup3 kernel: backup archival[2930]: segfault at 3f3d6469746e ip 00007ff6f2f53881 sp 00007ff6e4ff87c8 error 4 in libc.so.6[7ff6f2eb7000+195000]
Jun 18 12:11:16 l9n-backup3 kernel: Code: 48 01 d0 eb 1b 0f 1f 40 00 f3 0f 1e fa 48 39 d1 0f 82 93 06 07 00 0f 1f 00 f3 0f 1e fa 48 89 f8 48 83 fa 10 0f 82 2f 01 00 00 <0f> 10 06 48 83 fa 20 0f 87 8f 01 00 00 0f 10 4c 16 f0 0f 11 07 0f
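Since the kernel message only gives an offset inside libc, a core dump with a symbolic backtrace would show which urbackupsrv thread is actually faulting. A minimal sketch, assuming Ubuntu's stock systemd-coredump and gdb packages (a full read of the dump may additionally need the urbackup-server debug symbols):

```shell
# Capture and inspect a core dump from the next crash (sketch).
sudo apt install systemd-coredump gdb    # enables automatic core-dump collection

# After the next segfault:
coredumpctl list urbackupsrv             # confirm a dump was captured
coredumpctl gdb urbackupsrv              # open the newest dump in gdb

# Inside gdb, dump a backtrace for every thread:
# (gdb) thread apply all bt
```

Posting that backtrace alongside the journalctl output would let someone pinpoint the crashing code path even when the kernel line alone says nothing useful.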
- Virtualizing the UrBackup server isn’t the problem: after several days of fighting the new server, urbackupsrv on my old, stable 20.04 LTS server suddenly crashed with nearly identical symptoms!
- It’s NOT caused by having two UrBackup servers running simultaneously. I know it’s possible to run two UrBackup servers on the same network at the same time (by copying the security keys, which is what I did), but I turned my old server off when its urbackupsrv segfaulted, and continued to troubleshoot on the new one alone. No luck; it still crashed.
- I’ve tried deleting the UrBackup database AND reformatting the data partition, but I still experience the crash.
- 'urbackupsrv --repair-database' no longer seems to work, even on a virgin Ubuntu 20.04/22.04 installation with a virgin (uncorrupted) UrBackup database.
- I’ve tried 'urbackupsrv --remove-unknown' and 'urbackupsrv --cleanup' in a couple of test iterations. They completed, but didn’t help once the segfault occurred.
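One more check worth adding to the list: the debug log further down shows SQLite recovering thousands of WAL frames on every start, so it may be worth verifying the server databases directly with sqlite3. A sketch (database paths taken from that log; run it while the service is stopped so sqlite3 sees a quiescent database):

```shell
# Check every UrBackup server database for corruption (sketch).
sudo systemctl stop urbackupsrv
for db in /var/urbackup/backup_server*.db; do
    echo "== $db"
    sudo sqlite3 "$db" 'PRAGMA integrity_check;'   # prints "ok" if the file is clean
done
sudo systemctl start urbackupsrv
```

PRAGMA integrity_check is plain SQLite, so it works regardless of UrBackup version; any output other than "ok" would point at database corruption rather than a bug in the server code.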
I’m running short of ways to troubleshoot this, and it’s approaching emergency status now that my old, stable Ubuntu 20.04 + UrBackup server seems to be affected too. Does anyone have any suggestions? Or… the solution!?
TIA
Edit:
Tried to start urbackupsrv post-crash with LOGLEVEL="debug" in '/etc/default/urbackupsrv'.
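For reference, the relevant line in '/etc/default/urbackupsrv' (use straight ASCII quotes; curly quotes pasted from a browser would break the value when the service sources the file):

```shell
# /etc/default/urbackupsrv -- environment file read by the urbackupsrv
# service; only the log level is changed here.
LOGLEVEL="debug"
```

followed by 'sudo systemctl restart urbackupsrv' to pick the change up.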
'/var/log/urbackupsrv':
2024-06-18 13:50:06: Starting HTTP-Server on port 55414
2024-06-18 13:50:06: ERROR: HTTP: Creating v6 SOCKET failed
2024-06-18 13:50:06: HTTP: Server started up successfully!
2024-06-18 13:50:06: SQLite: recovered 3655 frames from WAL file /var/urbackup/backup_server.db-wal code: 283
2024-06-18 13:50:07: SQLite: recovered 51402 frames from WAL file /var/urbackup/backup_server_files.db-wal code: 283
2024-06-18 13:50:07: SQLite: recovered 8 frames from WAL file /var/urbackup/backup_server_link_journal.db-wal code: 283
2024-06-18 13:50:07: SQLite: recovered 1706 frames from WAL file /var/urbackup/backup_server_settings.db-wal code: 283
2024-06-18 13:50:07: SQLite: recovered 3655 frames from WAL file /var/urbackup/backup_server.db-wal code: 283
2024-06-18 13:50:07: SQLite: recovered 1706 frames from WAL file /var/urbackup/backup_server_settings.db-wal code: 283
2024-06-18 13:50:07: SQLite: recovered 51402 frames from WAL file /var/urbackup/backup_server_files.db-wal code: 283
2024-06-18 13:50:07: SQLite: recovered 8 frames from WAL file /var/urbackup/backup_server_link_journal.db-wal code: 283
2024-06-18 13:50:07: Started UrBackup...
2024-06-18 13:50:07: Removing temporary files...
2024-06-18 13:50:07: Recreating temporary folder...
2024-06-18 13:50:07: Testing if backup destination can handle subvolumes and snapshots...
2024-06-18 13:50:07: Backup destination does handle subvolumes and snapshots. Snapshots enabled for image and file backups.
2024-06-18 13:50:07: Testing if backup destination can handle filesystem transactions...
2024-06-18 13:50:07: Testing for hardlinks in backup destination...
2024-06-18 13:50:07: Could create hardlink at backup destination. Hardlinks enabled.
2024-06-18 13:50:07: Testing for reflinks in backup destination...
2024-06-18 13:50:07: Could create reflink at backup destination. Reflinks enabled.
2024-06-18 13:50:07: Binding to interface enp1s0 (ipv4) for broadcasting...
2024-06-18 13:50:07: Broadcasting on ipv4 interface enp1s0 addr 10.1.1.15
2024-06-18 13:50:07: ERROR: InternetService: Creating v6 SOCKET failed
2024-06-18 13:50:07: InternetService: Server started up successfully!
2024-06-18 13:50:07: UrBackup Server start up complete.
2024-06-18 13:50:07: Looking for old Sessions... 0 sessions
2024-06-18 13:50:07: ERROR: Creating ipv6 SOCKET failed. Port 55413 may already be in use
2024-06-18 13:50:07: Server started up successfully!
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=3 image=false letter=
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=3 image=false letter=
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=3 image=false letter=
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=3 image=false letter=
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=3 image=false letter=
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=3 image=false letter=
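Next thing I plan to try: running the server in the foreground under gdb so the next segfault is caught live. A sketch; the 'run' subcommand and '--loglevel' option are my reading of the UrBackup server documentation, so double-check them against 'urbackupsrv --help' on your build:

```shell
# Run urbackupsrv in the foreground under gdb (sketch).
sudo systemctl stop urbackupsrv          # avoid two competing server instances
sudo gdb --args urbackupsrv run --loglevel debug
# (gdb) run
# ...wait for the SIGSEGV, then:
# (gdb) thread apply all bt
```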