Constant urbackupsrv.service crash + corrupt database

I’m running out of things to try here and getting desperate.

I’ve been running Ubuntu Server 20.04 LTS + UrBackup 2.5.33 on bare metal for a very long time, and it has been rock solid. Recently, due to some hard drive failures in my RAID (no actual system failure), I figured it was time to upgrade. I heavily use virtual machines, so I figured I’d leverage my VM infrastructure. I created a VM, passed through a pair of 24TB HDDs dedicated to the VM, and installed Ubuntu 22.04 LTS + UrBackup 2.5.33. Virgin Ubuntu installation from the .iso, virgin UrBackup database with the exception of the security keys from the old UrBackup. The old server is still running and doing backups, so I’ve been able to refer back to its config as needed. Both servers use ext4 for the system and btrfs for data. Mix of Linux and Windows clients with both image and file backups.

It seemed to work well in a VM, so I loaded UrBackup up with a handful of clients to stress test it a bit before going all in: maybe around 7-10 clients, a combination of image and file backups. Many completed fine; some were longer and/or started later, and so were still in progress when urbackupsrv crashed.

Fast forward through multiple attempts to isolate/diagnose the problem:

  • It takes a few hours to crash, and it survives longer when idling, so it’s probably related to activity. The latest crash happened while only image backups were running; others have been mixed and maybe file-only.
  • Restoring from the automated backup of an uncorrupted UrBackup database works (until the next crash).
  • ‘tail -f /var/log/urbackup.log’: no messages related to the crash.
  • ‘journalctl -f’ during crash:
Jun 18 12:11:16 l9n-backup3 kernel: backup archival[2930]: segfault at 3f3d6469746e ip 00007ff6f2f53881 sp 00007ff6e4ff87c8 error 4 in libc.so.6[7ff6f2eb7000+195000]
Jun 18 12:11:16 l9n-backup3 kernel: Code: 48 01 d0 eb 1b 0f 1f 40 00 f3 0f 1e fa 48 39 d1 0f 82 93 06 07 00 0f 1f 00 f3 0f 1e fa 48 89 f8 48 83 fa 10 0f 82 2f 01 00 00 <0f> 10 06 48 83 fa 20 0f 87 8f 01 00 00 0f 10 4c 16 f0 0f 11 07 0f
  • Virtualizing the UrBackup server isn’t the problem. After several days of fighting the new server, suddenly urbackupsrv on my old stable 20.04 LTS server crashed with nearly identical symptoms!
  • It’s NOT caused by having two UrBackup servers running simultaneously. I know it’s possible to run two UrBackup servers on the same network at the same time (by copying the security keys, which is what I did), but I turned my old server off when urbackup segfaulted and continued to troubleshoot on the new one. No luck; it still crashed.
  • I’ve tried deleting the urbackup database AND reformatting the data partition, but still experience the crash.
  • ‘urbackupsrv --repair-database’ no longer seems to work, even on a virgin Ubuntu 20.04/22.04 + virgin (uncorrupted) UrBackup database.
  • I’ve tried ‘urbackupsrv --remove-unknown’ and ‘urbackupsrv --cleanup’ in a couple of test iterations. They completed, but didn’t help once the segfault occurred.

I’m running short of ways to troubleshoot this problem, and it’s approaching emergency status now that my old, stable Ubuntu 20.04 + UrBackup server seems to have been affected too. Anyone have any suggestions? Or… the solution!?

TIA

Edit:
Tried to start urbackupsrv post-crash with LOGLEVEL=“debug” in ‘/etc/default/urbackupsrv’.
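
(For anyone trying the same: the change is a single line in the defaults file, followed by a service restart. A rough sketch, assuming the stock package layout; the file and variable names are the ones from my setup above:)

# /etc/default/urbackupsrv
LOGLEVEL="debug"

# then restart the service
sudo systemctl restart urbackupsrv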
‘/var/log/urbackup.log’:

2024-06-18 13:50:06: Starting HTTP-Server on port 55414
2024-06-18 13:50:06: ERROR: HTTP: Creating v6 SOCKET failed
2024-06-18 13:50:06: HTTP: Server started up successfully!
2024-06-18 13:50:06: SQLite: recovered 3655 frames from WAL file /var/urbackup/backup_server.db-wal code: 283
2024-06-18 13:50:07: SQLite: recovered 51402 frames from WAL file /var/urbackup/backup_server_files.db-wal code: 283
2024-06-18 13:50:07: SQLite: recovered 8 frames from WAL file /var/urbackup/backup_server_link_journal.db-wal code: 283
2024-06-18 13:50:07: SQLite: recovered 1706 frames from WAL file /var/urbackup/backup_server_settings.db-wal code: 283
2024-06-18 13:50:07: SQLite: recovered 3655 frames from WAL file /var/urbackup/backup_server.db-wal code: 283
2024-06-18 13:50:07: SQLite: recovered 1706 frames from WAL file /var/urbackup/backup_server_settings.db-wal code: 283
2024-06-18 13:50:07: SQLite: recovered 51402 frames from WAL file /var/urbackup/backup_server_files.db-wal code: 283
2024-06-18 13:50:07: SQLite: recovered 8 frames from WAL file /var/urbackup/backup_server_link_journal.db-wal code: 283
2024-06-18 13:50:07: Started UrBackup...
2024-06-18 13:50:07: Removing temporary files...
2024-06-18 13:50:07: Recreating temporary folder...
2024-06-18 13:50:07: Testing if backup destination can handle subvolumes and snapshots...
2024-06-18 13:50:07: Backup destination does handle subvolumes and snapshots. Snapshots enabled for image and file backups.
2024-06-18 13:50:07: Testing if backup destination can handle filesystem transactions...
2024-06-18 13:50:07: Testing for hardlinks in backup destination...
2024-06-18 13:50:07: Could create hardlink at backup destination. Hardlinks enabled.
2024-06-18 13:50:07: Testing for reflinks in backup destination...
2024-06-18 13:50:07: Could create reflink at backup destination. Reflinks enabled.
2024-06-18 13:50:07: Binding to interface enp1s0 (ipv4) for broadcasting...
2024-06-18 13:50:07: Broadcasting on ipv4 interface enp1s0 addr 10.1.1.15
2024-06-18 13:50:07: ERROR: InternetService: Creating v6 SOCKET failed
2024-06-18 13:50:07: InternetService: Server started up successfully!
2024-06-18 13:50:07: UrBackup Server start up complete.
2024-06-18 13:50:07: Looking for old Sessions... 0 sessions
2024-06-18 13:50:07: ERROR: Creating ipv6 SOCKET failed. Port 55413 may already be in use
2024-06-18 13:50:07: Server started up successfully!
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=3 image=false letter=
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=3 image=false letter=
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=3 image=false letter=
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=3 image=false letter=
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=3 image=false letter=
2024-06-18 13:50:07: Did not find backup suitable for archiving with backup_type=3 image=false letter=

Sorry it is crashing for you.

Could you nevertheless post/send the last messages before the crash? There might be messages which hint at the cause.

The best information to get this fixed can be retrieved by following https://urbackup.atlassian.net/wiki/spaces/US/pages/8323075/Debugging+with+gdb+on+Linux
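
(In short, the idea is to run the server binary directly under gdb and grab a backtrace when it crashes; roughly along these lines, with paths assuming a from-source install to /usr/local, so adjust as needed:)

sudo systemctl stop urbackupsrv
sudo gdb --args /usr/local/bin/urbackupsrv run -v debug
(gdb) run
# ...reproduce until the SIGSEGV hits, then:
(gdb) bt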

The dmesg output says it crashes in archival, so e.g. removing all archival rules might be a workaround.

Why do you say the database is corrupt?

Thanks for the responses, @uroni. I realized that I hadn’t had LOGLEVEL=“debug” set while the segfault actually occurred, only afterwards. So I’ve restored my last-good copy of the database and am running it now with debug enabled. Perhaps I’ll find something more interesting in ‘/var/log/urbackup.log’. I’ll post the result after it crashes again.

I’m assuming the database is corrupt because everything will be fine up until urbackupsrv segfaults. I can start/stop urbackupsrv with no issues. Immediately after the segfault, however, I can’t start urbackupsrv again without another immediate segfault. If I swap out the database for a pre-segfault copy, it’ll start right up without issue. Shut it down again, switch back to the segfaulted database, and it fails again. So by process of elimination, with nothing else changing (not even a reboot), the problem must somehow exist in the database; presumably something in the database files changes, causing the segfault to occur. And once that happens, I can’t seem to recover from it without reverting to a pre-segfault database. I think my logic is sound. Hopefully I’ve explained myself well enough!

I see no indication of database corruption in what you have posted.

Fair enough. You would absolutely know better than I would!

From my understanding, ‘/var/urbackup’ contains the *.db, *.db-shm, *.db-wal, and *.ub files, which I associate with “the urbackup database.” Once the segfault occurs, I can no longer start urbackupsrv using the post-segfault ‘/var/urbackup’ directory. If I swap ‘/var/urbackup’ with a pre-segfault copy, UrBackup reverts back to that previous state and runs fine—until the next segfault, which again renders ‘/var/urbackup’ unusable. Swapping in/out a “bad” post-segfault ‘/var/urbackup’ directly affects whether or not I can start urbackupsrv.
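
(For concreteness, the swap is nothing more elaborate than this rough sketch; paths are from my setup, and ‘/var/urbackup.pre-segfault’ is just a placeholder name for my saved copy:)

sudo systemctl stop urbackupsrv
sudo mv /var/urbackup /var/urbackup.bad           # keep the broken copy for analysis
sudo cp -a /var/urbackup.pre-segfault /var/urbackup   # placeholder name for the saved pre-segfault copy
sudo systemctl start urbackupsrv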

You’re probably right that there isn’t real database corruption. However, something changes in those files that seems to trigger the segfault from that point on. I’m stuck as to what it is and what triggers it.

If/when it segfaults again, I’ll review and post the tail end of the debug-enabled ‘/var/log/urbackup.log’ and hopefully find a clue there.

On a side note, I want to take this opportunity to say:
On behalf of 12 small and underfunded public libraries in Ohio, USA…we collectively thank you for all of the time and effort you put into making UrBackup such a great project! You (and the UrBackup community) are hugely appreciated!

Took a little while this time, but it crashed again.

‘journalctl -xe’:

Jun 19 12:00:01 l9n-backup3 CRON[7959]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 19 12:00:01 l9n-backup3 CRON[7960]: (root) CMD (timeshift --check --scripted)
Jun 19 12:00:01 l9n-backup3 crontab[7994]: (root) LIST (root)
Jun 19 12:00:01 l9n-backup3 crontab[7995]: (root) LIST (root)
Jun 19 12:00:01 l9n-backup3 CRON[7959]: pam_unix(cron:session): session closed for user root
Jun 19 12:03:02 l9n-backup3 kernel: backup archival[4228]: segfault at 3f3d6469746e ip 00007f462e0c7881 sp 00007f4613ffe7c8 error 4 in libc.so.6[7f462e02b000+195000]
Jun 19 12:03:02 l9n-backup3 kernel: Code: 48 01 d0 eb 1b 0f 1f 40 00 f3 0f 1e fa 48 39 d1 0f 82 93 06 07 00 0f 1f 00 f3 0f 1e fa 48 89 f8 48 83 fa 10 0f 82 2f 01 00 00 <0f> 10 06 48 83 fa 20 0f 87 8f 01 00 00 0f 10 4c 16 f0 0f 11 07 0f
Jun 19 12:08:10 l9n-backup3 sudo[8027]: pam_unix(sudo:auth): authentication failure; logname=locadm uid=1000 euid=0 tty=/dev/pts/6 ruser=locadm rhost=  user=locadm
Jun 19 12:08:17 l9n-backup3 sudo[8027]:   locadm : TTY=pts/6 ; PWD=/ ; USER=root ; COMMAND=/usr/bin/less /var/log/urbackup.log
Jun 19 12:08:17 l9n-backup3 sudo[8027]: pam_unix(sudo:session): session opened for user root(uid=0) by locadm(uid=1000)

‘/var/log/urbackup.log’:

2024-06-19 12:02:56: LMDB: PUT clientid=19 filesize=4161628 hash=Z6g22X5gsEB2moJCByesTA== target=3001085
2024-06-19 12:02:56: LMDB: PUT clientid=19 filesize=9682944 hash=amdBFLG2ro02/eRnxg/KbA== target=3001065
2024-06-19 12:02:56: LMDB: PUT clientid=19 filesize=3458394 hash=bkJu/TTt52uAlNvHcNf5PQ== target=3001059
2024-06-19 12:02:56: LMDB: PUT clientid=19 filesize=5996795 hash=edRkAP7echenzE3FHmoy9g== target=3001062
2024-06-19 12:02:56: LMDB: PUT clientid=19 filesize=1884288 hash=g0/XRvFyvGTeeVOQfNWuUQ== target=3001063
2024-06-19 12:02:56: LMDB: PUT clientid=19 filesize=9889792 hash=laeMG4k3G4OgedYLRmK7vw== target=3001066
2024-06-19 12:02:56: LMDB: PUT clientid=19 filesize=3562112 hash=mdtjLZdaaKozXY3ZWwZMjA== target=3001074
2024-06-19 12:02:56: LMDB: PUT clientid=19 filesize=3588177 hash=o6/GVLRDDaylyyOCXAsVrQ== target=3001058
2024-06-19 12:02:56: LMDB: PUT clientid=19 filesize=4522333 hash=qA3a3P1FqE4KYEPeFFuhqg== target=3001073
2024-06-19 12:02:56: LMDB: PUT clientid=19 filesize=2281779 hash=qluovOtjrJ+0Dm5pDRBtPw== target=3001076
2024-06-19 12:02:56: LMDB: PUT clientid=19 filesize=4403045 hash=s53WTV8oEzMYN5YTDnQw8Q== target=3001077
2024-06-19 12:02:56: LMDB: PUT clientid=19 filesize=4560288 hash=uZ72+QzAKwxwImBPLW+JpQ== target=3001072
2024-06-19 12:02:56: LMDB: PUT clientid=19 filesize=1847750 hash=vHJ6YDXyFwfYHFbVgteLYw== target=3001083
2024-06-19 12:02:56: LMDB: PUT clientid=19 filesize=725242 hash=w8WAggWS2HgItal1gBJtsQ== target=3001069
2024-06-19 12:02:56: LMDB: PUT clientid=19 filesize=2056857 hash=6plee+GkksxSSuBRXlt0Tg== target=3001064
2024-06-19 12:02:56: LMDB: PUT clientid=19 filesize=4722688 hash=8V+j57JMXmXFieEruhRwOA== target=3001078
2024-06-19 12:02:56: LMDB: PUT clientid=19 filesize=2270422 hash=8eLFf12yaw1RRaIHzlvSNg== target=3001082
2024-06-19 12:02:57: GT: Loaded file "The Addams family.jpg"
2024-06-19 12:02:57: Loading file "The Andy Griffith show 50th anniversary the best of mayberry (back).jpg"
2024-06-19 12:02:57: PT: Hashing file "The Addams family.jpg"
2024-06-19 12:02:57: HT: Copying file: "/mnt/data0/urbackup/hbml-win3[Data]/240618-2103/Drive D/Common/Orders/Processing/DVD Covers/The Addams family.jpg" (id=53395)
2024-06-19 12:02:57: HT: Renaming file to "/mnt/data0/urbackup/hbml-win3[Data]/240618-2103/Drive D/Common/Orders/Processing/DVD Covers/The Addams family.jpg" with hash output
2024-06-19 12:02:57: New fileindex entry for "/mnt/data0/urbackup/hbml-win3[Data]/240618-2103/Drive D/Common/Orders/Processing/DVD Covers/The Addams family.jpg" id=3001086 hash=mtN9BFN6E1WjgMXchz9n1g==
2024-06-19 12:02:58: GT: Loaded file "The Andy Griffith show 50th anniversary the best of mayberry (back).jpg"
2024-06-19 12:02:58: Loading file "The Andy Griffith show 50th anniversary the best of mayberry (front).jpg"
2024-06-19 12:02:58: PT: Hashing file "The Andy Griffith show 50th anniversary the best of mayberry (back).jpg"
2024-06-19 12:02:58: HT: Copying file: "/mnt/data0/urbackup/hbml-win3[Data]/240618-2103/Drive D/Common/Orders/Processing/DVD Covers/The Andy Griffith show 50th anniversary the best of mayberry (back).jpg" (id=53396)
2024-06-19 12:02:58: HT: Renaming file to "/mnt/data0/urbackup/hbml-win3[Data]/240618-2103/Drive D/Common/Orders/Processing/DVD Covers/The Andy Griffith show 50th anniversary the best of mayberry (back).jpg" with hash output
2024-06-19 12:02:58: New fileindex entry for "/mnt/data0/urbackup/hbml-win3[Data]/240618-2103/Drive D/Common/Orders/Processing/DVD Covers/The Andy Griffith show 50th anniversary the best of mayberry (back).jpg" id=3001087 hash=kGiICxVfnNCAutIdSx1RJw==
2024-06-19 12:02:59: GT: Loaded file "The Andy Griffith show 50th anniversary the best of mayberry (front).jpg"
2024-06-19 12:02:59: Loading file "The Angry Birds 2.jpg"
2024-06-19 12:02:59: PT: Hashing file "The Andy Griffith show 50th anniversary the best of mayberry (front).jpg"
2024-06-19 12:02:59: HT: Copying file: "/mnt/data0/urbackup/hbml-win3[Data]/240618-2103/Drive D/Common/Orders/Processing/DVD Covers/The Andy Griffith show 50th anniversary the best of mayberry (front).jpg" (id=53397)
2024-06-19 12:02:59: HT: Renaming file to "/mnt/data0/urbackup/hbml-win3[Data]/240618-2103/Drive D/Common/Orders/Processing/DVD Covers/The Andy Griffith show 50th anniversary the best of mayberry (front).jpg" with hash output
2024-06-19 12:02:59: New fileindex entry for "/mnt/data0/urbackup/hbml-win3[Data]/240618-2103/Drive D/Common/Orders/Processing/DVD Covers/The Andy Griffith show 50th anniversary the best of mayberry (front).jpg" id=3001088 hash=Sa4BoNxQ+outw+o6JLFNyw==
2024-06-19 12:03:00: GT: Loaded file "The Angry Birds 2.jpg"
2024-06-19 12:03:00: Loading file "The Bad Guys.jpg"
2024-06-19 12:03:00: PT: Hashing file "The Angry Birds 2.jpg"
2024-06-19 12:03:00: HT: Copying file: "/mnt/data0/urbackup/hbml-win3[Data]/240618-2103/Drive D/Common/Orders/Processing/DVD Covers/The Angry Birds 2.jpg" (id=53398)
2024-06-19 12:03:00: HT: Renaming file to "/mnt/data0/urbackup/hbml-win3[Data]/240618-2103/Drive D/Common/Orders/Processing/DVD Covers/The Angry Birds 2.jpg" with hash output
2024-06-19 12:03:00: New fileindex entry for "/mnt/data0/urbackup/hbml-win3[Data]/240618-2103/Drive D/Common/Orders/Processing/DVD Covers/The Angry Birds 2.jpg" id=3001089 hash=W+utJsM0tTL28t7iumEB4A==
2024-06-19 12:03:01: GT: Loaded file "The Bad Guys.jpg"
2024-06-19 12:03:01: Loading file "The Banshees of Inisherin.jpg"
2024-06-19 12:03:01: PT: Hashing file "The Bad Guys.jpg"
2024-06-19 12:03:01: HT: Copying file: "/mnt/data0/urbackup/hbml-win3[Data]/240618-2103/Drive D/Common/Orders/Processing/DVD Covers/The Bad Guys.jpg" (id=53399)
2024-06-19 12:03:01: HT: Renaming file to "/mnt/data0/urbackup/hbml-win3[Data]/240618-2103/Drive D/Common/Orders/Processing/DVD Covers/The Bad Guys.jpg" with hash output
2024-06-19 12:03:01: New fileindex entry for "/mnt/data0/urbackup/hbml-win3[Data]/240618-2103/Drive D/Common/Orders/Processing/DVD Covers/The Bad Guys.jpg" id=3001090 hash=hriAxMr+Z5HeCOv2sCI+Vg==
2024-06-19 12:03:02: Did not find backup suitable for archiving with backup_type=3 image=false letter=
2024-06-19 12:03:02: Did not find backup suitable for archiving with backup_type=3 image=false letter=
2024-06-19 12:03:02: Did not find backup suitable for archiving with backup_type=3 image=false letter=

At first I wondered if the “Did not find backup suitable for archiving…” messages were relevant. But when I ‘grep “Did not find backup suitable for archiving”’ the log, I see lots of them with backup_type=3 and backup_type=12, so I’m assuming these are probably normal messages. Essentially, everything seems normal except for whatever is changing in ‘/var/urbackup’ to cause the segfault from that point on.

When I try to restart urbackupsrv:
‘journalctl -xe’:

Jun 19 12:24:00 l9n-backup3 sudo[8341]:   locadm : TTY=pts/6 ; PWD=/ ; USER=root ; COMMAND=/usr/bin/systemctl start urbackupsrv
Jun 19 12:24:00 l9n-backup3 sudo[8341]: pam_unix(sudo:session): session opened for user root(uid=0) by locadm(uid=1000)
Jun 19 12:24:00 l9n-backup3 systemd[1]: Starting LSB: Server for doing backups...
Jun 19 12:24:01 l9n-backup3 systemd[1]: Started LSB: Server for doing backups.
Jun 19 12:24:01 l9n-backup3 sudo[8341]: pam_unix(sudo:session): session closed for user root
Jun 19 12:24:05 l9n-backup3 kernel: backup archival[8378]: segfault at 3f3d6469746e ip 00007f9835d98881 sp 00007f9813ffe7c8 error 4 in libc.so.6[7f9835cfc000+195000]
Jun 19 12:24:05 l9n-backup3 kernel: Code: 48 01 d0 eb 1b 0f 1f 40 00 f3 0f 1e fa 48 39 d1 0f 82 93 06 07 00 0f 1f 00 f3 0f 1e fa 48 89 f8 48 83 fa 10 0f 82 2f 01 00 00 <0f> 10 06 48 83 fa 20 0f 87 8f 01 00 00 0f 10 4c 16 f0 0f 11 07 0f
Jun 19 12:24:19 l9n-backup3 sudo[8388]:   locadm : TTY=pts/6 ; PWD=/ ; USER=root ; COMMAND=/usr/bin/systemctl status urbackupsrv
Jun 19 12:24:19 l9n-backup3 sudo[8388]: pam_unix(sudo:session): session opened for user root(uid=0) by locadm(uid=1000)
Jun 19 12:24:19 l9n-backup3 sudo[8388]: pam_unix(sudo:session): session closed for user root
Jun 19 12:24:36 l9n-backup3 sudo[8410]:   locadm : TTY=pts/6 ; PWD=/ ; USER=root ; COMMAND=/usr/bin/systemctl status urbackupsrv
Jun 19 12:24:36 l9n-backup3 sudo[8410]: pam_unix(sudo:session): session opened for user root(uid=0) by locadm(uid=1000)
Jun 19 12:24:36 l9n-backup3 sudo[8410]: pam_unix(sudo:session): session closed for user root
Jun 19 12:24:43 l9n-backup3 sudo[8420]:   locadm : TTY=pts/6 ; PWD=/ ; USER=root ; COMMAND=/usr/bin/systemctl stop urbackupsrv
Jun 19 12:24:43 l9n-backup3 sudo[8420]: pam_unix(sudo:session): session opened for user root(uid=0) by locadm(uid=1000)
Jun 19 12:24:43 l9n-backup3 systemd[1]: Stopping LSB: Server for doing backups...
Jun 19 12:24:43 l9n-backup3 systemd[1]: urbackupsrv.service: Deactivated successfully.
Jun 19 12:24:43 l9n-backup3 systemd[1]: Stopped LSB: Server for doing backups.
Jun 19 12:24:43 l9n-backup3 sudo[8420]: pam_unix(sudo:session): session closed for user root
Jun 19 12:27:56 l9n-backup3 sudo[8457]:   locadm : TTY=pts/6 ; PWD=/ ; USER=root ; COMMAND=/usr/bin/less /var/log/urbackup.log
Jun 19 12:27:56 l9n-backup3 sudo[8457]: pam_unix(sudo:session): session opened for user root(uid=0) by locadm(uid=1000)

and ‘/var/log/urbackup.log’:

2024-06-19 12:24:01: Starting HTTP-Server on port 55414
2024-06-19 12:24:01: ERROR: HTTP: Creating v6 SOCKET failed
2024-06-19 12:24:01: HTTP: Server started up successfully!
2024-06-19 12:24:01: SQLite: recovered 942 frames from WAL file /var/urbackup/backup_server.db-wal code: 283
2024-06-19 12:24:04: SQLite: recovered 91091 frames from WAL file /var/urbackup/backup_server_files.db-wal code: 283
2024-06-19 12:24:04: SQLite: recovered 7 frames from WAL file /var/urbackup/backup_server_link_journal.db-wal code: 283
2024-06-19 12:24:04: SQLite: recovered 274 frames from WAL file /var/urbackup/backup_server_settings.db-wal code: 283
2024-06-19 12:24:04: SQLite: recovered 942 frames from WAL file /var/urbackup/backup_server.db-wal code: 283
2024-06-19 12:24:04: SQLite: recovered 274 frames from WAL file /var/urbackup/backup_server_settings.db-wal code: 283
2024-06-19 12:24:05: SQLite: recovered 91091 frames from WAL file /var/urbackup/backup_server_files.db-wal code: 283
2024-06-19 12:24:05: SQLite: recovered 7 frames from WAL file /var/urbackup/backup_server_link_journal.db-wal code: 283
2024-06-19 12:24:05: Started UrBackup...
2024-06-19 12:24:05: Removing temporary files...
2024-06-19 12:24:05: Recreating temporary folder...
2024-06-19 12:24:05: Testing if backup destination can handle subvolumes and snapshots...
2024-06-19 12:24:05: ERROR: InternetService: Creating v6 SOCKET failed
2024-06-19 12:24:05: InternetService: Server started up successfully!
2024-06-19 12:24:05: UrBackup Server start up complete.
2024-06-19 12:24:05: Looking for old Sessions... 0 sessions
2024-06-19 12:24:05: ERROR: Creating ipv6 SOCKET failed. Port 55413 may already be in use
2024-06-19 12:24:05: Server started up successfully!
2024-06-19 12:24:05: Did not find backup suitable for archiving with backup_type=3 image=false letter=
2024-06-19 12:24:05: Did not find backup suitable for archiving with backup_type=3 image=false letter=
2024-06-19 12:24:05: Did not find backup suitable for archiving with backup_type=3 image=false letter=
2024-06-19 12:24:05: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-19 12:24:05: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-19 12:24:05: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-19 12:24:05: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-19 12:24:05: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-19 12:24:05: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-19 12:24:05: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-19 12:24:05: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-19 12:24:05: Did not find backup suitable for archiving with backup_type=3 image=false letter=
2024-06-19 12:24:05: Did not find backup suitable for archiving with backup_type=3 image=false letter=
2024-06-19 12:24:05: Did not find backup suitable for archiving with backup_type=3 image=false letter=

I have reverted to an automatic backup copy of the database from about two hours before the segfault. I will try removing all archival rules as you suggested as a possible workaround. These are the same rules I used on my old server, and I have recreated them several times when rebuilding the new server config. However, they are not very important to me, so I can easily delete them permanently if this proves to be a working solution.

If possible please follow the debugging steps I linked to. That way this bug can be fixed.

First time compiling a program on Linux and definitely first time using GDB, so hopefully I got it right and this is what you’re looking for. This is an immediate segfault from a copy of the database that has already experienced the problem. Now I’ll go back to a working copy and run until it segfaults again.

Starting program: /usr/local/bin/urbackupsrv run -v debug
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
2024-06-19 19:22:06: Starting HTTP-Server on port 55414
2024-06-19 19:22:06: ERROR: HTTP: Creating v6 SOCKET failed
2024-06-19 19:22:06: HTTP: Server started up successfully!
[New Thread 0x7ffff6681640 (LWP 22002)]
[New Thread 0x7ffff5e80640 (LWP 22003)]
2024-06-19 19:22:06: SQLite: recovered 944 frames from WAL file /usr/local/var/urbackup/backup_server.db-wal code: 283
2024-06-19 19:22:06: SQLite: recovered 91091 frames from WAL file /usr/local/var/urbackup/backup_server_files.db-wal code: 283
2024-06-19 19:22:06: SQLite: recovered 8 frames from WAL file /usr/local/var/urbackup/backup_server_link_journal.db-wal code: 283
2024-06-19 19:22:06: SQLite: recovered 280 frames from WAL file /usr/local/var/urbackup/backup_server_settings.db-wal code: 283
[New Thread 0x7ffff567f640 (LWP 22004)]
2024-06-19 19:22:06: SQLite: recovered 944 frames from WAL file /usr/local/var/urbackup/backup_server.db-wal code: 283
2024-06-19 19:22:06: SQLite: recovered 280 frames from WAL file /usr/local/var/urbackup/backup_server_settings.db-wal code: 283
2024-06-19 19:22:06: SQLite: recovered 91091 frames from WAL file /usr/local/var/urbackup/backup_server_files.db-wal code: 283
2024-06-19 19:22:06: SQLite: recovered 8 frames from WAL file /usr/local/var/urbackup/backup_server_link_journal.db-wal code: 283
[New Thread 0x7ffff4a6c640 (LWP 22005)]
2024-06-19 19:22:06: Started UrBackup...
[New Thread 0x7fffe7fff640 (LWP 22006)]
2024-06-19 19:22:06: Removing temporary files...
2024-06-19 19:22:06: Recreating temporary folder...
[Detaching after vfork from child process 22007]
TEST FAILED: Please compile with mountvhd (./configure --with-mountvhd)
[Detaching after vfork from child process 22009]
2024-06-19 19:22:06: Image mounting disabled: TEST FAILED: Please compile with mountvhd (./configure --with-mountvhd)
2024-06-19 19:22:06: Testing if backup destination can handle subvolumes and snapshots...
[Detaching after vfork from child process 22011]
Testing for btrfs...
Create subvolume '/mnt/data0/urbackup/testA54hj5luZtlorr494/A'
Create a snapshot of '/mnt/data0/urbackup/testA54hj5luZtlorr494/A' in '/mnt/data0/urbackup/testA54hj5luZtlorr494/B'
Delete subvolume (commit): '/mnt/data0/urbackup/testA54hj5luZtlorr494/A'
Delete subvolume (commit): '/mnt/data0/urbackup/testA54hj5luZtlorr494/B'
BTRFS TEST OK
2024-06-19 19:22:06: Backup destination does handle subvolumes and snapshots. Snapshots enabled for image and file backups.
2024-06-19 19:22:06: Testing if backup destination can handle filesystem transactions...
2024-06-19 19:22:06: Testing for hardlinks in backup destination...
2024-06-19 19:22:06: Could create hardlink at backup destination. Hardlinks enabled.
2024-06-19 19:22:06: Testing for reflinks in backup destination...
2024-06-19 19:22:06: Could create reflink at backup destination. Reflinks enabled.
2024-06-19 19:22:06: Binding to interface enp1s0 (ipv4) for broadcasting...
2024-06-19 19:22:06: Broadcasting on ipv4 interface enp1s0 addr 10.1.1.15
2024-06-19 19:22:07: ERROR: InternetService: Creating v6 SOCKET failed
2024-06-19 19:22:07: InternetService: Server started up successfully!
[New Thread 0x7fffe77fe640 (LWP 22028)]
[New Thread 0x7fffe6ffd640 (LWP 22029)]
[New Thread 0x7fffe67fc640 (LWP 22030)]
[New Thread 0x7fffe5ffb640 (LWP 22031)]
[New Thread 0x7fffe57fa640 (LWP 22032)]
[New Thread 0x7fffe4ff9640 (LWP 22033)]
[New Thread 0x7fffcbfff640 (LWP 22034)]
[New Thread 0x7fffcb7fe640 (LWP 22035)]
[New Thread 0x7fffcaffd640 (LWP 22036)]
[New Thread 0x7fffca7fc640 (LWP 22037)]
[New Thread 0x7fffc9ffb640 (LWP 22038)]
2024-06-19 19:22:07: UrBackup Server start up complete.
[New Thread 0x7fffc97fa640 (LWP 22039)]
2024-06-19 19:22:07: Looking for old Sessions... 0 sessions
2024-06-19 19:22:07: Did not find backup suitable for archiving with backup_type=3 image=false letter=
2024-06-19 19:22:07: Did not find backup suitable for archiving with backup_type=3 image=false letter=
2024-06-19 19:22:07: Did not find backup suitable for archiving with backup_type=3 image=false letter=
2024-06-19 19:22:07: ERROR: Creating ipv6 SOCKET failed. Port 55413 may already be in use
2024-06-19 19:22:07: Server started up successfully!
2024-06-19 19:22:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-19 19:22:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-19 19:22:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-19 19:22:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-19 19:22:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-19 19:22:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-19 19:22:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-19 19:22:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-19 19:22:07: Did not find backup suitable for archiving with backup_type=3 image=false letter=
2024-06-19 19:22:07: Did not find backup suitable for archiving with backup_type=3 image=false letter=
2024-06-19 19:22:07: Did not find backup suitable for archiving with backup_type=3 image=false letter=
2024-06-19 19:22:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-19 19:22:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-19 19:22:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-19 19:22:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-19 19:22:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-19 19:22:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-19 19:22:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-19 19:22:07: Did not find backup suitable for archiving with backup_type=12 image=true letter=*
2024-06-19 19:22:07: Did not find backup suitable for archiving with backup_type=3 image=false letter=
2024-06-19 19:22:07: Did not find backup suitable for archiving with backup_type=3 image=false letter=
2024-06-19 19:22:07: Did not find backup suitable for archiving with backup_type=3 image=false letter=

Thread 11 "backup archival" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe57fa640 (LWP 22032)]
memcpy () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:222
222     ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory.
(gdb) bt
#0  memcpy () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:222
#1  0x000055555591a0d6 in std::char_traits<char>::copy (__n=325, __s2=0x90 <error: Cannot access memory at address 0x90>, __s1=<optimized out>) at /usr/include/c++/11/bits/char_traits.h:437
#2  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_S_copy (__n=325, __s=0x90 <error: Cannot access memory at address 0x90>, __d=<optimized out>)
    at /usr/include/c++/11/bits/basic_string.h:359
#3  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_S_copy (__n=325, __s=0x90 <error: Cannot access memory at address 0x90>, __d=<optimized out>)
    at /usr/include/c++/11/bits/basic_string.h:354
#4  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_S_copy_chars (__k2=<optimized out>, __k1=0x90 <error: Cannot access memory at address 0x90>, 
    __p=<optimized out>) at /usr/include/c++/11/bits/basic_string.h:401
#5  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*> (this=this@entry=0x7fffe57f9ab0, __beg=0x90 <error: Cannot access memory at address 0x90>, 
    __end=<optimized out>) at /usr/include/c++/11/bits/basic_string.tcc:225
#6  0x00005555559208f9 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct_aux<char*> (__end=<optimized out>, __beg=<optimized out>, 
    this=0x7fffe57f9ab0) at /usr/include/c++/11/bits/basic_string.h:255
#7  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*> (__end=<optimized out>, __beg=<optimized out>, this=0x7fffe57f9ab0)
    at /usr/include/c++/11/bits/basic_string.h:274
#8  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string (__str=..., this=0x7fffe57f9ab0) at /usr/include/c++/11/bits/basic_string.h:459
#9  ServerAutomaticArchive::archiveBackups (this=<optimized out>) at urbackupserver/server_archive.cpp:168
#10 0x0000555555920f86 in ServerAutomaticArchive::operator() (this=0x555555c90410) at urbackupserver/server_archive.cpp:45
#11 0x0000555555605213 in thread_helper_f (t=0x555555c90410) at Server.cpp:1495
#12 0x00007ffff753aac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#13 0x00007ffff75cc850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Looks like corrupt memory pointers. Have you memtested your system?

If both UrBackup servers are running off the same bare-metal server, that would explain the failure of both, since UrBackup, especially with a large number of backups, puts stress on RAM.

It’s a good thought. But no. Two different physical servers. Both heavily used and very well-established. I’m 99.99% certain it’s not a RAM issue causing consistently identical failures on the two different servers.

If there is a way to test RAM, I would still suggest doing it. In your stack trace there are plenty of
“Cannot access memory at address …”

messages, which point to either RAM corruption, a kernel bug, or SELinux misbehaving.
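
(If taking the boxes down for a memtest86+ boot is inconvenient, an in-OS tool like memtester can at least give a quick sanity check. Not exhaustive, and size the test below free RAM, e.g.:)

sudo apt install memtester
sudo memtester 4G 2    # lock and test 4 GiB of RAM, 2 passes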

UrBackup’s urbackupserver/server_archive.cpp:168 is simple string manipulation that ends up in the standard C++ library, which is very unlikely to cause such an issue.

UPD: One more thought after looking at the UrBackup code: what are the names of the clients you’re working with? And their volume names? Are they all Latin characters?

This should fix it:

If possible, please apply the patch (the easiest way would be to just make the change manually) and check with a new compile whether it still crashes.

If I read this right, this should only occur if the letters to archive are empty, which doesn’t make much sense, but of course it should still not crash.
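
(Roughly, after making the one-line change in urbackupserver/server_archive.cpp in the source tree, the rebuild looks like this; a sketch assuming a source tarball that ships the configure script, per the ‘./configure --with-mountvhd’ hint in the test output above:)

./configure
make -j"$(nproc)"
sudo systemctl stop urbackupsrv
sudo make install
sudo systemctl start urbackupsrv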

Workstations are a combination of Windows and Linux clients. We use a pretty straightforward naming convention that’s strictly lowercase a-z, 1-9, dash (-), and period (.). None of the names have changed in the past few months. Most of my config has been the same for a very long time. Name examples:

Many of the clients have virtual sub-client names, almost exclusively “System|Data” or just “System”.

“Default directories to backup:” On Windows clients:
“C:;D:|Drive D/optional,follow_symlinks,symlinks_optional,share_hashes;E:|Drive E/optional,follow_symlinks,symlinks_optional,share_hashes”

“Default directories to backup:” On Linux clients:
“/|root/follow_symlinks,symlinks_optional,share_hashes;/mnt/data|data/optional,follow_symlinks,symlinks_optional,share_hashes;/mnt/data0|data0/optional,follow_symlinks,symlinks_optional,share_hashes;/mnt/data1|data1/optional,follow_symlinks,symlinks_optional,share_hashes”

For this to be happening on both my new UrBackup server and the long-time stable one, it seems like the cause should likely be one of:

  • an OS update that’s similar on both Ubuntu 22.04 and 20.04 (in which case others would likely have seen this by now too);
  • a funky config that I introduced on the old server within a day or two of it (also) failing, and that I duplicated onto the new server (manually, by referring back to the old server’s UrBackup GUI while configuring the new one very similarly);
  • something coming from the clients, which are a common denominator (but this seems unlikely).

I’m wondering if perhaps @uroni isn’t onto something with regard to the archive rules. I had them set up in my original server on some of the groups, but I don’t think I was really using them. When I set up the new one, however, I cleaned up my groups and configuration a bit to let the archive rules flow down into the groups from a higher level. So I definitely have them configured and implemented on my new server. I don’t think they’re anything crazy, though:

  • Archive every 1 day for 30 days, Archive window = “;;;”, File backup, Volume letters = “-”
  • Archive every 30 days for 1 year, Archive window = “;;;”, File backup, Volume letters = “-”
  • Archive every 1 year for 4 years, Archive window = “;;;”, File backup, Volume letters = “-”
  • Archive every 30 days for 1 year, Archive window = “;;;”, Image backup, Volume letters = “ALL”
  • Archive every 1 year for 4 years, Archive window = “;;;”, Image backup, Volume letters = “ALL”

I’ve had these rules enabled most of the time on the new server. I had started running a test iteration with them all disabled, but switched gears to running in debug mode. I’ve been running on a working copy of ‘/var/urbackup’ since yesterday, waiting for it to fail so I can get a stack trace at the actual time of failure. Once that happens, I could nuke the rules and see what happens.

Thanks for looking into this! I’m not a programmer, so this may stretch my skill level a bit, but I’ll see if I can figure it out while I’m waiting for the current test iteration to segfault.

Good catch. Perhaps I should be more attentive next time.

@Chad_Neeper It seems that @uroni got it right and the patch should fix the issue.

I recompiled the fixed code, then stopped my current test and went back to a “non-working” copy of the database. It did indeed start up without any problem this time. In thousands of lines of code, swapping out three little characters for seven different characters for the win!

Additional testing/time will tell for sure, but I’ll go out on a limb here and say that you’ve resolved it. I’m grateful, thank you!