Hello,
I have had this issue for a long time, but it has never occurred as often as it does now.
So the issue:
At random times (sometimes several times a day, sometimes days apart) my server freezes completely. SSH and ping stop responding.
I have to hard reset the server through the datacenter’s interface, and after the reboot everything is “fine” again.
The server has been hardware tested by the datacenter (it took a whole day, with lots of load testing), and no failures were found.
I disabled the UrBackup service for two weeks, and the server ran fine during that time (while still doing other FTP backup jobs). So to me it does seem to be an issue triggered by UrBackup.
Linux host06 5.10.0-32-amd64 #1 SMP Debian 5.10.223-1 (2024-08-10) x86_64 GNU/Linux
UrBackup Server v2.5.33.0
4 x 6TB HDD in RAID 5 (mdraid), LVM on top of that, and Btrfs
What I have tried:
- OS updates (thinking it might be a Btrfs stability issue)
- UrBackup updates
- Allowed FTP only from certain hosts (thinking it might be some sort of attack, since I always saw failed logins right before the server crashed).
- Moved the database to a separate USB stick (not the most performant, but would exclude concurrency issues on the drives)
- Looked at the console while the server was frozen, but there was nothing of importance there.
The last UrBackup log lines before this morning’s crash are these (logging is in debug mode):
2024-09-22 09:39:04: Established internet connection. Service=0
2024-09-22 09:39:04: Referencing snapshot on "Client x" for path "backup_8" failed: FAILED
2024-09-22 09:39:04: Authed+capa for client 'Client x' (encrypted-v2, compressed-zstd, token auth) - 1 spare connections
2024-09-22 09:39:13: Authed+capa for client 'Client y' (encrypted-v2, compressed-zstd, token auth) - 1 spare connections
2024-09-22 09:39:22: LockForTransaction in CQuery::Execute Stmt: [INSERT INTO files (backupid, fullpath, hashpath, shahash, filesize, rsize, clientid, incremental, next_entry, prev_entry, pointed_to) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]
And the last syslog lines:
Sep 22 09:17:01 host06 CRON[227900]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Sep 22 09:25:01 host06 CRON[228102]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Sep 22 09:35:01 host06 CRON[228391]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
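To narrow down when the freeze happened, the two logs can be compared. The sketch below is only a rough illustration using the timestamps above. It assumes that the debian-sa1 cron job keeps running at the interval seen between its last two entries, and that the year of the syslog lines matches the UrBackup log (syslog does not print the year). The variable names are made up for this example.

```python
from datetime import datetime

# Last UrBackup log entry before the freeze (copied from the log above).
last_urbackup = datetime.strptime("2024-09-22 09:39:22", "%Y-%m-%d %H:%M:%S")

# Last two debian-sa1 cron entries from syslog.
# Assumption: the year is 2024, taken from the UrBackup log.
sa1_runs = [
    datetime.strptime("2024 Sep 22 09:25:01", "%Y %b %d %H:%M:%S"),
    datetime.strptime("2024 Sep 22 09:35:01", "%Y %b %d %H:%M:%S"),
]

# Assumption: the job repeats at the interval seen between its last two runs.
interval = sa1_runs[-1] - sa1_runs[-2]
next_expected = sa1_runs[-1] + interval

# The server was still alive at the latest timestamp found in either log.
freeze_after = max(last_urbackup, sa1_runs[-1])
window = next_expected - freeze_after

print(f"Freeze between {freeze_after:%H:%M:%S} and {next_expected:%H:%M:%S} ({window})")
```

If those assumptions hold, the freeze happened somewhere between the last UrBackup line and the cron run that never got logged. It is also possible that later lines were written but never flushed to disk, so this is only an upper bound.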
Does anybody know what I can try or change to get my server running stably, or to get more information about the root cause?
Regards,
Stijn