[Docker][Server 2.5.33][Ubuntu 24.04] Memory usage peaks during cleanup window

For about a week now I have been seeing nightly OOM-killer events. It took me a while to consider UrBackup as the cause, because my first suspect was the ZFS ARC. But even after limiting the ARC to 10 GB (of 32 GB RAM total), the OOM-killer events did not stop.

The timing falls exactly within the cleanup window configured in the UrBackup server (1 am - 2 am). The last log line I captured, from two days ago, says “Copying backup_server_files.db” or similar; the file itself is 4 GB in size.

I never had this issue before, and I am not sure UrBackup is the right candidate, but each time the OOM killer fires, UrBackup has just started its daily cleanup task.

Is there something I can check? I will try to capture the full server logs next time.

EDIT: Yeah, Portainer wasn't giving me the needed logs, but `docker logs` did…

Every time this happens, UrBackup is at the same step:

2025-11-02 01:17:44: Checking integrity of backup_server.db
2025-11-02 01:17:44: Checking integrity of backup_server_settings.db
2025-11-02 01:17:44: Checking integrity of backup_server_files.db
2025-11-02 01:18:01: Checking integrity of backup_server_links.db
2025-11-02 01:18:02: Checking integrity of backup_server_link_journal.db
2025-11-02 01:18:02: Starting database backup of backup_server.db...
2025-11-02 01:18:02: Stop checkpointing of backup_server.db...
2025-11-02 01:18:03: Stop writes to backup_server.db...
2025-11-02 01:18:03: Copying backup_server.db to /backups/urbackup...
2025-11-02 01:18:32: SQLITE_BUSY in CQuery::Execute  Stmt: [UPDATE clients SET lastseen=datetime(?, 'unixepoch') WHERE id=?]
2025-11-02 01:18:32: Active query(0): UPDATE clients SET lastseen=datetime(?, 'unixepoch') WHERE id=?
2025-11-02 01:18:32: Active query(1): UPDATE clients SET  file_ok=?, image_ok=?, alerts_next_check=?, alerts_state=? WHERE id=?
2025-11-02 01:18:42: SQLITE_BUSY in CQuery::Execute  Stmt: [UPDATE clients SET lastseen=datetime(?, 'unixepoch') WHERE id=?]
2025-11-02 01:18:42: Active query(0): UPDATE clients SET lastseen=datetime(?, 'unixepoch') WHERE id=?
2025-11-02 01:18:42: Active query(1): UPDATE clients SET  file_ok=?, image_ok=?, alerts_next_check=?, alerts_state=? WHERE id=?
2025-11-02 01:18:42: Active query(2): UPDATE clients SET lastseen=datetime(?, 'unixepoch') WHERE id=?
2025-11-02 01:18:50: Backup of backup_server.db done.
2025-11-02 01:18:50: Starting database backup of backup_server_settings.db...
2025-11-02 01:18:50: Stop checkpointing of backup_server_settings.db...
2025-11-02 01:18:51: Stop writes to backup_server_settings.db...
2025-11-02 01:18:51: Copying backup_server_settings.db to /backups/urbackup...
2025-11-02 01:18:54: Backup of backup_server_settings.db done.
2025-11-02 01:18:54: Starting database backup of backup_server_files.db...
2025-11-02 01:18:54: Stop checkpointing of backup_server_files.db...
2025-11-02 01:18:55: Stop writes to backup_server_files.db...
2025-11-02 01:18:55: Copying backup_server_files.db to /backups/urbackup...
usermod: no changes
Raising nice-ceiling to 35 failed. (errno=1)
2025-11-02 08:08:28: Starting HTTP-Server on port 55414
2025-11-02 08:08:28: HTTP: Server started up successfully!
2025-11-02 08:08:28: SQLite: recovered 538 frames from WAL file /var/urbackup/backup_server.db-wal code: 283
2025-11-02 08:08:31: SQLite: recovered 99416 frames from WAL file /var/urbackup/backup_server_files.db-wal code: 283
2025-11-02 08:08:32: SQLite: recovered 9638 frames from WAL file /var/urbackup/backup_server_link_journal.db-wal code: 283
2025-11-02 08:08:32: SQLite: recovered 59302 frames from WAL file /var/urbackup/backup_server_links.db-wal code: 283
2025-11-02 08:08:32: SQLite: recovered 7 frames from WAL file /var/urbackup/backup_server_settings.db-wal code: 283
2025-11-02 08:08:32: SQLite: recovered 538 frames from WAL file /var/urbackup/backup_server.db-wal code: 283
2025-11-02 08:08:32: SQLite: recovered 7 frames from WAL file /var/urbackup/backup_server_settings.db-wal code: 283
2025-11-02 08:08:33: SQLite: recovered 99416 frames from WAL file /var/urbackup/backup_server_files.db-wal code: 283
2025-11-02 08:08:34: SQLite: recovered 59302 frames from WAL file /var/urbackup/backup_server_links.db-wal code: 283
2025-11-02 08:08:34: SQLite: recovered 9638 frames from WAL file /var/urbackup/backup_server_link_journal.db-wal code: 283
2025-11-02 08:08:34: Started UrBackup...

I don't know what is happening at that step other than copying a single file.

I will stop UrBackup and watch the memory again tonight, and see if anything changes…

Do you have the kernel log messages from the OOM? If so, could you post them?

Last night nothing happened; I had turned UrBackup off since yesterday, all fine.

Yes. Here is the OOM-relevant section, right after `2025-11-01 01:17:17: Copying backup_server_files.db to /backups/urbackup…` (not from the run posted above, but from the day before, 2025-11-01).

2025-11-01T01:20:20.447951+01:00 docker systemd[1]: cron.service: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.449954+01:00 docker systemd[1]: docker-b11b0abdc396f57b1f3d09399d929eadb0c37aa3532997ea12f488cf1fe747c5.scope: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.450016+01:00 docker systemd[1]: docker-29203efa84da8ef270dfacbd17d5482ba14afe24817422faa80f2970701ecaa7.scope: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.450071+01:00 docker systemd[1]: check-mk-agent-async.service: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.450110+01:00 docker systemd[1]: systemd-timesyncd.service: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.450891+01:00 docker systemd[1]: zfs-zed.service: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.450934+01:00 docker systemd[1]: upower.service: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.450977+01:00 docker systemd[1]: systemd-logind.service: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.451453+01:00 docker systemd[1]: systemd-networkd.service: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.451737+01:00 docker systemd[1]: polkit.service: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.451792+01:00 docker systemd[1]: thermald.service: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.451878+01:00 docker systemd[1]: systemd-resolved.service: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.451912+01:00 docker systemd[1]: smartmontools.service: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.451954+01:00 docker systemd[1]: rsyslog.service: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.452234+01:00 docker systemd[1]: ModemManager.service: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.452284+01:00 docker systemd[1]: udisks2.service: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.452734+01:00 docker systemd[1]: docker-259320c78093b97322ad33ff9c205bbe0b62c511c9a850d48b98c1ef1d9d9bc5.scope: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.452770+01:00 docker systemd[1]: docker-cb66aa15e802eca56ca993d2658c147221b72346f6f092cc80aa89c8c5a00e28.scope: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.452806+01:00 docker systemd[1]: unattended-upgrades.service: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.452849+01:00 docker systemd[1]: urbackupclientbackend.service: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.452914+01:00 docker systemd[1]: docker-d7a582bdd6a8cc26e0c1cdeefdeafcd4fbc590029b0530a63333895612d4669c.scope: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.452942+01:00 docker systemd[1]: docker-71930144c00416370a94a42f370b353996b4330cd47492b2d9089aa6669e0d4b.scope: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.453088+01:00 docker systemd[1]: docker-59b08c497f72ae96eefea60e231de868d7e4756112749ca8c7eaa303bd4d2d55.scope: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.453148+01:00 docker systemd[1]: docker-a4b44c5d431613ea7f394758fe0578a6d5ac8b53c999e9c4ac503bb53c46ad12.scope: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.453190+01:00 docker systemd[1]: docker-d97547ca08a7f0770c04ddbb6e92d6508f9dfb1b0177fcfddec5b8922ec0befd.scope: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.453255+01:00 docker systemd[1]: docker-f50f86cf1ff177a10280abbd0f100526296710765a9f493dbfa29240892d5e7f.scope: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.453290+01:00 docker systemd[1]: docker-fb7138b6ac1ad9c2f1c4519503dde3126a9f52ee60eb5f26d0159d41aa3f0a0b.scope: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.453324+01:00 docker systemd[1]: docker-de8f32a0439a9fc0eb2f6b5b53257c40b47cac79c30ff2bb9ddfd8da8fd03e64.scope: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.453356+01:00 docker systemd[1]: docker-e466246bc194da046eea8f9528cae819cf4360efc91d78652016e628a89c8c9e.scope: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.453495+01:00 docker systemd[1]: docker-2b2d04a9635e80eee4c8a9eff2f0cf4cf7f72536fd2bb7808e540d68dd1c8627.scope: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.453545+01:00 docker systemd[1]: docker-7ccb2e8e9cac2e9aea629891e9f797f459800924d93b46a99133bd47c09fc611.scope: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.453720+01:00 docker systemd[1]: docker-1e302719cb22919c55d9f11566731a63d2efe69810811d5f1fccc47002c2e199.scope: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.453785+01:00 docker systemd[1]: docker-9b4d2221bc8e614e39c610f55b86cb4c9b9b4c8957ff325fa22a6b88d20f853d.scope: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.453821+01:00 docker systemd[1]: docker-b435ffe137cb7cd0b12908ed88be5173e44cbc81f1936c1f764c233867499714.scope: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.453894+01:00 docker systemd[1]: docker-959610db54cc64e373bdd47fa283325ecb9faa1071c1fffd7a3cc1ccbc98900b.scope: A process of this unit has been killed by the OOM killer.
2025-11-01T01:20:20.453987+01:00 docker systemd[1]: systemd-networkd.service: Main process exited, code=killed, status=9/KILL
2025-11-01T01:20:20.454430+01:00 docker systemd[1]: systemd-networkd.service: Failed with result 'oom-kill'.
2025-11-01T01:20:20.454485+01:00 docker systemd[1]: systemd-networkd.service: Consumed 1.312s CPU time, 4.3M memory peak, 0B memory swap peak.
2025-11-01T01:20:20.454576+01:00 docker systemd[1]: systemd-resolved.service: Main process exited, code=killed, status=9/KILL
2025-11-01T01:20:20.454612+01:00 docker systemd[1]: systemd-resolved.service: Failed with result 'oom-kill'.
2025-11-01T01:20:20.454709+01:00 docker systemd[1]: systemd-resolved.service: Consumed 4.060s CPU time, 8.2M memory peak, 0B memory swap peak.
2025-11-01T01:20:20.454762+01:00 docker systemd[1]: systemd-timesyncd.service: Main process exited, code=killed, status=9/KILL
2025-11-01T01:20:20.454791+01:00 docker systemd[1]: systemd-timesyncd.service: Failed with result 'oom-kill'.
2025-11-01T01:20:20.454976+01:00 docker systemd[1]: check-mk-agent-async.service: Main process exited, code=killed, status=9/KILL
2025-11-01T01:20:20.455021+01:00 docker systemd[1]: check-mk-agent-async.service: Failed with result 'oom-kill'.
2025-11-01T01:20:20.455274+01:00 docker systemd[1]: check-mk-agent-async.service: Consumed 1min 42.198s CPU time, 91.3M memory peak, 0B memory swap peak.
2025-11-01T01:20:20.455319+01:00 docker systemd[1]: polkit.service: Main process exited, code=killed, status=9/KILL
2025-11-01T01:20:20.455395+01:00 docker systemd[1]: polkit.service: Failed with result 'oom-kill'.
2025-11-01T01:20:20.455422+01:00 docker systemd[1]: polkit.service: Consumed 2.918s CPU time, 2.8M memory peak, 0B memory swap peak.
2025-11-01T01:20:20.455454+01:00 docker systemd[1]: smartmontools.service: Main process exited, code=killed, status=9/KILL
2025-11-01T01:20:20.456185+01:00 docker systemd[1]: smartmontools.service: Failed with result 'oom-kill'.
2025-11-01T01:20:20.456226+01:00 docker systemd[1]: systemd-logind.service: Main process exited, code=killed, status=9/KILL
2025-11-01T01:20:20.456261+01:00 docker systemd[1]: systemd-logind.service: Failed with result 'oom-kill'.
2025-11-01T01:20:20.456391+01:00 docker systemd[1]: systemd-logind.service: Consumed 1.669s CPU time, 2.9M memory peak, 0B memory swap peak.
2025-11-01T01:20:20.456760+01:00 docker systemd[1]: thermald.service: Main process exited, code=killed, status=9/KILL
2025-11-01T01:20:20.456796+01:00 docker systemd[1]: thermald.service: Failed with result 'oom-kill'.
2025-11-01T01:20:20.456829+01:00 docker systemd[1]: thermald.service: Consumed 5.412s CPU time, 6.8M memory peak, 0B memory swap peak.
2025-11-01T01:20:20.456925+01:00 docker systemd[1]: udisks2.service: Main process exited, code=killed, status=9/KILL
2025-11-01T01:20:20.457034+01:00 docker systemd[1]: udisks2.service: Failed with result 'oom-kill'.
2025-11-01T01:20:20.457547+01:00 docker systemd[1]: udisks2.service: Consumed 17.323s CPU time, 9.7M memory peak, 0B memory swap peak.
2025-11-01T01:20:20.457656+01:00 docker systemd[1]: urbackupclientbackend.service: Main process exited, code=killed, status=9/KILL
2025-11-01T01:20:20.457687+01:00 docker systemd[1]: urbackupclientbackend.service: Failed with result 'oom-kill'.
2025-11-01T01:20:20.457719+01:00 docker systemd[1]: urbackupclientbackend.service: Consumed 5min 29.579s CPU time, 1.5G memory peak, 15.3M memory swap peak.
2025-11-01T01:20:20.458341+01:00 docker systemd[1]: zfs-zed.service: Main process exited, code=killed, status=9/KILL
2025-11-01T01:20:20.458383+01:00 docker systemd[1]: zfs-zed.service: Failed with result 'oom-kill'.
2025-11-01T01:20:20.458414+01:00 docker systemd[1]: rsyslog.service: Main process exited, code=killed, status=9/KILL
2025-11-01T01:20:20.458451+01:00 docker systemd[1]: rsyslog.service: Failed with result 'oom-kill'.
2025-11-01T01:20:20.458482+01:00 docker systemd[1]: unattended-upgrades.service: Main process exited, code=killed, status=9/KILL
2025-11-01T01:20:20.458509+01:00 docker systemd[1]: unattended-upgrades.service: Failed with result 'oom-kill'.
2025-11-01T01:20:20.458540+01:00 docker systemd[1]: ModemManager.service: Main process exited, code=killed, status=9/KILL
2025-11-01T01:20:20.458745+01:00 docker systemd[1]: ModemManager.service: Failed with result 'oom-kill'.
2025-11-01T01:20:20.458835+01:00 docker systemd[1]: cron.service: Main process exited, code=killed, status=9/KILL
2025-11-01T01:20:20.458872+01:00 docker systemd[1]: cron.service: Failed with result 'oom-kill'.
2025-11-01T01:20:20.458941+01:00 docker systemd[1]: upower.service: Main process exited, code=killed, status=9/KILL
2025-11-01T01:20:20.458967+01:00 docker systemd[1]: upower.service: Failed with result 'oom-kill'.
2025-11-01T01:20:20.459006+01:00 docker systemd[1]: docker-259320c78093b97322ad33ff9c205bbe0b62c511c9a850d48b98c1ef1d9d9bc5.scope: Deactivated successfully.
2025-11-01T01:20:20.459050+01:00 docker systemd[1]: docker-259320c78093b97322ad33ff9c205bbe0b62c511c9a850d48b98c1ef1d9d9bc5.scope: Consumed 6.054s CPU time.
2025-11-01T01:20:20.459133+01:00 docker systemd[1]: docker-7ccb2e8e9cac2e9aea629891e9f797f459800924d93b46a99133bd47c09fc611.scope: Deactivated successfully.
2025-11-01T01:20:20.459177+01:00 docker systemd[1]: docker-7ccb2e8e9cac2e9aea629891e9f797f459800924d93b46a99133bd47c09fc611.scope: Consumed 45.833s CPU time.
2025-11-01T01:20:20.459234+01:00 docker systemd[1]: docker-b11b0abdc396f57b1f3d09399d929eadb0c37aa3532997ea12f488cf1fe747c5.scope: Deactivated successfully.
2025-11-01T01:20:20.459262+01:00 docker systemd[1]: docker-b11b0abdc396f57b1f3d09399d929eadb0c37aa3532997ea12f488cf1fe747c5.scope: Consumed 1min 32.051s CPU time.
2025-11-01T01:20:20.459297+01:00 docker systemd[1]: docker-29203efa84da8ef270dfacbd17d5482ba14afe24817422faa80f2970701ecaa7.scope: Deactivated successfully.
2025-11-01T01:20:20.459417+01:00 docker systemd[1]: docker-29203efa84da8ef270dfacbd17d5482ba14afe24817422faa80f2970701ecaa7.scope: Consumed 4min 26.246s CPU time.
2025-11-01T01:20:20.459556+01:00 docker systemd[1]: docker-d7a582bdd6a8cc26e0c1cdeefdeafcd4fbc590029b0530a63333895612d4669c.scope: Deactivated successfully.
2025-11-01T01:20:20.459604+01:00 docker systemd[1]: docker-d7a582bdd6a8cc26e0c1cdeefdeafcd4fbc590029b0530a63333895612d4669c.scope: Consumed 1min 20.530s CPU time.
2025-11-01T01:20:20.459635+01:00 docker systemd[1]: docker-cb66aa15e802eca56ca993d2658c147221b72346f6f092cc80aa89c8c5a00e28.scope: Deactivated successfully.
2025-11-01T01:20:20.459670+01:00 docker systemd[1]: docker-cb66aa15e802eca56ca993d2658c147221b72346f6f092cc80aa89c8c5a00e28.scope: Consumed 6.080s CPU time.
2025-11-01T01:20:20.459732+01:00 docker systemd[1]: docker-59b08c497f72ae96eefea60e231de868d7e4756112749ca8c7eaa303bd4d2d55.scope: Deactivated successfully.
2025-11-01T01:20:20.459786+01:00 docker systemd[1]: docker-59b08c497f72ae96eefea60e231de868d7e4756112749ca8c7eaa303bd4d2d55.scope: Consumed 4min 11.616s CPU time.
2025-11-01T01:20:20.459850+01:00 docker systemd[1]: docker-f50f86cf1ff177a10280abbd0f100526296710765a9f493dbfa29240892d5e7f.scope: Deactivated successfully.
2025-11-01T01:20:20.459904+01:00 docker systemd[1]: docker-f50f86cf1ff177a10280abbd0f100526296710765a9f493dbfa29240892d5e7f.scope: Consumed 16.967s CPU time.
2025-11-01T01:20:20.459933+01:00 docker systemd[1]: docker-9b4d2221bc8e614e39c610f55b86cb4c9b9b4c8957ff325fa22a6b88d20f853d.scope: Deactivated successfully.
2025-11-01T01:20:20.460074+01:00 docker systemd[1]: docker-9b4d2221bc8e614e39c610f55b86cb4c9b9b4c8957ff325fa22a6b88d20f853d.scope: Consumed 1min 44.261s CPU time.
2025-11-01T01:20:20.460112+01:00 docker systemd[1]: systemd-networkd.service: Scheduled restart job, restart counter is at 1.
2025-11-01T01:20:20.460187+01:00 docker systemd[1]: systemd-resolved.service: Scheduled restart job, restart counter is at 1.
2025-11-01T01:20:20.460227+01:00 docker systemd[1]: systemd-timesyncd.service: Scheduled restart job, restart counter is at 1.
2025-11-01T01:20:20.460266+01:00 docker systemd[1]: systemd-logind.service: Scheduled restart job, restart counter is at 1.
2025-11-01T01:20:20.460359+01:00 docker systemd[1]: thermald.service: Scheduled restart job, restart counter is at 1.
2025-11-01T01:20:20.460410+01:00 docker systemd[1]: zfs-zed.service: Scheduled restart job, restart counter is at 1.
2025-11-01T01:20:20.461705+01:00 docker systemd[1]: upower.service: Scheduled restart job, restart counter is at 1.
2025-11-01T01:20:20.461762+01:00 docker systemd[1]: cron.service: Scheduled restart job, restart counter is at 1.
2025-11-01T01:20:20.461798+01:00 docker systemd[1]: rsyslog.service: Scheduled restart job, restart counter is at 1.
2025-11-01T01:20:20.461837+01:00 docker systemd[1]: Started check-mk-agent@952-1134-999.service - Checkmk agent (PID 1134/UID 999).
2025-11-01T01:20:20.462441+01:00 docker systemd[1]: Started check-mk-agent@953-1134-999.service - Checkmk agent (PID 1134/UID 999).
2025-11-01T01:20:20.466727+01:00 docker systemd[1]: Started cron.service - Regular background program processing daemon.
2025-11-01T01:20:20.468474+01:00 docker systemd[1]: Starting modprobe@drm.service - Load Kernel Module drm...
2025-11-01T01:20:20.469099+01:00 docker systemd[1]: netplan-ovs-cleanup.service - OpenVSwitch configuration for cleanup was skipped because of an unmet condition check (ConditionFileIsExecutable=/usr/bin/ovs-vsctl).
2025-11-01T01:20:20.470167+01:00 docker dockerd[1463]: time="2025-11-01T01:20:20.463790572+01:00" level=error msg="[resolver] failed to query external DNS server" client-addr="udp:127.0.0.1:45609" dns-server="udp:127.0.0.53:53" error="read udp 127.0.0.1:45609->127.0.0.53:53: read: connection refused" question=";imap.mailbox.org.\tIN\t AAAA"
2025-11-01T01:20:20.473433+01:00 docker systemd[1]: Starting rsyslog.service - System Logging Service...
2025-11-01T01:20:20.473513+01:00 docker dockerd[1463]: time="2025-11-01T01:20:20.473192196+01:00" level=error msg="[resolver] failed to query external DNS server" client-addr="udp:127.0.0.1:60096" dns-server="udp:127.0.0.53:53" error="read udp 127.0.0.1:60096->127.0.0.53:53: read: connection refused" question=";imap.mailbox.org.\tIN\t A"
2025-11-01T01:20:20.473975+01:00 docker systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
2025-11-01T01:20:20.474087+01:00 docker systemd[1]: systemd-networkd-wait-online.service: Deactivated successfully.
2025-11-01T01:20:20.474585+01:00 docker systemd[1]: Stopped systemd-networkd-wait-online.service - Wait for Network to be Configured.
2025-11-01T01:20:20.475117+01:00 docker systemd[1]: Stopping systemd-networkd-wait-online.service - Wait for Network to be Configured...
2025-11-01T01:20:20.477299+01:00 docker dockerd[1463]: time="2025-11-01T01:20:20.477148079+01:00" level=error msg="[resolver] failed to query external DNS server" client-addr="udp:127.0.0.1:44005" dns-server="udp:127.0.0.53:53" error="read udp 127.0.0.1:44005->127.0.0.53:53: read: connection refused" question=";imap.mailbox.org.\tIN\t A"
2025-11-01T01:20:20.477853+01:00 docker dockerd[1463]: time="2025-11-01T01:20:20.477177883+01:00" level=error msg="[resolver] failed to query external DNS server" client-addr="udp:127.0.0.1:45227" dns-server="udp:127.0.0.53:53" error="read udp 127.0.0.1:45227->127.0.0.53:53: read: connection refused" question=";imap.mailbox.org.\tIN\t AAAA"
2025-11-01T01:20:20.478143+01:00 docker dockerd[1463]: time="2025-11-01T01:20:20.478058133+01:00" level=error msg="[resolver] failed to query external DNS server" client-addr="udp:127.0.0.1:54534" dns-server="udp:127.0.0.53:53" error="read udp 127.0.0.1:54534->127.0.0.53:53: read: connection refused" question=";imap.mailbox.org.\tIN\t A"
2025-11-01T01:20:20.479684+01:00 docker systemd[1]: Starting systemd-networkd.service - Network Configuration...
2025-11-01T01:20:20.480978+01:00 docker dockerd[1463]: time="2025-11-01T01:20:20.480836964+01:00" level=error msg="[resolver] failed to query external DNS server" client-addr="udp:127.0.0.1:38864" dns-server="udp:127.0.0.53:53" error="read udp 127.0.0.1:38864->127.0.0.53:53: read: connection refused" question=";imap.mailbox.org.\tIN\t AAAA"
2025-11-01T01:20:20.481855+01:00 docker systemd[1]: Starting systemd-resolved.service - Network Name Resolution...
2025-11-01T01:20:20.486312+01:00 docker systemd[1]: Starting systemd-timesyncd.service - Network Time Synchronization...
2025-11-01T01:20:20.488958+01:00 docker systemd[1]: Starting thermald.service - Thermal Daemon Service...
2025-11-01T01:20:20.493610+01:00 docker systemd[1]: Starting upower.service - Daemon for power management...
2025-11-01T01:20:20.494858+01:00 docker systemd[1]: Started zfs-zed.service - ZFS Event Daemon (zed).
2025-11-01T01:20:20.504381+01:00 docker (cron)[685816]: cron.service: Referenced but unset environment variable evaluates to an empty string: EXTRA_OPTS
2025-11-01T01:20:20.510117+01:00 docker cron[685816]: (CRON) INFO (pidfile fd = 3)
2025-11-01T01:20:20.520167+01:00 docker cron[685816]: (CRON) INFO (Skipping @reboot jobs -- not system startup)
2025-11-01T01:20:20.522551+01:00 docker systemd[1]: sysstat-collect.service: Deactivated successfully.
2025-11-01T01:20:20.522605+01:00 docker systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
2025-11-01T01:20:20.536630+01:00 docker systemd[1]: modprobe@drm.service: Deactivated successfully.
2025-11-01T01:20:20.536871+01:00 docker zed[685825]: ZFS Event Daemon 2.2.2-0ubuntu9.4 (PID 685825)
2025-11-01T01:20:20.537958+01:00 docker systemd[1]: Finished modprobe@drm.service - Load Kernel Module drm.
2025-11-01T01:20:20.538796+01:00 docker zed[685825]: Processing events since eid=5

Nothing special in that block; the entries all look the same.

Note: `docker-abcdefghij…` is not referring to Docker containers; the hostname of this machine is “docker”, hence the prefix.

If I could somehow diagnose the container process at that point, I would be more than happy to help.

RAM usage of the whole machine for the last 8 days (NOTE: no spikes shown, as CheckMK polls too slowly to capture them; the spike must happen within a few seconds):

I enabled debug-log mode; maybe it will show more next time.
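Since the spike apparently lasts only seconds, a simple one-second sampler should catch what CheckMK misses. A minimal sketch (5 iterations for brevity; in practice I'd run it across the whole cleanup window, and uncomment the `docker stats` line for a per-container view):

```shell
# Sample host memory once per second; a coarse monitoring interval misses
# a spike that lasts only seconds. 5 iterations here for brevity.
for i in 1 2 3 4 5; do
  avail=$(awk '/^MemAvailable/ {print $2}' /proc/meminfo 2>/dev/null)
  echo "$(date +%T) MemAvailable=${avail:-?} kB"
  # docker stats --no-stream --format '{{.Name}} {{.MemUsage}}'  # per-container view
  sleep 1
done | tee mem-trace.log
```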

Can I manually start the cleanup window to speed up the diagnosis?

Interestingly, last night the OOM killer was not invoked, but there is still a RAM spike:

The log:

2025-11-04 01:07:15: Checking integrity of backup_server.db
2025-11-04 01:07:16: Checking integrity of backup_server_settings.db
2025-11-04 01:07:16: Checking integrity of backup_server_files.db
2025-11-04 01:07:32: Checking integrity of backup_server_links.db
2025-11-04 01:07:34: Checking integrity of backup_server_link_journal.db
2025-11-04 01:07:34: Starting database backup of backup_server.db...
2025-11-04 01:07:34: Stop checkpointing of backup_server.db...
2025-11-04 01:07:35: Stop writes to backup_server.db...
2025-11-04 01:07:35: Copying backup_server.db to /backups/urbackup...
2025-11-04 01:07:42: Lost channel connection to Albrecht-NB. has_error=false
2025-11-04 01:08:41: Backup of backup_server.db done.
2025-11-04 01:08:41: Starting database backup of backup_server_settings.db...
2025-11-04 01:08:41: Stop checkpointing of backup_server_settings.db...
2025-11-04 01:08:42: Stop writes to backup_server_settings.db...
2025-11-04 01:08:42: Copying backup_server_settings.db to /backups/urbackup...
2025-11-04 01:08:49: Backup of backup_server_settings.db done.
2025-11-04 01:08:49: Starting database backup of backup_server_files.db...
2025-11-04 01:08:49: Stop checkpointing of backup_server_files.db...
2025-11-04 01:08:50: Stop writes to backup_server_files.db...
2025-11-04 01:08:50: Copying backup_server_files.db to /backups/urbackup...
2025-11-04 01:09:03: Lost channel connection to Albrecht-NB. has_error=false
2025-11-04 01:10:25: Lost channel connection to Albrecht-NB. has_error=false
2025-11-04 01:10:26: Resetting channel to Albrecht-NB because of timeout.
2025-11-04 01:10:26: Connecting to target service...
2025-11-04 01:10:36: Connecting on internet connection failed. Service=0
2025-11-04 01:10:36: Establishing internet connection failed. Service=0
2025-11-04 01:10:36: Connecting Channel to Albrecht-NB failed - CONNECT error -55 require_reauth=false
2025-11-04 01:10:42: Establish timeout: Deleting internet client "Albrecht-NB"
2025-11-04 01:10:43: Backup of backup_server_files.db done.
2025-11-04 01:10:43: Starting database backup of backup_server_links.db...
2025-11-04 01:10:43: Stop checkpointing of backup_server_links.db...
2025-11-04 01:10:44: Stop writes to backup_server_links.db...
2025-11-04 01:10:44: Copying backup_server_links.db to /backups/urbackup...
2025-11-04 01:10:46: Connecting Channel to Albrecht-NB failed - CONNECT error -55 require_reauth=false
2025-11-04 01:10:56: Connecting Channel to Albrecht-NB failed - CONNECT error -55 require_reauth=false
2025-11-04 01:11:06: Connecting Channel to Albrecht-NB failed - CONNECT error -55 require_reauth=false
2025-11-04 01:11:16: Connecting Channel to Albrecht-NB failed - CONNECT error -55 require_reauth=false
2025-11-04 01:11:26: Connecting Channel to Albrecht-NB failed - CONNECT error -55 require_reauth=false
2025-11-04 01:11:36: Connecting Channel to Albrecht-NB failed - CONNECT error -55 require_reauth=false
2025-11-04 01:11:46: Connecting Channel to Albrecht-NB failed - CONNECT error -55 require_reauth=false
2025-11-04 01:11:56: Connecting Channel to Albrecht-NB failed - CONNECT error -55 require_reauth=false
2025-11-04 01:12:06: Connecting Channel to Albrecht-NB failed - CONNECT error -55 require_reauth=false
2025-11-04 01:12:16: Connecting Channel to Albrecht-NB failed - CONNECT error -55 require_reauth=false
2025-11-04 01:12:26: Connecting Channel to Albrecht-NB failed - CONNECT error -55 require_reauth=false
2025-11-04 01:12:36: Connecting Channel to Albrecht-NB failed - CONNECT error -55 require_reauth=false
2025-11-04 01:12:46: Connecting Channel to Albrecht-NB failed - CONNECT error -55 require_reauth=false
2025-11-04 01:12:48: Backup of backup_server_links.db done.
2025-11-04 01:12:48: Starting database backup of backup_server_link_journal.db...
2025-11-04 01:12:48: Stop checkpointing of backup_server_link_journal.db...
2025-11-04 01:12:49: Stop writes to backup_server_link_journal.db...
2025-11-04 01:12:49: Copying backup_server_link_journal.db to /backups/urbackup...
2025-11-04 01:12:56: Connecting Channel to Albrecht-NB failed - CONNECT error -55 require_reauth=false
2025-11-04 01:12:57: Backup of backup_server_link_journal.db done.
2025-11-04 01:13:06: Connecting Channel to Albrecht-NB failed - CONNECT error -55 require_reauth=false
2025-11-04 01:13:16: Connecting Channel to Albrecht-NB failed - CONNECT error -55 require_reauth=false
2025-11-04 01:13:26: Connecting Channel to Albrecht-NB failed - CONNECT error -55 require_reauth=false
2025-11-04 01:13:36: Connecting Channel to Albrecht-NB failed - CONNECT error -55 require_reauth=false
2025-11-04 01:13:46: Connecting Channel to Albrecht-NB failed - CONNECT error -55 require_reauth=false
2025-11-04 01:13:56: Connecting Channel to Albrecht-NB failed - CONNECT error -55 require_reauth=false
2025-11-04 01:14:06: Connecting Channel to Albrecht-NB failed - CONNECT error -55 require_reauth=false
2025-11-04 01:14:16: Connecting Channel to Albrecht-NB failed - CONNECT error -55 require_reauth=false
2025-11-04 01:14:26: Connecting Channel to Albrecht-NB failed - CONNECT error -55 require_reauth=false
2025-11-04 01:14:36: Connecting Channel to Albrecht-NB failed - CONNECT error -55 require_reauth=false
2025-11-04 01:14:46: Connecting Channel to Albrecht-NB failed - CONNECT error -55 require_reauth=false
2025-11-04 01:14:53: Client exited: Albrecht-NB
2025-11-04 01:14:53: Waiting for backup threads to finish (clientid=23, nthreads=0)...
2025-11-04 01:14:55: Stopping channel...
2025-11-04 01:14:56: client_main Thread for client Albrecht-NB finished
2025-11-04 01:15:43: Client finished: Albrecht-NB

So, nothing special here? I will monitor again next night. Still no clue why these spikes happen at all.

The main OOM log is in kern.log or dmesg at runtime.
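For example, to pull the kernel's OOM report (which includes a per-process memory table and the chosen victim) for the cleanup window; the timestamps here are placeholders:

```shell
# Kernel OOM reports land in the journal / dmesg, not in the container logs.
journalctl -k --since "2025-11-01 01:00" --until "2025-11-01 02:30" \
  | grep -iE 'out of memory|oom-killer|killed process' || true
# Without journald: dmesg -T | grep -iE 'oom-killer|killed process'
```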

I will post them the next time it happens.

Since the last OOM event, the OOM killer has not been triggered, but I still see those spikes, both during cleanup windows and, as I noticed today, during a client backup:

Time in question:

2025-11-05 08:50:54: LMDB: PUT clientid=29 filesize=2684 hash=wgquLz+DE8UaG0OF6Eiqrg== target=9900021
2025-11-05 08:50:54: LMDB: PUT clientid=29 filesize=2845 hash=1Iumi34caBfj6r4BUwiuyw== target=9900015
2025-11-05 08:50:54: LMDB: PUT clientid=29 filesize=6556 hash=1lODCO62Z6l+se+Ff6vLQw== target=9896671
2025-11-05 08:50:54: LMDB: PUT clientid=29 filesize=2216 hash=3SgRe7EBFRtR4uF1xz0dog== target=9895518
2025-11-05 08:50:54: LMDB: PUT clientid=29 filesize=2200 hash=8Wd1vpi9hwoQE8FzDWamBA== target=9895535
2025-11-05 08:50:55: Syncing file system...
2025-11-05 08:51:27: Passive WAL checkpoint of urbackup/backup_server_files.db completed busy=0 checkpointed=46377 log=46377
2025-11-05 08:54:20: Creating symbolic links. -1
2025-11-05 08:54:20: Creating symbolic links. -2
2025-11-05 08:54:20: Symbolic links created.
2025-11-05 08:54:20: Encryption overhead: 79.9219 KB
2025-11-05 08:54:20: Encryption overhead: 99.3838 KB
2025-11-05 08:54:20: Transferred 359.428 MB - Average speed: 32.925 MBit/s

Any ideas? Do you see this RAM spike on your end as well? Is it “normal”?

I will collect some memory monitoring data and see what it shows over the day…

It happened again last night, but so hard that the system locked up: no keyboard input was accepted, but the cursor kept blinking. Interesting.

syslog simply stopped logging, so there are no OOM logs, but there was a memory spike. The last words from UrBackup:

2025-11-06 01:44:15: Copying backup_server_files.db to /backups/urbackup…

So, how can I help debug this? Isn't this just a simple copy operation at this point? How does it lead to such a high memory usage spike?

If nothing else helps, I will try to limit the Docker container's memory resources.
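A sketch of what I have in mind (the container name `urbackup` is a placeholder): with a cgroup memory cap, a runaway allocation gets OOM-killed inside the container instead of taking the whole host down.

```shell
# Placeholder container name "urbackup"; the 8 GiB cap is an example that
# leaves headroom on a 32 GB host next to a 10 GB ARC limit.
LIMIT_BYTES=$((8 * 1024 * 1024 * 1024))   # 8 GiB = 8589934592 bytes
docker update --memory "$LIMIT_BYTES" --memory-swap "$LIMIT_BYTES" urbackup \
  || echo "docker not reachable from this shell; run it on the host"
```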

Very weird.

I like to use heaptrack for this. It should be easy to use on e.g. Debian (`apt install heaptrack`).

Then run `heaptrack urbackupsrv run` and afterwards look at the result with e.g. `heaptrack_gui`.

It needs the urbackupsrv debug symbols: either compile it yourself or install the -dbg package as well (whether that currently works with heaptrack, I don't know).

Obviously it only tracks urbackupsrv's memory usage; if ZFS or something else is responsible, it won't notice.


Thanks. Will try

Is the debug version available as a Docker container?

Current state:

It's not UrBackup, and I feel dumb for posting this in the first place. The container's memory usage never spiked, so the UrBackup application was never the cause. But I was misled by the cleanup time window and the UrBackup log, as the timing matched exactly.

After many helper scripts and discussions with ChatGPT: the ZFS ARC seems to be the issue. I adjusted zfs_arc_max early on, after the first incident, but somehow the ARC keeps growing too fast at the DB copy point.

I have no idea why my 4 GB DB file can cause this. ChatGPT attributed it to the recordsize being 1M on my backup dataset (into which the cleanup → copy-database job copies the database).
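To check whether the ARC really blows past its cap during the copy, the live ARC size can be compared against the configured ceiling (standard OpenZFS paths; they only exist on a host with the ZFS module loaded):

```shell
# Print current ARC size and the configured maximum, in GiB.
awk '/^(size|c_max) / {printf "%s %.1f GiB\n", $1, $3/1024/1024/1024}' \
  /proc/spl/kstat/zfs/arcstats 2>/dev/null || true
# Lower the cap at runtime (10 GiB, the value mentioned earlier):
# echo $((10 * 1024 * 1024 * 1024)) | sudo tee /sys/module/zfs/parameters/zfs_arc_max
```

Note that `zfs_arc_max` acts as a target rather than a strict hard limit; the ARC can transiently overshoot it under heavy I/O, which would match the behavior described above.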

However, because the /var/urbackup volume gets backed up anyway, I have disabled “Automatically backup UrBackup database” for now.

I don't know if it will help; we'll see.