Backup server nearly crashed for the third time - all RAM used - UrBackup Server not working anymore

Hi,

It’s the third time within a few weeks that my backup server has nearly crashed because all RAM was eaten up by the UrBackup Server or underlying processes. RamMap shows either ProcessPrivate (UrBackup Server) or Metafile (or both) using all 16 GB of RAM, causing the server to nearly hang. I was not able to stop the UrBackup Server service, so I had to kill the process and reboot the machine. After the restart the service took hours to switch from “Starting” to “Started”, but the backup server is still not working: all RAM is in use again, the web interface is down, and the logs show tons of errors related to hard links and failed database queries.
The backup server is a Windows Server 2008 R2 machine with 36 x 1 TB disks (RAID 6), SSD buffered, 16 GB of RAM and a 4-core Xeon CPU.

My target was to back up about 100 clients incrementally once a day (or, even better, every 4 hours), with full backups every 90 days. A few of the clients contain up to 2 million files (I guess these are the ones causing the server crashes). Even when the server was working I did not get anywhere near that target: with 4 simultaneous jobs running I managed about 5 to 15 clients per day, and running more jobs was a bad idea that overloaded the server, yet I would need about 50 simultaneous jobs to hit the target.
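
For context, here is the rough back-of-envelope calculation behind that estimate. It assumes every backup job takes about the same time and that throughput scales linearly with the number of job slots, which is certainly optimistic:

```
# Back-of-envelope concurrency estimate (assumes all jobs take roughly
# the same time and throughput scales linearly with job slots).
observed_slots = 4                    # simultaneous jobs I run today
observed_low, observed_high = 5, 15   # clients actually finished per day
target_per_day = 100                  # clients I want to back up daily

for finished in (observed_low, observed_high):
    per_slot = finished / observed_slots      # clients per slot per day
    needed_slots = target_per_day / per_slot  # slots needed to hit the target
    print(f"{finished} clients/day -> {per_slot:.2f} per slot -> ~{needed_slots:.0f} slots needed")
```

With 5 to 15 finished clients per day that works out to somewhere between roughly 27 and 80 required slots, hence my guess of about 50.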
So I’m wondering how to get my server back into a working state and what kind of hardware is necessary to reach my backup target without investing tens of thousands of Euros into hardware.
I attached the last log and the crash dumps; please let me know if you need more info.

Thanks,
Peter

See here for some tips: http://blog.urbackup.org/177/performance-considerations-for-larger-urbackup-server-instances
So if C:\ isn’t on an SSD, that would be the first step.

If memory problems persist after you restart the server process, the problem is usually somewhere else. Others have reported, for example, that a driver caused large Metafile usage.
In your case it could also be Sophos causing this. It should be uninstalled from the server, or at least active scanning should be disabled, as it can cause backups to fail (and performance issues).

The hard link warnings are normal on NTFS, as it has a limit of 1024 hard links per file.
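If you want to check how close a given file in the backup storage is to that limit, here is a minimal sketch, assuming Python 3 on the Windows server; the path below is just a placeholder, not a real UrBackup path:

```
import os

# Hypothetical path inside the backup storage -- replace with a real file.
path = r"D:\urbackup\some_client\some_backup\example.dll"

st = os.stat(path)
# On NTFS, st_nlink is the number of hard links pointing at the file's data.
# NTFS caps this at 1024, which is what triggers those warnings in the log.
print(f"{path} has {st.st_nlink} hard links (NTFS limit: 1024)")
```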

Can SQLite even handle that sort of workload? Even with it being on SSDs?

Is there any possibility of getting something like PostgreSQL as an alternative database on the roadmap? Or perhaps any other RDBMS? As people start to scale out their UrBackup deployments, this seems like an area that will become a bottleneck.

SQLite uses roughly the same data structures as PostgreSQL (WAL + B-tree indices), so I’d say switching would not improve performance much in most cases.
And then you’d have to start tuning PostgreSQL, whereas now it pretty much tunes itself.
Plus, the current trend is that storage is getting so fast that the CPU becomes the bottleneck again; M.2 SSDs, for example, are really fast and cheap.
We should also wait and see how much the improvements in UrBackup 2.0 alleviate the database bottleneck.
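
To illustrate the “tunes itself” point: the relevant knobs in SQLite are a handful of PRAGMAs. Here is a minimal sketch; the database file name and the values are illustrative assumptions, not UrBackup’s actual settings:

```
import sqlite3

# Illustrative only: file name and PRAGMA values are assumptions,
# not UrBackup's actual configuration.
con = sqlite3.connect("backup_server.db")

# Write-ahead logging: readers and the single writer no longer block each other.
con.execute("PRAGMA journal_mode=WAL")
# Fewer fsyncs at the cost of a little durability (database stays consistent with WAL).
con.execute("PRAGMA synchronous=NORMAL")
# Larger page cache; a negative value is a size in KiB (here ~256 MiB).
con.execute("PRAGMA cache_size=-262144")

con.execute("CREATE TABLE IF NOT EXISTS files (id INTEGER PRIMARY KEY, path TEXT)")
con.execute("INSERT INTO files (path) VALUES (?)", ("C:/example.txt",))
con.commit()
con.close()
```

That is roughly the scope of tuning involved, compared to the shared_buffers/work_mem/autovacuum knobs you would be looking at with PostgreSQL.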