Nightly Clean Up going on 2 days


37 clients, almost 14 TB of data according to the statistics. It's been doing the nightly clean-up for almost 2 days straight. Windows 10 for the server.
MegaRAID Storage Manager reports everything is good with the RAID. Activity status shows 100% usage on the virtual drive.
Clients continue to back up without issue, except the obvious…

Thoughts?
Edit: ESET antivirus is excluded from scanning the storage.

Edit #3
Are my incrementals too many?

Just a quick check: is your ESET antivirus excluding the UrBackup program and temporary folders as well as the main storage? The live database is stored in C:\Program Files\UrBackupServer\urbackup under Windows, and those files are updated frequently during operation. I also set a non-default temporary file directory on a faster hard drive that is not the system drive to maximize performance.
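
If you want to sanity-check how big that live database has grown, a quick PowerShell snippet like this should show it (the path is the one above; the *.db filter is just my guess at the file names in your install):

```powershell
# List the UrBackup server database files, largest first.
# Path is from above; the *.db filter is an assumption about the file naming.
Get-ChildItem 'C:\Program Files\UrBackupServer\urbackup' -Filter *.db |
    Sort-Object Length -Descending |
    Format-Table Name, @{ n = 'Size (MB)'; e = { [math]::Round($_.Length / 1MB, 1) } }
```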

Not sure if it applies to you, but the Avago MegaRAID cards I’m using aren’t terribly fast. The lower models don’t support an SSD cache, for example, and I’ve found Storage Spaces won’t pair an SSD with a RAID card for tiered storage.

If you back up as files and not as an image, I'm pretty sure this is due to the indexes.
Storing the indexes on an SSD would help a lot:

  1. SSDs are faster.
  2. The random, small I/O access pattern on indexes is the worst case for mechanical drives (there's a quick test at the end of this post).

As I remember, some indexes would also benefit from being split by date and/or client,
so that they can simply be dropped when their associated backups/clients are deleted.
But that requires something on the dev side; maybe @uroni will include it in the beta.
No idea of the complexity.
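
To see that access-pattern point for yourself, here is a rough PowerShell sketch (the path and sizes are arbitrary placeholders) that times a burst of small random writes, which is roughly what index/database maintenance looks like. Run it once with $path on the SSD and once on the RAID volume and compare; write-through is used so the writes actually have to reach the device instead of just the cache.

```powershell
# Rough sketch: time a burst of small random writes, roughly the access pattern
# of index/database maintenance. $path and the sizes are placeholders; run it
# against the SSD and against the RAID volume and compare the results.
$path = 'D:\iotest.bin'
$size = 1GB

# Fill the file sequentially first, so the random writes below measure seeks
# rather than NTFS zero-filling untouched regions.
$chunk = New-Object byte[] 1MB
$fs = [System.IO.File]::Create($path)
for ($i = 0; $i -lt ($size / 1MB); $i++) { $fs.Write($chunk, 0, $chunk.Length) }
$fs.Close()

# Reopen with write-through so every write has to reach the device, not just the cache.
$fs = New-Object System.IO.FileStream -ArgumentList @(
    $path,
    [System.IO.FileMode]::Open,
    [System.IO.FileAccess]::Write,
    [System.IO.FileShare]::None,
    4096,
    [System.IO.FileOptions]::WriteThrough
)

$buf  = New-Object byte[] 4096
$rand = New-Object System.Random

$elapsed = Measure-Command {
    for ($i = 0; $i -lt 2000; $i++) {
        # Seek to a random offset and write a small block.
        $offset = [long]($rand.NextDouble() * ($size - 4096))
        [void]$fs.Seek($offset, [System.IO.SeekOrigin]::Begin)
        $fs.Write($buf, 0, 4096)
    }
}
$fs.Close()
Remove-Item $path

"2000 random 4 KB writes took $([math]::Round($elapsed.TotalSeconds, 1)) s"
```

On a mechanical array each of those small writes costs roughly a seek, which is why a clean-up touching millions of entries can crawl even when sequential throughput looks fine.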

@Don_Wright I did not exclude the temp folder on the C: drive, but it's an SSD and shows no slowdown while the HDD RAID is bogged down. I'll exclude it now. My RAID card is an LSI SAS9211-8i, not a cheap card. The battery was old and I'm not using it anymore, so the card isn't as fast as it could be, but it's been like that for years. I'm not positive, but I think I need the battery to enable SSD cache; I can try throwing in an old SSD next time I shut it down. Thanks for the reply.

@orogor I’ve not sure what you mean about indexes. The program is located on a 1TB SSD, the data resides on HDD’s.

I think I have too many incremental backups. I've reduced the number to 30. The nightly clean-up is taking a week to finish.

These large cleanups can really take a long time, especially if there are many files/clients involved. I’ve seen this process take a couple of days for 2 TB worth of cleanups (many small files) when just a single large client is removed.

I’d suggest shutting down the UrBackup server and running urbackupsrv remove-unknown or urbackupsrv cleanup -a 0%. These will run faster if the server is offline.

It sounds like you’ve already got the UrBackup database running on flash-based storage, so that’s about as good as you’ll get.


Thanks, I’ll give it a go.


Still going 2 days later. I’m going to have to put the server back online soon or clients will start seeing red.

Unfortunately I don’t know of another way to speed it up.

You mentioned reducing the number of incremental backups retained to 30. Was this previously a much higher number? If so, then the initial cleanup will be painful, but afterwards it should return to normal.

You may need to interrupt the cleanup so you can start UrBackup again. At least that will allow your clients to get in another incremental backup.

The holidays may be a good time to shut it down again and resume the offline cleanup. Patience will eventually pay off. 🙂

Yes, I increased the count; I figured “why not” when I had so much space on a fast RAID. Now I know why not. I did have to break out of the cleaning as clients were going red. I'll just let it run this week and hopefully get back to normal. Thanks for the reply.

I’ve made the same mistake myself. Fast RAID array on a modern dedicated server. I set UrBackup to retain up to 300 daily incrementals for a client that has 20+ million files. Since the files don’t change, I figured “no big deal”. But it means that, every day, the file system and database are updated with millions of new entries. Over time this results in a massive, unwieldy database and billions of hard links on disk, making an eventual cleanup super long despite the server’s fast IO.


I would guess it doesn't depend so much on the number of backups as on the number of files in each backup and the number of backups per day. If you have a large number of max incremental backups, it just takes a while for the problem to become apparent (the problem being that storage IOPS are too low for both backups + deletion – hard disks are really slow and all but useless for random I/O).

Of course there are more and less efficient options… it can use the directory links more efficiently on Windows, but NTFS hardlinks are less efficient. Then NTFS has stuff like the change journal… maybe disable that?
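
For anyone wanting to check, fsutil will show whether the change journal even exists on the storage volume (D: here is just a placeholder for the backup volume's drive letter; note the UrBackup client relies on the change journal for change detection on the volumes it backs up, so this is only about the server's storage volume):

```powershell
fsutil usn queryjournal D:       # prints the journal parameters, or an error if no journal exists
fsutil usn deletejournal /d D:   # removes the journal from that volume, if you decide to
```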

Most efficient storage option (i.e. storage IOPS go further) is btrfs+Linux.


Apparently the change journal isn't enabled? I didn't modify it.

This is an old thread, but I just stumbled across it and wanted to toss in my 2 cents:

I’ve been running UrBackup for several years in a similar scenario: Windows Server, ~20 clients, 30TB RAID. With that much data (and, in my case, ancient repurposed slower hardware which augments data manipulation and storage problems), things get slow and so tweaking the configuration and observing results has been something I’ve found myself doing over months and years. I had the same experience with file backups. Backing up lots of files results in the behavior described by the OP and Uroni is on the right track. There’s lots of extra stress/thrashing on the UrBackup database and there’s lots of extra stress/thrashing on the storage device, specifically with the file system. I don’t claim to understand fully and have since forgotten the exact details from when I researched how a file gets deleted from a file system, but there is a read/write penalty for deleting a file. It’s not an issue with a single or small number of files, but adds up when deleting thousands of files. It’s many orders of magnitude faster to delete a single huge file (image backup) from a file system than it is to delete thousands of smaller files (file backup) that all total the same amount of stored data as the single huge file. Plus the indexing, etc of all of those individual files.

You can observe this effect outside of UrBackup: create a directory with thousands of files in it and time how long it takes to delete them all, then compare that against deleting a single file of roughly the same total size. The single large file should be dramatically faster to delete than the thousands of smaller files.
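
Something like this PowerShell sketch shows it; the paths, file count, and sizes are arbitrary test values, and D: stands in for the backup volume:

```powershell
# Quick illustration: time deleting many small files vs one file of the same total size.
$dir  = 'D:\deltest'
$blob = New-Object byte[] 4096                      # 4 KB of zeroes
New-Item -ItemType Directory -Path $dir | Out-Null

# 20,000 small files...
1..20000 | ForEach-Object { [System.IO.File]::WriteAllBytes("$dir\f$($_).bin", $blob) }
# ...and one file of roughly the same total size (~80 MB).
[System.IO.File]::WriteAllBytes('D:\bigfile.bin', (New-Object byte[] (20000 * 4096)))

$many = Measure-Command { Remove-Item "$dir\*" -Force }
$one  = Measure-Command { Remove-Item 'D:\bigfile.bin' -Force }
Remove-Item $dir -Force

"Deleting 20,000 small files took $([math]::Round($many.TotalSeconds, 1)) s"
"Deleting one ~80 MB file took    $([math]::Round($one.TotalSeconds, 1)) s"
```

On a mechanical array the 20,000 small deletions should take far longer than the single ~80 MB deletion, and that's exactly the penalty a file-backup clean-up pays thousands of times over.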

What I’ve done is to reduce the number of individual files I’m backing up by leveraging image backups and file backups in combination. I do image backups, and then supplement with file backups of only the data directories. The more (hundreds of thousands of) individual files you have backed up, the slower things are going to be during maintenance, indexing, etc. The goal is to minimize the number of individual files getting backed up.