Backup's "Used Storage" much larger than data transferred

In testing UrBackup on a number of clients, I’m finding that the Used Storage column under Activities is frequently much larger than the amount of data reported transferred in the backup logs.

For example, one client log shows the following at the end of the backup session:

Transferred 701.586 MB - Average speed: 41.674 MBit/s
(Before compression: 1.54032 GB ratio: 2.24818)

To me, this appears as though I just sent 1.5 GB of new data to the server. However, if I find this backup under the Activities list, I see the following amount reported under Used Storage:
40.38 GB

Why do the two numbers differ so much? Does this mean that the new backup is really occupying 40 GB of additional storage on the server – even after deduplication is applied? How can we account for the enormous difference?

Thanks for any tips.

Hi aj_potc,

Is this a full backup (new backup) or an incremental?

If it's an incremental, it's possible that one or a couple of the files in the backup set are database-type files. In that case, only the changes are transferred during the backup, but the file is then reconstructed on the server side, and a full copy is saved under the dated backup folder.

Also, symbolic links to the last backup are used for unchanged files, so in Windows, checking the properties of the folder shows the complete current data set, including unchanged files.

Probably one of the two is occurring.

Thanks very much for the reply.

These are all incremental backups being sent to a Linux server.

I can understand that only small parts of the changed files might be transferred (kind of like rsync does). However, I’m having a terrible time figuring out why there are 30-40GB of changes for each and every backup.

Do you know of any way I can pinpoint the differences? Unfortunately, the UrBackup info logging doesn't seem to capture this. Perhaps I can compare the backup directories to one another?

Thanks again for your insight.

Just to update, I’ve tried finding the difference between two of the backup directories to see if that will show which files are changing.

I used the following commands:
diff --brief -Nr dir1/ dir2/
and
rsync --recursive --delete --links --checksum --verbose --dry-run dir1/ dir2/

Both of these return a list of changed files, but the size of these files comes nowhere close to the amount shown as Used Storage on the Activities page (perhaps 1 or 2 GB, but not 30 GB).
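One way to turn that list of changed files into a number you can compare against Used Storage is to sum the sizes of the files diff flags. Here's a rough sketch; the demo/ trees are throwaway stand-ins created just for illustration, so point the diff at two real dated backup folders instead:

```shell
# Build two tiny trees standing in for two dated backup folders.
mkdir -p demo/dir1 demo/dir2
echo "unchanged" | tee demo/dir1/same.txt demo/dir2/same.txt >/dev/null
echo "0123456789" > demo/dir1/changed.txt   # 11 bytes, present only in dir1

# Sum the apparent sizes of the files diff reports as different.
# (Breaks on filenames containing spaces; fine for a quick estimate.)
diff --brief -Nr demo/dir1/ demo/dir2/ |
  sed -n 's/^Files \(.*\) and .* differ$/\1/p' |
  xargs -r du -cb | tail -1
```

The `-N` flag makes diff treat missing files as empty, so files present in only one tree are counted too. `du -cb` (GNU) reports apparent sizes in bytes plus a grand total.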

When I look at the Statistics page, it appears that the amount of disk space used by the client is also increasing by the same rate, which is quite concerning. I’ve now got two clients that are consuming 100 GB of extra space per day. I’m only backing up relatively small files that change frequently, so this can’t account for the changes.

Is there anything else I can check to look for why the changes are so large?

Hi aj_potc,

Please check the content that is being backed up, and see whether there are any Microsoft Outlook-related files.

The issue is that if a 1 GB file changes internally and grows to 2 GB, UrBackup will store the whole 2 GB, because as far as the software is concerned it has become a new file.

This is what I observed.

Hi shaz619,

Yes, I did observe this behavior for some databases and browser/email cache files on other clients, but I was able to fix it by excluding them from the backups. For the troublesome clients, I’m not backing up similar material – most of it changes no more than once a day, and yet still the incrementals are reporting 30+GB under the Activities page.

I’m concerned that something is corrupted, and that I’m not seeing correct numbers. I can’t figure out the connection between the amount reported backed up in the client logs, the Used Storage on the Activities page, the info on the Statistics page for each client, and what I can actually see on disk.

Is it your experience that the amount reported under Used Storage for each backup is actually how much extra space is being occupied on disk (after deduplication)?

Hello

Either don't bother too much, or spend a lot of time understanding the metrics UrBackup reports. You have things like:

  • backup speed (source content read speed plus comparison; only hashes are sent, or the full file if it's new),
  • traditional transfer speed (network),
  • client used space, accounting for virtual client space,
  • some places showing the deduplicated size, others the original content size or the backed-up size.

One thing you can do is split the backup using virtual clients and make one backup set for things that change often and another for things that don't change much.

For things like logs, which are append-only, or VMs where only specific parts change, you can use hashed transfer (the default in internet mode). Something like a disk defrag or a database vacuum, which rewrites a large file, would break the hashed-transfer logic. Hashed transfer is also more CPU-intensive, which is why it's not the default.

Hi orogor,

Thanks for your reply.

All of my clients are Internet clients, and I can see that UrBackup is sending only the deltas for each incremental, so this amount may be very small – much smaller than the total amount of changed files.

However, I’ve excluded all of the large, quickly changing files that I can think of. Still, I’m seeing large numbers reported on the Activities page under Used Storage, while the on-disk difference (as shown by du) is much smaller.

I just can’t find any justification for which files are supposedly changing so much between backups. I’ve examined all the logs I can, but nothing points to the answer. The really frustrating thing is that I’m having this issue with several clients.

Perhaps you’re correct that I should forget about those numbers. However, I’m very concerned that on the Statistics page I can see the amount consumed by each backup steadily increasing day by day, with no method to see a comparison of what’s changing.

Hello

I get the same behavior, don’t worry about it.

There's the delete-unknown script and the recompute-statistics button if you fear there's corruption, but I doubt there is.

If you understand stuff like the difference between du and df, sparse files, symlinks and hardlinks, plus use an external tool to measure network usage, and then spend more than a day on it, you'll understand the numbers and worry less about them.
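The du-versus-df gap mentioned above is easy to reproduce in isolation; a sparse file is the classic case (sparse.img is just a scratch file created for this demo):

```shell
# Apparent size vs. allocated blocks: ls reports the former, du the latter.
truncate -s 1G sparse.img   # claims 1 GB, but no data blocks are written
ls -lh sparse.img           # shows 1.0G (apparent size)
du -h sparse.img            # shows ~0 (blocks actually allocated)
```

df only ever counts allocated blocks, so tools that disagree about apparent size, sparse regions, or shared hard-linked data will disagree about "used space".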

Maybe @uroni can add some additional explanations, both within the GUI and in the manual, to help users understand how the statistics are computed, because this question pops up on the forums regularly.

Check your actual used/free space on your storage server. If it is increasing by that amount (or something close), then I would look deeper. If not, it may be a problem like the one I had under a Windows server, where symlinks were being counted as full files and the space they symbolically represented was showing up under Windows as used space… it was weird, but didn't really represent a threat. I just had to use a different utility that understood symlinks to run my space checks.

Thanks for your reply. I’m running Linux as the server, so I’m pretty confident that the system utilities are reporting the correct numbers.

I have a very odd situation where the disk space is not increasing by as much as UrBackup is reporting, but it’s still increasing noticeably. I had tried restarting backups from one of two systems that contains a lot of duplicate files, but it appears that UrBackup’s deduplication has gotten broken somehow, because the storage usage for the new backups makes it look as if no deduplication is happening.

In addition to that, my statistics page is now reporting wildly inaccurate numbers. As of the last reading yesterday, my backups were occupying -500GB (yes, that’s a negative number). Today I’m seeing a more realistic value, but it’s still well below what’s actually on disk for each client.

At this point I’m thinking of starting over from scratch. Clearly something is not right. But it gives me no confidence in UrBackup if it can’t survive just one week backing up 5 clients. I get the feeling from this forum that what I’m doing is really tiny compared to what others are backing up.

Thanks for your reply. I’m running Linux as the server, so I’m pretty confident that the system utilities are reporting the correct numbers.

  • Yes, but whether you interpret them correctly is a different thing, hence my question about whether you knew why du and df report different values (they do, and it's normal).

I have a very odd situation where the disk space is not increasing by as much as UrBackup is reporting, but it’s still increasing noticeably. I had tried restarting backups from one of two systems that contains a lot of duplicate files, but it appears that UrBackup’s deduplication has gotten broken somehow, because the storage usage for the new backups makes it look as if no deduplication is happening.

  • Yes, it depends on which page you look at; I can't remember the details for every one, but basically each page has its own logic for computing backup sizes.

In addition to that, my statistics page is now reporting wildly inaccurate numbers. As of the last reading yesterday, my backups were occupying -500GB (yes, that’s a negative number). Today I’m seeing a more realistic value, but it’s still well below what’s actually on disk for each client.

  • Yes, that's normal; in my opinion this page is actually quite broken, especially for virtual clients.
  • You end up with negative sizes because of dedup combined with a backup or client having been deleted.
  • Some parts of the UI show the backed-up size (like the status page); other pages, like the stats page, try to compute on-disk usage, but because of dedup, and because backups can be deleted, the "which file accounts to which backup/client" logic gets thrown off and you end up with negative sizes.

At this point I’m thinking of starting over from scratch. Clearly something is not right. But it gives me no confidence in UrBackup if it can’t survive just one week backing up 5 clients. I get the feeling from this forum that what I’m doing is really tiny compared to what others are backing up.

  • As I said before, don't worry; from what I've noticed, it's just the reported size numbers that go wrong.
  • What I would suggest is that you do a few backup/restore tests; if those work, that's what matters most.

Hi orogor,

I realize that the space reported by du can differ because of the presence of hard links. But df is showing me that the space used by the backups is far larger than it should be. It’s not normal when my initial backup consumed 1.8TB, and only a week later df is showing me 4TB used. I don’t have that many changing files – nowhere even close. I fear that the deduplication has failed, and the system is now storing file duplicates without maintaining the proper hard links. Either the files/symlinks on disk are corrupted, or the database is – at least that’s my impression.
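The hard-link accounting that makes per-folder du numbers misleading can be seen with a quick experiment; here linkdemo/b1 and linkdemo/b2 are scratch directories standing in for two dated backup folders sharing one deduplicated file:

```shell
# du counts each inode once per invocation, so a hard-linked "copy"
# in a second backup folder adds almost nothing to the combined total.
mkdir -p linkdemo/b1 linkdemo/b2
dd if=/dev/zero of=linkdemo/b1/big.bin bs=1M count=4 status=none
ln linkdemo/b1/big.bin linkdemo/b2/big.bin   # roughly what file-level dedup does
du -sh linkdemo/b1 linkdemo/b2   # b2 appears nearly empty in this invocation
du -sh linkdemo                  # the pair together still holds only ~4M
```

Run du on either folder alone and it reports the full 4M, which is why summing per-backup du figures overstates real disk usage.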

I’ve now wiped out the backups from the two largest clients and am letting them start again. I didn’t completely start from scratch – but maybe this will help.

Just to update this thread, in case it will be helpful to someone:

My idea to manually delete the backups of the two largest clients on the backup server appears to have been successful. After restarting the backups of these clients, UrBackup’s hash database was regenerated, and now the deduplication seems to work properly. The size of my backups, as reported by Linux disk utilities as well as on the Statistics page, is now rising in a linear and predictable way.

UrBackup doesn’t have any way I’ve found to do a “hard reset”, in which you tell it to throw out what it knows about a client and start over from scratch.

The “fix” was achieved this way:

  1. Shutdown the UrBackup server.

  2. Rename the directories containing the backups of each client. This has the same effect as deleting them, as UrBackup will no longer be able to find them.

  3. Run UrBackup’s cleanup command:
    /usr/local/sbin/urbackupsrv remove-unknown
    This will remove all entries in the database related to files that no longer exist on disk.

  4. Start the UrBackup server. It will automatically start full backups for each of the missing clients and rebuild its hash database.

  5. Delete the renamed directories containing the old backups. In my case, this took a lot of time (there were millions of files).
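Condensed into commands, the steps above look roughly like this. This is a sketch, not something to run verbatim: it assumes a systemd install with the service named urbackupsrv, and /media/backups/client1 is a hypothetical storage path, so substitute your own service name and backup folder layout first.

```shell
# Operational sketch of the reset procedure (adjust paths and service name).
systemctl stop urbackupsrv                             # 1. stop the server
mv /media/backups/client1 /media/backups/client1.old   # 2. hide the old backups
/usr/local/sbin/urbackupsrv remove-unknown             # 3. purge DB entries for missing files
systemctl start urbackupsrv                            # 4. full backups restart automatically
rm -rf /media/backups/client1.old                      # 5. reclaim the space (slow with many files)
```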

I can’t explain what happened to UrBackup, but it does appear that the cause of my problem was a corruption or fault in either its database or the on-disk file linking. It wasn’t just a case of the Statistics page or other metrics being reported incorrectly (though that did contribute to my confusion about what was going on).

If you have a situation where the individual incremental backup sizes don’t seem to make sense, or where the disk space consumed doesn’t appear to be taking advantage of deduplication, you may want to try this fix.
