Hello all,
I’m running the latest UrBackup Server on Windows.
My backup dir is a 20TB drive.
UrBackup interface shows I’m using 12.8TB, while the 20TB disk is almost full.
In an attempt to resolve this, I have run remove_unknown.bat, cleanup_database.bat and cleanup.bat, and I have recalculated the statistics from the web UI.
But still the statistics are way off.
Any ideas how to remove the extra files/dirs or how to fix the statistics to show the correct numbers?
I have a very similar problem, but with Linux on ext4/Raspberry Pi with OpenMediaVault and Docker. The difference in my case is the .directory_pool directory - it isn’t counted in the UrBackup statistics, and it is HUGE. In my case I think it started growing after I deleted some old incremental backups. Somebody suggested that there might be a problem with deduplication, but I have no idea how to check it. I also tried the “remove unknown” command, but to no avail. I’m currently upgrading my server to BTRFS, and I will check what can be done with deduplication in BTRFS.
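For reference, this is roughly how I have been poking at the pool to see how big it really is and how much of it is still shared with the backups. The storage path and client name below are just placeholders for my setup, so adjust them; I’m not claiming this is the official way to check deduplication:

```bash
#!/bin/bash
# Rough check of the .directory_pool vs. the rest of the storage.
# /srv/urbackup and "myclient" are placeholders - adjust to your own layout.
POOL=/srv/urbackup/myclient/.directory_pool

# On-disk size of the pool alone vs. the whole storage directory.
# du counts each hard-linked file only once within a single invocation.
du -sh "$POOL"
du -sh /srv/urbackup

# Files in the pool that are hard-linked somewhere else as well
# (i.e. shared with at least one backup) vs. files with only one link.
echo "shared (link count > 1): $(find "$POOL" -type f -links +1 | wc -l)"
echo "unshared (link count = 1): $(find "$POOL" -type f -links 1 | wc -l)"
```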
I had to delete some incremental backups using the ‘Delete now’ option in the GUI.
I believe the issues started afterwards.
I’m not sure that the ‘Delete now’ option did its job.
I’m going to delete all backups now to try to solve this one, but I’m afraid this might not be a workable solution in the future.
So I’m still looking for the root cause and a proper solution to this problem.
This may be off topic, but my suggestion for Windows systems would be to use ReFS instead of NTFS on the storage disks. Since ReFS creates its file table after verification, it does not require a chkdsk run, and CHKDSK is a very time-consuming process.
So, what really is inside this .directory_pool directory? As far as I know it holds files that stay constant (unchanged) between different incremental backups. So why aren’t they included in the statistics? Also, some of my full backups are unrealistically small - about 10 GB - when they should be around 1 TB judging by the size of the PC that is being backed up to the server. And if .directory_pool is NOT those constant files, why aren’t they properly deleted by UrBackup after a backup? I want to keep the whole backup timeline from the last two years, so deleting all backups is no good for me, and I’m still looking for a “proper” solution to this problem.
I’m sure the answer will be in the logs for the backup(s) in question, but from my experience the server (my instance is Linux, but I don’t think that’s relevant here) will be doing some de-duplication wizardry, and the log will have an “80 GB already exists on server” type message at/near the end…
You will have to make sure that “info” is selected in the log filter drop-down, as that doesn’t seem to be the default…
EDIT: Adding screenshot from one of my client machines…
Thank you @OnlyMe, I checked the log and, exactly as you said, there is information about a large number of files being copied from the last backup. This leaves me wondering what will happen to the current backups if previous ones are deleted? I was thinking that in a full backup all files are copied. Does this mean that full and incremental backups are nearly the same (since in both cases only new files are transferred and backed up)?
Can answer to the best of my understanding, but I’m not the dev…
The simple explanation of my understanding is that the server basically has a “table” of the files, and a new backup gets tagged with the existing files and adds anything new (for this example, a changed file is a new version of the original file, so counts as new) to the table…this is de-duplication, in simple terms…
When you delete a backup, it’s removing the “column” in the table for that particular job run, and the server will clear anything that was only referenced against that specific job run…should there be no other references for a file, then the file will also be binned off…
Have massively over simplified it, but I’m tired and have an early start for work tomorrow, and am actually going to be in the office…
Hopefully my example above makes sense, all files are listed in the backup, but to save both space and bandwidth (especially if traversing the internet) it does de-duplication as part of the “indexing” phase [or the transfer of the new backup’s index, maybe] and tags the existing file/s against the new job run…
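As far as I understand, on normal (non-btrfs) storage that “tagging” ends up as hard links on disk plus entries in the server database, so the “binned off once the last reference is gone” part can be seen with a throwaway shell experiment like this (the paths are purely for illustration, nothing to do with a real UrBackup layout):

```bash
# Throwaway demo of hard-link behaviour with two pretend "backup" folders.
mkdir -p /tmp/demo/backup1 /tmp/demo/backup2
echo "some file contents" > /tmp/demo/backup1/file.txt

# Instead of copying, "deduplicate" by hard-linking into the second backup.
ln /tmp/demo/backup1/file.txt /tmp/demo/backup2/file.txt
stat -c 'link count: %h' /tmp/demo/backup1/file.txt   # prints: link count: 2

# Deleting one backup removes a reference, but the data stays...
rm -r /tmp/demo/backup1
cat /tmp/demo/backup2/file.txt                        # still readable

# ...and the space is only released once the last link is removed.
rm -r /tmp/demo/backup2
```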
On a full backup all files are copied and re-hashed on the server, but if they’re identical to the last backed-up version the new copies are deleted to deduplicate, so the backup size in the activity log only shows the amount that changed, just like for an incremental backup.
As mentioned the state at each backup is tracked by the database.
Thank you @OnlyMe and @kilrah for the good explanation, this really makes sense and it made clear why the size of backups doesn’t change the way I expected. However, I still don’t understand why the .directory_pool is very big, isn’t counted in the data size statistics, and why remove_unknown doesn’t clean it up. As far as I can tell, if it is big and important, it should be taken into account in the statistics; if it is big and not important, it should be cleaned up by the remove_unknown script. Also, I noticed that rsyncing the backup folder from one disk to another (ext4 to BTRFS, with the -a, -W and -H flags, so archive, whole files and preserve hardlinks) changed the size of the directory - from 6.4 to 6.6 TiB. I wonder if this has a common cause with the difference between the UrBackup statistics and the file system size, or if it is a separate problem. I’m afraid my copy may have been broken in some strange way (if it is bigger than the source, there must be some difference!) and I have no idea how to check data integrity after the copy.
Most likely the discrepancy is simply due to filesystem overhead. Backups are typically many millions of files, and depending on the filesystem that can cause a lot of waste. E.g. these 2 drives have exactly the same contents, but they were formatted with different block sizes, causing half a TB of difference for just ~1.5M files.
UrBackup is going to tell you it backed up X worth of files, but the filesystem they’re on takes Y to store them because it’s inefficient.
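If you want to see that gap on your own storage, comparing the logical size with the blocks actually allocated gives a rough idea (the path is a placeholder for wherever your backup storage is mounted):

```bash
# Logical (apparent) size of the files vs. blocks actually allocated on disk;
# the difference is mostly per-file allocation overhead and metadata.
# /srv/urbackup is a placeholder for the backup storage path.
du -sh --apparent-size /srv/urbackup
du -sh /srv/urbackup
```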
To me it would make no sense backing up the data of an urbackup server, it’s mostly unusable without the server. Run a 2nd server instead if you really want 2.
I’m using rsync for two reasons. First, I back up the backup to another hard drive - a poor man’s version of RAID 1 - on my Raspberry Pi based server, since HDDs connected through USB-SATA adapters work poorly with RAID arrays. Second, I want to copy the backup server from an 8 TB to a 20 TB HDD, since the 8 TB (or 7.2 TiB) HDD is nearly full and I want a longer continuous backup history (backups since the beginning of the UrBackup server). And since I ran into a similar problem to @alexo, but to a lesser extent (I also started this topic link, but got little response), I’m looking for the cause of such growth of the .directory_pool directory on the backup drive. The proposed solution (deleting all backups) is unacceptable for me. Maybe the problem is unique to the Docker-based UrBackup server (similar to the problem with mounting image backups)? I found very little information about it on the internet. Maybe @uroni can tell us why using the “Delete now” option (or just running the server for a long time, I’m not sure what started the size difference between UrBackup and the filesystem) creates such problems?
Edit: I tried to run UrBackup on the new BTRFS-formatted disk, after rsyncing all the data from the previous ext4-based HDD, and some of the backups are broken - I cannot access files through the web interface. I’m going to run rsync one more time (maybe it was interrupted and not all files were copied).
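Before trusting the new copy I’m also thinking of doing a checksum-only dry run to see what actually differs between the two disks. The mount points below are just placeholders for my ext4 source and BTRFS copy, and I’m not sure this is the “proper” way to verify a backup store, but it should at least list files whose contents differ:

```bash
# Dry run (-n) that compares file contents by checksum and itemizes (-i)
# every difference between the source and the copy; nothing is changed.
rsync -aHn --checksum -i /mnt/ext4/urbackup/ /mnt/btrfs/urbackup/

# Number of files vs. number of distinct inodes on each side - a large
# mismatch would mean hard links were not preserved, which would also
# explain a bigger copy.
find /mnt/ext4/urbackup -type f | wc -l
find /mnt/ext4/urbackup -type f -printf '%i\n' | sort -u | wc -l
find /mnt/btrfs/urbackup -type f -printf '%i\n' | sort -u | wc -l
```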