Backup Policy & Behaviour

Hi all.

I have moved from another backup solution to UrBackup and must say first impressions are excellent indeed.

Looking for some advice. I have 11 TB of backup space in my /urbackup_data directory.

I have a file server that I back up amongst other clients, but this file server has around 7 TB of files that don’t change often, incremental additions only. So, a “full file backup” is 7 TB.

I have set up as below:

Interval for incremental file backups = 24 hours
Interval for full file backups = 30 days
Maximum number of incremental file backups = 20
Minimal number of incremental file backups = 10
Maximum number of full file backups = 1
Minimal number of full file backups = 1

The behaviour I want is to create a baseline (full backup), then have incremental backups follow the rolling window as above.

Thirty days later I want the “full file backup” data to be updated so it’s effectively a new baseline. My fear is that in 30 days the server will try to write another 7 TB of data.

I obviously don’t have the space for that duplicated 7 TB. When I manually run a second “full file backup” it seems to be backing all the files up again…

Is the behaviour in my configuration above that the second “full file backup” (if I had the space…) would be written, and then during the cleanup process the incrementals and the first “full file backup” would be removed?

What would be the best configuration for what I need?

David

The full and incremental backups in UrBackup refer to the files scanned on the client. All backups contain all files, but the de-duplication results in only a single copy of a file, which then appears in each backup using file system features.

This means that when the second full backup is taken, the extra space consumed is only the changed files; the unchanged files are obtained from earlier backups, either the initial full backup or an incremental backup taken after the file changed.

All backups, incremental and full, contain all files, but with the magic of file de-duplication they all fit if not many changes occur.
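
The “file system features” here are hard links on the storage volume. A quick way to see them at work on a Linux server, assuming storage under /urbackup_data (the client and backup directory names below are made up; use whatever actually appears there):

 # Compare the same file in two different backup directories:
 ls -li /urbackup_data/fileserver/230101-0001/Media/movie.mkv \
        /urbackup_data/fileserver/230201-0001/Media/movie.mkv
 # The same inode number and a link count of 2 mean both backups
 # share a single physical copy on disk.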

An incremental can miss a changed file if the client does not alter metadata such as the time of last modification. That is why an occasional full backup can be needed.

The full backup causes every file to be examined as a candidate for backup and will transfer the file if required.

The administration manual does cover all of this but it seems to be the single biggest misunderstanding that everyone struggles with.

Maybe it would be better to think of backups as being “efficient” and “exhaustive” instead of the names “incremental” and “full” that people use for the old multi-cycle backup method.

Thanks for the reply.

I’ll make a note and do an experiment tomorrow. I’m backing up the whole lot after deleting the client node, to start afresh.

I’ll then do a second full and check whether it transfers a second copy of each file (it did look like it was doing this when I watched the live log last time), and whether it stores a second copy of the file in the new directory on the storage volume.

David

In case your file server holds different kinds of data: you can search the docs for “virtual clients”, which allow you to configure different folders to back up with different frequencies and retention policies.

Retention is done via either the archive tab or the max number of fulls/incrementals to keep.
Archived backups do not count toward the max-to-keep.
Archiving is the classical way of doing this, and a keep-max is easy to understand.
With keep min/max you can account for backup size variation without bothering too much.
Archiving is more formal/contractual to explain to your boss/customer.

As said before, full versus incremental does not influence the disk space usage much, and because fulls take very long you may set them to every 90 days or so (30 days is actually short for a UrBackup full).
I would say the interest of full backups is that they double-check everything: they can prevent archive rot and fix corrupted files, and some backup bugs can only be fixed by doing a full, so it’s still necessary to run them from time to time.

If you have a lot of clients on a fresh server install, I would also recommend waiting a few days and then starting a full manually on some clients, so that all the fulls don’t start at the same time on the next cycle.

After you run the backup tests, please use your operating system’s “disk free space” command (df on Linux, volume Properties on Windows) to see how much of your 11 TB remains. Most “disk used” counts seriously overstate the actual space occupied, as UrBackup also creates a directory for each backup that links the files in the common storage area into a pseudo-full backup which can be copied or shared by normal file operations. Yes, even incrementals get this point-in-time pseudo-backup, with links to all of the files, whether already known or freshly added. (Look in your /urbackup_data directory under each client name.)

In some cases each file on the server being backed up might be counted many times: once for the common file storage area and once again for each full or incremental pseudo-backup directory. For example, I have a 4 TB storage drive that Windows counts as having 49.8 TB used in 34,586,071 files and 3,227,904 folders. Counting that with Folder Properties took two hours.
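
On Linux you can see the same effect from the shell; a sketch, assuming the storage is mounted at /urbackup_data and GNU du is available:

 df -h /urbackup_data                  # true space used on the volume
 du -sh /urbackup_data                 # GNU du counts each hard link once
 du -sh --count-links /urbackup_data   # counts every link, like Windows
                                       # Folder Properties, so it can far
                                       # exceed the physical disk size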

If you’re running under a server version of Windows, you may also be surprised by the effect of enabling deduplication, even given the native deduplication UrBackup incorporates for file backups. Given the commonality of files within images, my own installation is showing a deduplication rate (today) of 53%, which cuts way down on the storage requirements for system images. Assuming server deduplication is available to you, it can make astounding savings where multiple similar Windows workstation images are involved.

So, did some experimenting…

I should not have doubted the developers…

Running a second full backup, and despite the file data being transferred to the server (which I now realise is so it can be hashed and compared to determine whether it’s a changed file), I see that the “df” used space is not changing by much at all, and the inode link count has incremented to 2: obviously a hard link referring to the original file…
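
For reference, the two checks described there (the backup path placeholders are hypothetical):

 df -h /urbackup_data    # used space barely moves after the second full
 stat -c '%h links, inode %i' /urbackup_data/fileserver/<backup>/somefile
                         # a link count of 2 = hard link to the original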

orogor,

… you have stirred my thinking, but I just can’t get my head around it yet…

My content/data can be summarised as:

  1. Media files
  2. Software install files
  3. Proxmox image backups
  4. Archived content

The largest portion of my data is my media files. So as you can imagine, it’s a largely static volume, but things are added over time incrementally. Occasionally I grow bored of a TV series or set of movies and delete them off the media server (which uses NFS to access the content from my CentOS file server).

So the behaviour I really am seeking for the media files (and, to be honest, something similar for the other types) is:

  1. Media files are backed up and retained as long as they are active/needed
  2. New files are dealt with via incremental backups
  3. Each month, or every few months, fulls replace incremental backups
  4. Deleted files from the file server are retained for, say, 14 days (just in case the deletion was accidental or needs to be reversed)
  5. Re 4. above: a manual purge to reclaim disk space on the backup server when required

I’m also not sure how I can do the above differently based on the classification of content type, which is actually separated via different filesystems on the file server.

Thoughts?

David

In the absence of archive settings (see Archiving), the age of the oldest copy of a file in any backup will be the number of backups times the backup interval. So, for example, if you do a full file backup monthly with a maximum of 2 backups, your oldest backup will be between one and two months old. Likewise, if your incremental backups are daily with a maximum of 10, the oldest will be between 9 and 10 days old. In combination, the oldest could be as young as one month ago and would be two months old when the next full backup is due.

Backups are deleted automatically each cleanup period, normally nightly, according to the minimum and maximum rules for the backup type. There is no need to do it manually.

If you use Archiving settings you can cause retention of backups independent of the cycle described above.

If you occasionally delete large files, the backup server will not recover the space until the last backup that included the deleted files is itself deleted.

If you really find that you need differing policies for different folders, look at virtual sub client names, but you may find that the de-duplication function removes the need. The large files will only be held once across all backups, both full and incremental.

Not sure what you need; maybe an example of home usage with archived photos/videos and /home:

Archived photos is typically a lot of small files, so you don’t want to back that up too often: back up every week, min 4, max 20. If there’s not enough space, retention would be reduced to 1 month of data; to avoid that, archive every month for 4 months. Worst case, you now have 4 weekly backups + 4 monthly backups.

/home needs more frequent updates, so back up every day, min 14, max 30; archive every week for 1 month, and every month for 4 months.

Video files aren’t that important and are space consuming, so back up every day, keep min 2, max 20. If there’s a need for a lot of space, it will delete almost all the video backups but the 2 most recent ones.

Incremental or full you don’t care about too much: set fulls every 90 days, and keep min 1, max 10 incrementals.
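
To summarise those suggestions in one place (my own tabulation, not UrBackup output):

 Folder           Interval   Min/Max   Archive
 Archived photos  weekly     4 / 20    every month, for 4 months
 /home            daily      14 / 30   weekly for 1 month, monthly for 4 months
 Video files      daily      2 / 20    none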

Hi again orogor,

So I did a second full last night, and despite it looking like it was going all OK, it did end up just filling the drive and failing when it ran out of space…

So I’m really at a loss as to what to specify as the rules for backup/archive.

Essentially all I want is to be able to restore a complete directory on my file server in case of disk failure etc. This is essentially /Media, and it’s around 7 TB.

So I thought that all I needed to specify was a full backup to set the baseline, then a series of incrementals looking for and backing up new files.

I’m concerned about specifying a backup interval for “full file backups”; it seems that when that comes around, it backs up all the files and just blows the disk space away on the UrBackup server… just as happened last night when I manually ran a second full file backup…

Totally lost…

I think I noticed something a bit like this: that a full needs some additional temp space to be processed.

I am not sure you need the full size as temp space; maybe it’s more that UrBackup downloads all the files and processes them in parallel. In the case of a full, there are more files in the temp space waiting to be processed (the number on the right in the activities page, which can be high if the db isn’t on an SSD).

A workaround could be to split the 7 TB into maybe 4 different backups using virtual clients, so you don’t overflow your 11 TB of storage with temp space.
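
A rough sketch of what that could look like in the per-client settings on the server (the sub-client names are made up; the admin manual’s virtual clients section has the exact details):

 Virtual sub client names: media1|media2|media3|media4

The file server then also shows up as fileserver[media1] through fileserver[media4] on the web interface, and each of those virtual clients can be given its own “Default directories to backup” (say, one quarter of /Media each) plus its own schedule and retention.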

I could also ask @uroni if my theory is right, or if it’s possible to throttle the backups when more than a certain size or number of files is waiting to be processed (I can get >10M on some servers/clients).

Thanks orogor, I might see if they chip in directly on the topic.

For now I’ve gone back to basics using cron, rsync and passwordless SSH on Linux, which is OK.

eg.

running at 00:00 daily via cron:

 rsync -avh -e ssh root@domain:/Files1/ /backup/domain/Files1/

and then to manage retention and deletion of files removed from the source I just

 add the --delete flag

say each fortnight, via cron or a manual run.
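
Put together, the crontab would look something like this (the fortnight approximated as the 1st and 15th, since cron has no fortnight field):

 # daily mirror at 00:00
 0 0 * * * rsync -avh -e ssh root@domain:/Files1/ /backup/domain/Files1/
 # twice-monthly run that also propagates source deletions
 0 1 1,15 * * rsync -avh --delete -e ssh root@domain:/Files1/ /backup/domain/Files1/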

But it would be nice to get this behaviour easily managed with UrBackup, as I do like many aspects of it.

For instance, one thing the above can’t do well is saving different versions of the same file name at multiple intervals within each day, which I could manage with an incremental schedule in UrBackup. And, like my rsync method, the files on the “backup” server are accessible directly, which I also like.
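
(For what it’s worth, rsync can approximate those point-in-time versions with --link-dest: each run writes a dated snapshot directory that hard-links unchanged files against the previous run, much like UrBackup’s pseudo-full directories. A sketch, with the snapshot layout under /backup/domain invented for illustration:)

 today=$(date +%Y%m%d-%H%M)
 rsync -avh --delete -e ssh --link-dest=/backup/domain/latest \
       root@domain:/Files1/ /backup/domain/snap-$today/
 ln -sfn snap-$today /backup/domain/latest   # repoint "latest" at new snapshot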

I’ll do some experiments with the virtual clients; maybe that’s the answer to managing the queues and processing on the server.

The above is a good solution for my Linux VMs/servers/workstations. I have only a couple of Windows 10 workstations with small data volumes, so I still have those managed via a UrBackup server for now. It would be nice to have a single solution though…

David