Question: What backup retention strategy do you use?

I’m wondering what backup strategies people are using. Mostly related to backup retention.

I am backing up four computers - two Linux, two Windows; one of those Windows machines is remote (but since it’s VPN-connected 100% of the time, UrBackup sees it as local). I have a 4TB drive hooked up to a Raspberry Pi 3 that runs the UrBackup server. The only significant thing on that 4TB drive is the Pi’s OS (small) and the UrBackup server and data.

My UrBackup settings are (settings are the same for all four client computers):
Incremental file: every 24 hours, min 31, max 366
Full file: every 30 days, min 2, max 12
Incremental image: every 30 days, min 2, max 12 (Windows only)
Full image: every 90 days, min 2, max 5 (Windows only)
Soft filesystem quota: 95%
No archiving
No per-client limits

I just kind of pulled those above settings out of thin air, not really knowing what would be best.

After using the above UrBackup settings for four months, I have consumed just under 1TB of my 4TB drive, so right at 25% of the disk. My backup space usage is very linear; you could draw the UrBackup graph line with a straight ruler, except for the small bumps from the image backups. So projecting future space needs is pretty easy. I figure I’ll hit the “max 366 incremental (daily) file backups” limit before I hit the “95% filesystem quota” limit, so I should easily fit one year’s worth of daily backups onto this 4TB drive.
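
A quick back-of-the-envelope projection of those numbers (a rough sketch in Python; the usage, quota, and “four months” figures are taken from this post and assumed to stay perfectly linear):

```python
# Rough projection of when the 95% quota would be hit, assuming the
# backup growth stays perfectly linear (numbers taken from this post).
DISK_TB = 4.0           # total drive size
QUOTA = 0.95            # soft filesystem quota
USED_TB = 1.0           # space consumed so far
DAYS_ELAPSED = 4 * 30   # roughly four months of daily backups

rate_per_day = USED_TB / DAYS_ELAPSED            # ~8.3 GB/day
remaining = DISK_TB * QUOTA - USED_TB            # space left under the quota
days_to_quota = remaining / rate_per_day

print(f"growth: {rate_per_day * 1000:.1f} GB/day")
print(f"more days until 95% quota: {days_to_quota:.0f}")
# ~336 more days on top of the ~120 already elapsed, i.e. ~456 days total,
# so the 366-incremental retention limit kicks in well before the quota does.
```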

What would people recommend for settings considering that I plan to just keep UrBackup running and running and running, probably for years? Less frequent backups? Smaller max settings? Start using archiving? Based on my usage and how much I’d be willing to lose in a disaster, I could back off to once-per-week file backups. But I wouldn’t want to go less frequent than that. I am most comfortable with daily backups. And I would prefer around 3 months of available backups (not necessarily daily) to give my family time to recognize that they might have accidentally deleted a file and need to recover it from backup. I’ve got this level of comfort with my current settings, but keeping 366 daily backups for each computer seems a tad excessive and possibly wasteful. What “better” settings would people recommend?

Since UrBackup uses hard links (as opposed to symlinks) to connect unchanged files in backup storage, I would think I could even run an external script to pick and choose individual backups to delete. Possibly keeping one month of dailies, then three months of weeklies, then a couple of years of monthlies. Just target the specific daily backups that I no longer need and delete them; the remaining hard links would be unaffected. This may be ill-advised though. Preferably, UrBackup should handle all file “roll-ups” IMHO.
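
The hard-link behaviour being relied on here is a plain filesystem property, illustrated below with a small Python sketch (the paths are made up, and this shows only the filesystem side, not UrBackup’s internal bookkeeping; as noted later in the thread, UrBackup actually mixes symlinks in as well, so manually deleting backup directories is riskier than this sketch suggests):

```python
# Minimal demonstration that deleting one directory of hard links does not
# affect files reachable through other links. Paths are hypothetical.
import os, shutil, tempfile

root = tempfile.mkdtemp()
os.makedirs(f"{root}/backup_2024-01-01")
os.makedirs(f"{root}/backup_2024-01-02")

# "Original" file stored with the first backup...
with open(f"{root}/backup_2024-01-01/photo.jpg", "w") as f:
    f.write("unchanged file contents")

# ...hard-linked (not symlinked) into the second backup.
os.link(f"{root}/backup_2024-01-01/photo.jpg",
        f"{root}/backup_2024-01-02/photo.jpg")

# Delete the whole first backup, as a pruning script might.
shutil.rmtree(f"{root}/backup_2024-01-01")

# The second backup still has the complete file: the data stays on disk
# as long as at least one hard link to it exists.
with open(f"{root}/backup_2024-01-02/photo.jpg") as f:
    print(f.read())   # -> unchanged file contents
```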

Hello

Yes, use archiving; do not make the script yourself. UrBackup would basically do the same thing with archiving and probably handle it better.
Lower your max incremental setting to the number of incrementals that you currently have, so you don’t lose existing backups, or maybe use 90 as you said.

Archives:
When using archiving together with min/max, having different values for min and max allows a few things:
To store unscheduled backups that you run manually
To allow some backups to be cleaned up when you run out of space, so that new backups can be stored.
If backups are very fast (like 1 min) you can also try backing up more than once a day; those would again count against min/max. Then use an archive every 1 day to keep a single backup at the end of the day.

Archived backups do not count against the min/max settings.
When archiving you can specify whether you specifically want to archive a full or an incremental, or that you don’t care.

Archiving/un-archiving:
Backups are tagged as archived when the backup is taken; the retention of an existing backup can not be changed afterwards.
If you change the archive setting afterward, existing backups won’t be tagged/untagged as archived. (If you switch from archive every day to archive every 7 days, the existing backups won’t be untagged.)

The reverse is also true, so you need to archive existing backups manually.
However, you can not specify an expiration date when archiving manually (unless maybe you edit the db directly), so manual archives are not automatically expired.
Archives can be manually unarchived.
If you unarchive, you can delete immediately, but the UI will get stuck, so if you need to delete a lot, the simplest is to wait 24h; the deletion happens during the maintenance period.

For images:
People are actually not very interested in the images themselves, but an image can catch something that the file backup missed because of a bad backup path or filter.
For stuff like photos, restoring an image from a long time ago (or reinstalling if that doesn’t work) and then copying the photos over should be acceptable.

In both cases:
Fulls are a lot more expensive than incrementals, so you’re right not to do them too often.

#=====================================
Really do your own computation, but maybe:

I will suppose that your incrementals run fast.
This will allow you to go back every 6 hours for at least the last 2 days, plus keep quite a lot of manual backups.

If you currently have 120 incrementals, then wait 3 months before changing the max incremental setting (3 × 30 + 30 = 120), else you’ll lose backups. Or manually archive some, but currently you can’t define an end date for a manual archive.

I don’t have any specific suggestion for the image backups; the settings don’t seem especially good or bad.
If they take very long, keep them like this.
If they are fast, maybe back up more often and use archiving to keep recent backups more often, and lower min/max.

#=====================================
Not changing too much of what you have:
Incremental file: every 6 hours, min 8, max 30
Full file: every 30 days, min 2, max 4

Archive file:
Every 1 day for 60 days (this will get rid of the 4 daily backups to keep only one; maybe specify incremental file here)
Every week for 25 weeks
Every month for 36 months (maybe specify full file here, instead of file)

In total you store 215 file backups:
3 × 30 (4 per day minus the archived one) + 4 fulls = 94
60 + 25 + 36 (archives) = 121
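
That tally is easy to sanity-check; a tiny sketch of the same arithmetic (the per-day and per-period assumptions are the ones from this suggestion):

```python
# Quick tally of retained file backups under the suggested settings.
incrementals = 3 * 30        # 4 per day minus the archived one, over the retained window
fulls        = 4             # max full file backups
archives     = 60 + 25 + 36  # daily for 60 days + weekly for 25 weeks + monthly for 36 months

print(incrementals + fulls + archives)   # 215
```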

Thanks for the reply. I will look into implementing some of your suggestions.

FWIW, an incremental file backup from the Windows computers takes between 10 and 30 minutes. Those incrementals are typically about 2.5GB “data transferred”, with most of that, no doubt, being the huge email inboxes my family members maintain. Some day I may get around to switching them from mbox to maildir storage format to get away from slinging these large email files around at backup time.

From the truly local Linux computer, about 30 to 45 minutes for an incremental file backup, also typically 2.5GB data transferred. This one is mine (I use mbox format too, but don’t have such a large inbox as the other family members). The remote Linux box (VPN) is subject to fairly poor network conditions popping up on occasion, so its backup times are highly variable, but mostly fall in the 1 to 1-1/2 hour range. Typically about 2GB of data is transferred from this remote box during an incremental.

Full file backups for all computers typically take hours, with a bit over 50GB transferred for the biggest Windows box. I consider the performance I’m seeing to be “slow”, but I blame that mostly on the UrBackup server being hosted on a Raspberry Pi 3 - a great little gadget, but not a speed demon like a more typical desktop computer. Plus, the Raspberry Pi is connected via WiFi and the disk it’s writing to is limited by the Pi’s meager USB 2.0 interface. I have not measured my Pi’s WiFi speeds, but other people’s benchmark tests that I’ve read about are in the 20Mb/s range. I could plug an ethernet cable into the Pi for better speeds, but my feeling is that the Pi’s CPU limits and the slow USB 2.0 disk interface are the root cause of the slowness. I can live with this slowness though, since the backups are done when we’re all asleep and the computers are idle.

The way I see it, one has two choices wrt. setting the backup retention strategy:

a) What a consultant would do :wink:

Take a look at what is being backed up, how valuable it is, how often it gets accidentally deleted, how long it stays valuable, etc., then construct a retention strategy to minimize data loss vs. backup storage cost. A few Excel sheets and non-linear equations plus some probability theory should do the trick.

E.g. you have some tax documents which rarely get accidentally deleted and which you need to keep for 10 years, but the probability that the tax authorities need to see a document again is really low, etc. … so you add an archival rule that archives one backup per month for 10 years.

b) Go for overkill and then adjust to actual usage

This gets easier and easier as storage cost decreases. Time is valuable, and doing the calculations in a) costs a lot of time which one could instead invest in more backup storage. In your case it sounds like 4TB is actually overkill already :wink:
Set the max number of backups high and archive a few for a bit. Configure the mail settings (admin mail) such that UrBackup can notify you if it runs out of backup storage because of the archived backups. Then adjust the retention strategy if problems pop up (e.g. UrBackup tells you it ran out of space, or you have performance problems because of too many btrfs snapshots).

It actually uses a combination of symlinks and hard links by default if you do not use btrfs.

You can disable full file backups and run only incrementals if you want to.

For stuff like mbox over the VPN (if that’s your issue), you can try hashed block transfer (it’s the transfer mode for internet clients) and a virtual client.
In general, if you have a file/folder which is widely different from the others, or very large compared to them, I would suggest using a virtual client, because you can apply a different policy to it.

In this case, you can back up the mbox less often and try to use hashed block transfers instead of hashed file transfers.
Because it acts as a separate client, you can delete the corresponding backups if you don’t like the results.
If you like what you get, you can then filter the mbox file out of the main client backup.

Hashed block transfer works like this: instead of making a single hash for the whole file, it cuts the file into little parts and hashes them, then only transfers the changed parts.
So for mbox, which is mostly append-only, it would work well and help a lot over an internet link.
But it’s also more CPU expensive, so maybe it wouldn’t work well with a Raspberry Pi; also, as uroni said, sometimes simpler is better.
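
A toy sketch of the idea (not UrBackup’s actual protocol; the block size, hash choice, and file names below are made up for illustration):

```python
# Toy illustration of hashed block transfer: split a file into fixed-size
# chunks, hash each chunk, and only "transfer" the chunks whose hash differs
# from the previous run. Block size and file names are assumptions.
import hashlib

BLOCK = 512 * 1024  # 512 KiB chunks, an arbitrary choice for the example

def block_hashes(data: bytes) -> list[str]:
    return [hashlib.sha256(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]

old = open("inbox.mbox.previous", "rb").read()  # hypothetical copy from the last backup
new = open("inbox.mbox", "rb").read()           # current mbox

old_h, new_h = block_hashes(old), block_hashes(new)
changed = [i for i, h in enumerate(new_h)
           if i >= len(old_h) or h != old_h[i]]

# For an append-mostly mbox, only the final block and any newly appended
# blocks differ, so very little data has to cross the VPN.
print(f"{len(changed)} of {len(new_h)} blocks would be transferred")
```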