Incremental Database (PostgreSQL, MariaDB) Backups Using Linux Btrfs Snapshots Seem to Take Up Too Much Disk Space

I’m running UrBackup Server 2.1.19 on Ubuntu 16.04 x64 (on btrfs volumes) with UrBackup Clients 2.1.15 on Ubuntu 16.04 x64 (on btrfs volumes).

One of my backup clients is running a PostgreSQL database (on a Zabbix server) and another one of my backup clients is running a MariaDB database (on a Nextcloud server). I am using the “Backup using snapshot” methods (listed at https://www.urbackup.org/backup_postgresql.html and https://www.urbackup.org/backup_mysql.html) to back up these databases.

I had understood that incremental backups of these databases would only include the blocks changed during the window between backups, but it seems that any file that has been modified is transferred in its entirety to the backup server. As a result, an incremental backup performed a few minutes after another backup ends up virtually the same size as the previous backup, despite very little data having been written to the database files.

Have I misunderstood the way these database backups are supposed to work? Does Changed Block Tracking not work on Linux with btrfs volumes?

If this is expected behaviour, is there a different configuration I could use, instead?

My backup plan is to configure a virtual client to perform the database backups on a less frequent schedule than my other files, so that I don’t eat up disk space unnecessarily. Would you recommend a different plan?

Thomas

hi

UrBackup works at the file level for dedup (i.e. identical files).
Image backups on Windows on VMware can use CBT, so that's quite specific, and it applies to image backups (whole disk).

Disk snapshots let you take a consistent backup, where every file reflects the same point in time (needs btrfs, ZFS, or LVM; see the sketch below), so that's a separate thing.
Btrfs has some advantages as a data store in UrBackup; that's a different thing.
ZFS as a datastore can do block-level dedup; that's different again (it may help do what you want).
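
A rough illustration of that snapshot step on btrfs, assuming the database files live on their own subvolume (the paths here are just examples; the guides linked above wire this into UrBackup's snapshot scripts):

# take a read-only, point-in-time snapshot of the subvolume holding the DB files
btrfs subvolume snapshot -r /var/lib/postgresql /var/lib/postgresql-snap
# ... back up the snapshot instead of the live, changing files ...
btrfs subvolume delete /var/lib/postgresql-snap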

You could dream of tighter integration for the case where UrBackup uses ZFS for storage and you want to back up a server that also uses ZFS, or the same for btrfs, but that's not the case today.

What I do is back up the databases locally in some compressed format, then let UrBackup gather the file.
For large DBs the dump is copied to a datastore with a timestamp, and scripts expire the older dumps.
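
Roughly like this, as a sketch (pg_dump shown; the database name, paths, and 14-day retention are just examples):

#!/bin/sh
# dump the database in PostgreSQL's compressed custom format, named by timestamp
STAMP=$(date +%Y%m%d-%H%M%S)
DEST=/var/backups/db
mkdir -p "$DEST"
pg_dump -U postgres -Fc mydb > "$DEST/mydb-$STAMP.dump"
# expire dumps older than 14 days
find "$DEST" -name 'mydb-*.dump' -mtime +14 -delete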

You could try something with WAL files, as these record only the changes to the DB and thus allow for a kind of incremental backup (see the sketch at the end of this post), but it's some work to set up, and a lot of reading is needed.
Maybe a partitioned database would help: if old data is never modified, and Postgres makes one file per table, only the new tables would get backed up. Again, it's not a simple task.
Good database backup is never trivial, whatever backup software you use.
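
For the WAL idea, the usual starting point is PostgreSQL's archive_command; a minimal sketch (the archive directory is just an example, and a base backup is still needed to restore from):

# in postgresql.conf: keep and ship every completed WAL segment
wal_level = replica        # use 'archive' on PostgreSQL < 9.6
archive_mode = on
archive_command = 'test ! -f /var/backups/wal/%f && cp %p /var/backups/wal/%f'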


It should only store the differences. You’ll have to temporarily enable btrfs quotas to check this.
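
Roughly like this, assuming the backup storage is mounted at /srv (each backup is a btrfs subvolume, so it gets its own qgroup):

btrfs quota enable /srv
# after the initial rescan finishes, list per-subvolume usage;
# the "excl" column is space referenced only by that one subvolume/backup
btrfs qgroup show /srv
btrfs quota disable /srv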

If you are using it on the local network you'll have to enable the “block differences - hashes” transfer mode in the advanced settings to make it transfer only the differences during incremental backups. I’ll add this to the website and/or think about changing the default.

The Linux client does not have a CBT method yet (this is something the operating system needs to provide), so it will read all the data to find the differences in any case. Even btrfs doesn’t have a built-in CBT method if you do not keep a snapshot of the last backup around.
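
To illustrate the point about keeping the last snapshot around: the closest thing btrfs has to CBT is computing the difference between two read-only snapshots with send -p (the paths are examples):

# full stream: everything in the first snapshot
btrfs send /mnt/snap1 > full.stream
# incremental stream: only what changed since snap1, which works
# precisely because snap1 (the parent) was kept around
btrfs send -p /mnt/snap1 /mnt/snap2 > incr.stream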

Why/how would it only store the differences?
Because he uses btrfs, and it has dedup or reflink enabled?

If it's reflink, how does it work internally? You copy the existing file in the UrBackup store with reflink, then the server replays the changes by doing the block-difference thing. But if you didn't transfer in this mode, you don't get block-level dedup?

Yes, because he uses btrfs on the server. And it also works if the whole file is transferred (it says copy with reflink in the debug log).
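
For anyone wondering what that reflink copy amounts to, it is essentially this coreutils operation (the filenames are examples):

# copy-on-write clone: the new file shares the old file's extents,
# so it costs almost no space until one side is modified
cp --reflink=always last_backup/db.file new_backup/db.file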

Thanks so much for your replies.

For the advanced settings, I’ve got “Temporary files as [file; image] backup buffer:” both unchecked, the next 6 drop-down menus set to “Hashed”, and “[Local; Internet] incremental image style:” both set to “Based on last image backup”.

uroni - I believe this is consistent with the “block differences - hashes” transfer mode you mention. Is that right?

FYI - I am backing up my local clients using internet mode to take advantage of the encryption.

I think it’s working correctly, but would appreciate a sanity check. When I perform an incremental backup, the web interface says that it took up ~300MB of disk space and that’s what shows up under the “Statistics” tab as well. However, the debug log shows a “HT: Copying with reflink data from…” line under each and every “HT: Copying file:…” line, and I’ve noticed that the original full backup only contains “HT: Copying file:…” lines. I believe this means that it isn’t treating any of the “new” database files as actually new in the incremental backups.

So far, so good - I think.

Instead of diving into btrfs quotas (which seemed a bit complicated), I used btrfs filesystem usage stats to support my conclusions. Please let me know if I’ve erred somewhere.

Prior to an incremental backup:
root@urbackup:/# btrfs fi usage -m -T /srv
Overall:
Device size: 1572863.00MiB
Device allocated: 285720.00MiB
Device unallocated: 1287143.00MiB
Device missing: 0.00MiB
Used: 248188.06MiB
Free (estimated): 1321321.12MiB (min: 677749.56MiB)
Data ratio: 1.00
Metadata ratio: 2.00
Global reserve: 512.00MiB (used: 0.00MiB)

             Data          Metadata    System
Id Path      single        DUP         DUP       Unallocated
-- --------- ------------- ----------- --------- -------------
 1 /dev/vdb1 278536.00MiB  7168.00MiB  16.00MiB  1287143.00MiB
-- --------- ------------- ----------- --------- -------------
   Total     278536.00MiB  3584.00MiB  8.00MiB   1287143.00MiB
   Used      244357.91MiB  1915.02MiB  0.06MiB

Then, after that incremental backup was performed:
root@urbackup:/# btrfs fi usage -m -T /srv
Overall:
Device size: 1572863.00MiB
Device allocated: 285720.00MiB
Device unallocated: 1287143.00MiB
Device missing: 0.00MiB
Used: 248194.56MiB
Free (estimated): 1321316.62MiB (min: 677745.12MiB)
Data ratio: 1.00
Metadata ratio: 2.00
Global reserve: 512.00MiB (used: 0.00MiB)

             Data          Metadata    System
Id Path      single        DUP         DUP       Unallocated
-- --------- ------------- ----------- --------- -------------
 1 /dev/vdb1 278536.00MiB  7168.00MiB  16.00MiB  1287143.00MiB
-- --------- ------------- ----------- --------- -------------
   Total     278536.00MiB  3584.00MiB  8.00MiB   1287143.00MiB
   Used      244362.34MiB  1916.05MiB  0.06MiB

Am I right to deduce that the filesystem only shows an additional ~6MiB of disk space used? The overall “Used” figure went from 248188.06MiB to 248194.56MiB, a difference of 6.50MiB, and the per-type deltas agree: 244362.34 − 244357.91 = 4.43MiB of data, plus 1916.05 − 1915.02 = 1.03MiB of metadata (doubled to ~2.06MiB by the DUP profile). If so, that’s obviously way smaller than the ~300MB that the UrBackup web interface claims. I know btrfs filesystem usage is pretty complicated, so I may be misunderstanding the presented stats.

If this is all correct, then it’s easy enough for me to stop relying on the disk space stats in the UrBackup web interface for these clients and just be happy that they are backing up properly. Maybe I can file a feature request to tweak the way the stats are shown in the web interface so they reflect this scenario accurately, but I imagine that may not be trivial?

Please advise.

@uroni I have a similar “issue” to @thomas. I’ve just migrated to btrfs and it works perfectly fine: Outlook offline .pst files get transferred over and over again (thanks to Outlook’s stupid timestamp mechanism), but end up “ignored” by btrfs, as the actual hash hasn’t changed. This can be verified by looking at the disk usage. So far so good.

BUT, the web interface keeps reporting that the incremental backups each take up ~2 GB of space. This doesn’t reflect the reality on disk. Is there something fundamental that prevents the web interface from reporting this properly, or is it just something that hasn’t been developed/changed for the case where btrfs is used underneath? I can see that Linux du gives the “incorrect” value while df reports the correct amount of used (free) space.
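
If your btrfs-progs is new enough (4.6 or later), btrfs filesystem du makes the discrepancy visible directly by breaking usage into exclusive and shared extents (the path is an example):

# per-backup totals with reflinked/shared extents broken out
btrfs filesystem du -s /srv/backups/someclient/*
# plain du counts shared extents once per file, hence the inflated numbers
du -sh /srv/backups/someclient/*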