ZFS File Backup COW

h3po · February 22, 2018, 10:21am

Please help me understand the addition to section 11.7.1 of the administration manual:

Similarly, UrBackup supports copy-on-write file backups with ZFS. The methodology is the same as the one for btrfs in the following section with the caveat that identical files cannot be reflinked between ZFS datasets like in btrfs as ZFS is missing the reflink feature. Instead files will be copied, that is, UrBackup will not load a file twice if it already has a copy, but may store it twice if ZFS deduplication is not enabled.

Why would the server have to duplicate existing files when the subvolume it is writing to is a COW copy of the previous backup? I think this caveat only applies to full file backups?
As I understand it, that’s not a problem since we can now do “incremental forever” style file backups like with images.

orogor · February 22, 2018, 1:10pm

Hi

As far as I remember, this applies only to file backups. It’s less efficient because ZFS lacks the “reflink” capability.

A reflink is a copy by reference. It behaves somewhat like a writable snapshot, but at the file level instead of the partition level (hard to explain, you’d need to google it).

h3po · February 22, 2018, 2:53pm

But why would we need reflinks to unchanged files when zfs keeps references to the unchanged blocks that belong to these files?

orogor · February 22, 2018, 7:38pm

Hi

I hope this isn’t too harsh an answer, and @uroni would need to correct me where I am wrong, but basically:

At this point, UrBackup will, “in the general case”, optimize disk space down to about 1/10th of the disk usage (symlinks to folders with unchanged content, refreshed every 10 backups, plus hardlinks to existing files). So it’s quite optimized already.

Because he’s the lone dev of the project, it’s better to wait for ZFS to support reflinks than to develop a new way to optimize storage. Once ZFS supports reflinks, he can just reuse the code he wrote for btrfs.

Users who absolutely want super-optimized storage can use ZFS block-level dedup, which would basically be better than anything else, at the cost of performance. (It wouldn’t work on smaller devices; people use UrBackup on NAS boxes and Raspberry Pis.)

Also, he accepts patches and whatnot, so if someone can step up and develop that particular way to optimize storage, I think it’ll find its way into the public code.
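For readers unfamiliar with the hardlink approach mentioned above, here is a minimal sketch of the general idea. It is not UrBackup's actual code; the directory names `backup_1` and `backup_2`, the file name, and the helper `hardlink_unchanged` are made up for illustration, and it assumes a filesystem that supports hard links.

```python
import os
import tempfile

def hardlink_unchanged(prev_backup: str, new_backup: str, name: str) -> None:
    """Make new_backup/name a hard link to prev_backup/name (no data is copied)."""
    os.link(os.path.join(prev_backup, name), os.path.join(new_backup, name))

with tempfile.TemporaryDirectory() as root:
    # Two hypothetical backup directories on the same filesystem.
    prev_backup = os.path.join(root, "backup_1")
    new_backup = os.path.join(root, "backup_2")
    os.mkdir(prev_backup)
    os.mkdir(new_backup)

    with open(os.path.join(prev_backup, "report.txt"), "w") as f:
        f.write("unchanged content")

    # The file did not change between backups, so link it instead of copying.
    hardlink_unchanged(prev_backup, new_backup, "report.txt")

    old_stat = os.stat(os.path.join(prev_backup, "report.txt"))
    new_stat = os.stat(os.path.join(new_backup, "report.txt"))
    same_inode = old_stat.st_ino == new_stat.st_ino  # both names point at one inode
    link_count = new_stat.st_nlink                   # number of names for that inode
```

Both directory entries refer to the same data on disk, which is why the space cost of an unchanged file is close to zero. Hard links only work within one filesystem, which is also why they cannot span separate ZFS datasets.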

uroni · February 22, 2018, 11:20pm

See for example here for the ZFS reflink status: https://github.com/zfsonlinux/zfs/issues/405
But reflink isn’t enough. It must be a cross-subvolume (or dataset in ZFS) reflink. Otherwise it cannot link identical files from one client into backups of another client.
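To make the consequence concrete, here is a toy model of "not loaded twice, but possibly stored twice". It is only a sketch under assumptions: the class `ToyBackupServer`, its fields, and the use of SHA-256 are invented for illustration and say nothing about how UrBackup is implemented internally.

```python
import hashlib

class ToyBackupServer:
    """Toy model: one content-hash index for all clients, one store per dataset."""

    def __init__(self):
        self.known_hashes = {}  # content hash -> (dataset, path) of an existing copy
        self.datasets = {}      # dataset name -> {path: file content}
        self.transfers = 0      # how many files were actually loaded from a client

    def backup_file(self, dataset, path, content_hash, fetch):
        store = self.datasets.setdefault(dataset, {})
        if content_hash in self.known_hashes:
            # The server already has this content, so nothing is transferred.
            # Without a cross-dataset reflink it is copied into this dataset.
            src_dataset, src_path = self.known_hashes[content_hash]
            store[path] = self.datasets[src_dataset][src_path]
        else:
            store[path] = fetch()  # simulate loading the file from the client
            self.transfers += 1
            self.known_hashes[content_hash] = (dataset, path)

server = ToyBackupServer()
payload = b"same installer on two machines"
digest = hashlib.sha256(payload).hexdigest()

server.backup_file("client_a", "setup.exe", digest, lambda: payload)
server.backup_file("client_b", "setup.exe", digest, lambda: payload)

stored_copies = sum(len(files) for files in server.datasets.values())
print(server.transfers, stored_copies)  # prints "1 2"
```

One transfer, two stored copies. With a cross-dataset reflink (or ZFS deduplication enabled) the second copy would share blocks with the first instead of taking extra space.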

h3po · February 23, 2018, 9:04am

Aha, so that answers my question. The reflinking takes place when multiple clients share an identical file, and that can’t be done when each backup is in its own zfs subvolume, so instead the file is copied.
To me the documentation sounded like the copying is needed for identical files between one and the next backup of the same client.
Anyhow I’m happy that we now have subvolumes for file backups, as that will make exporting them to an external disk easier (and safer, as my current solution is susceptible to changes in .directory_pool during the export).

@orogor I’m not whining about or asking for features, I’m asking about how it currently works. No need to defend uroni, harshly or otherwise.

Jeeves · March 1, 2018, 5:20pm

I’m very happy with the new feature, which is why we sponsored it.

There is indeed a small price to pay because inter-client-sharing of files is no longer possible. But being able to zfs send & receive all data is making me very happy. I also think that snapshotting is easier on the filesystem than all the links.

Thanks @uroni

orogor · March 2, 2018, 7:45am

@Jeeves

What’s the new feature about?
On the client, is it taking a snapshot, making a file backup, then releasing the snapshot?
Or is there more to it, like the snapshot being transferred via zfs send to the server, which will also store it in ZFS?

h3po · March 2, 2018, 1:59pm

The new feature is that the server can use a btrfs or zfs subvolume to create a COW copy of the last backup and write an incremental on top. So it doesn’t need to hardlink or symlink unchanged files to the old backup, which in theory should be much faster than linking.
Having the backups in subvolumes without symlinked directories also means it is much easier to export a single backup via btrfs send or zfs send.
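As a rough illustration of the workflow described above, the sketch below only builds the `zfs` command lines and does not run them. The dataset names under `tank/urbackup/...`, the snapshot name `base`, and both helper functions are hypothetical; the commands UrBackup actually issues may differ.

```python
def cow_backup_commands(prev_dataset, new_dataset, snapshot_name="base"):
    """Commands to start a new backup as a copy-on-write clone of the previous one."""
    snap = f"{prev_dataset}@{snapshot_name}"
    return [
        ["zfs", "snapshot", snap],           # freeze the previous backup
        ["zfs", "clone", snap, new_dataset], # writable COW copy sharing all blocks
    ]

def export_commands(dataset, snapshot_name, target_dataset):
    """Sender and receiver commands for exporting one backup as a full stream."""
    snap = f"{dataset}@{snapshot_name}"
    return ["zfs", "send", snap], ["zfs", "receive", target_dataset]

# Hypothetical dataset names, for illustration only.
for cmd in cow_backup_commands("tank/urbackup/client1/backup_1",
                               "tank/urbackup/client1/backup_2"):
    print(" ".join(cmd))

send_cmd, recv_cmd = export_commands("tank/urbackup/client1/backup_2",
                                     "export", "external/client1_backup_2")
print(" ".join(send_cmd), "|", " ".join(recv_cmd))
```

After the clone, only changed files need to be written into the new dataset; everything else keeps pointing at the blocks of the previous backup. The send command assumes a snapshot named `export` already exists on the dataset being exported.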