ZFS Deduplication Questions

cmunroe · March 26, 2019, 6:27pm

I’m building a new UrBackup server. We currently use a mixture of image-based and file-based backups on BTRFS. However, we want to move to image-based backups for everything. The problem, of course, becomes space.

So, my first question: can ZFS deduplicate two identical files in two different machine images? Do I need special settings to make this happen, and if so, what are they?

Secondly, are there any statistics on the space savings of ZFS versus BTRFS for image-based and file-based backups?

Finally, do you have any suggestions or warnings about using ZFS with UrBackup?

Thank you for your time.

orogor · March 26, 2019, 9:00pm

Hello

You need to read up on ZFS deduplication. It consumes a lot of RAM, and that RAM is mandatory: you have to provide a certain amount of RAM per amount of storage, and if you lack RAM, you effectively can’t use the filesystem.
If you use UrBackup’s special btrfs mode, deleting backups should be faster.
I don’t use image backups much, and for file backups I didn’t notice any difference.
Btrfs has offline deduplication that you need to schedule; on ZFS, it happens in real time, as data is written.
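To illustrate why real-time (inline) deduplication needs so much RAM: every block written is checked against a table of checksums before it is stored, so that table has to be consulted on every write and grows with every unique block. The toy model below is a sketch of the idea, not ZFS’s actual implementation; the class name, the 4 KiB block size and the use of SHA-256 are illustrative assumptions. An offline deduplicator (as with btrfs) would run a similar lookup later, over data that is already on disk.

```python
from __future__ import annotations

import hashlib

BLOCK_SIZE = 4096  # illustrative block size (assumption, not a ZFS default)


class InlineDedupStore:
    """Toy model of inline (write-time) block deduplication."""

    def __init__(self) -> None:
        # Maps block checksum -> [stored block, reference count].
        # This plays the role of a dedup table and is consulted on every write.
        self.table: dict[bytes, list] = {}

    def write(self, data: bytes) -> list[bytes]:
        """Split data into blocks, store only unseen blocks, return their keys."""
        keys = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            key = hashlib.sha256(block).digest()
            entry = self.table.get(key)
            if entry is None:
                self.table[key] = [block, 1]  # new unique block: allocate it
            else:
                entry[1] += 1  # duplicate: only bump the reference count
            keys.append(key)
        return keys

    def physical_bytes(self) -> int:
        """Bytes actually stored after deduplication."""
        return sum(len(block) for block, _ in self.table.values())


store = InlineDedupStore()
store.write(b"A" * 8192)  # two identical 4 KiB blocks
store.write(b"A" * 8192)  # same data again, e.g. from a second machine image
print(store.physical_bytes())  # 4096: one physical copy, four references
```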

cmunroe · March 28, 2019, 3:45pm

Correct, I already know you need anywhere between 2 and 4 GB of RAM for every TB of storage. Memory isn’t the problem here, and neither are the specs. The question is whether ZFS is worth the extra cost for image-based backups.

uroni · March 28, 2019, 3:50pm

Obviously disable compression (vhdz). Maybe you’d get better results with raw image backups on ZFS? The problem may be that ZFS does block level deduplication and the files on NTFS may be on slightly different block boundaries. So maybe experiment with the ZFS block size…
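The block-boundary point can be demonstrated with a small simulation. This sketch assumes fixed-size block deduplication keyed on a SHA-256 checksum and uses random bytes as a stand-in for file data; the function names are hypothetical and the 512-byte shift is an arbitrary example, not a measured NTFS offset. The same file shifted by 512 bytes shares no 4 KiB blocks with the original, but deduplicates fully once the block size matches the shift.

```python
from __future__ import annotations

import hashlib
import os


def block_hashes(data: bytes, block_size: int) -> set[bytes]:
    """Checksums of each fixed-size block, as a block-level deduplicator sees them."""
    return {
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    }


def shared_blocks(a: bytes, b: bytes, block_size: int) -> int:
    """Number of distinct blocks that appear in both a and b."""
    return len(block_hashes(a, block_size) & block_hashes(b, block_size))


file_contents = os.urandom(64 * 1024)  # stand-in for a 64 KiB file

# Image 1: the file starts exactly on a block boundary.
image_a = file_contents
# Image 2: the same file, shifted by 512 bytes (different on-disk placement).
image_b = bytes(512) + file_contents

print(shared_blocks(image_a, image_b, 4096))  # 0: no 4 KiB block lines up
print(shared_blocks(image_a, image_b, 512))   # 128: every 512-byte block matches
```

Smaller blocks recover the duplicates, but they also mean many more blocks to track, which is where the extra RAM cost comes from.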

cmunroe · March 28, 2019, 4:00pm

Has anyone ever run tests to see how much ZFS will deduplicate out of image-based backups in COW/VHD images? Do we even know whether it will?

From my understanding, BTRFS only deduplicates file backups and the incrementals of the same image backup. Is that correct?

tmo7452 · March 30, 2019, 12:24am

urBackup works with BTRFS to handle this at a very low level. Each backup (both image and file) is stored in a separate subvolume. When a new incremental backup starts, it first asks BTRFS to clone the previous subvolume (which is immediate and takes no space), then it writes all the changes to the new one. Since the entire subvolume is COW, only the changes consume space on the disk. Since the COW mechanism is block-level rather than file-level, this applies to RAW image backups as well. In this setup, you would want to completely disable full backups for both files and images; urBackup will automatically create one anyway if no previous backup exists.
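A toy model of the clone-then-write workflow described above may make it clearer why a clone is free and an incremental costs only its changed blocks. This is not btrfs’s on-disk format or UrBackup’s code; the `Pool` and `Subvolume` classes are hypothetical, and freeing blocks when a backup is deleted is omitted.

```python
from __future__ import annotations


class Pool:
    """Physical storage: every call to allocate() consumes one new block."""

    def __init__(self) -> None:
        self.blocks: list[bytes] = []

    def allocate(self, data: bytes) -> int:
        self.blocks.append(data)
        return len(self.blocks) - 1


class Subvolume:
    """Maps logical block numbers to physical block ids in a shared Pool."""

    def __init__(self, pool: Pool, mapping: dict[int, int] | None = None) -> None:
        self.pool = pool
        self.mapping = dict(mapping or {})

    def write(self, lba: int, data: bytes) -> None:
        # Copy-on-write: never overwrite in place, always allocate a new block.
        self.mapping[lba] = self.pool.allocate(data)

    def read(self, lba: int) -> bytes:
        return self.pool.blocks[self.mapping[lba]]

    def snapshot(self) -> Subvolume:
        # Cloning copies only the block references, not the data itself.
        return Subvolume(self.pool, self.mapping)


pool = Pool()
full = Subvolume(pool)
for lba in range(100):  # first backup: 100 blocks
    full.write(lba, b"base %d" % lba)

incr = full.snapshot()  # the next backup starts as a clone of the previous one
incr.write(7, b"changed")  # only the changed blocks are written
incr.write(42, b"changed")

print(len(pool.blocks))  # 102: the incremental cost only its 2 changed blocks
print(full.read(7), incr.read(7))  # the older backup still sees its original data
```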

I have no experience with ZFS, but according to the urBackup documentation it works the same way.

cmunroe · April 3, 2019, 5:41pm

I set up a test server and pointed two similar machines at the UrBackup server. Both machines are running Windows 1809 and have similar software installed. I was hoping that many similar chunks, such as the Windows system files, would be deduplicated by UrBackup and ZFS. However, even with RAW COW images, this doesn’t seem to be the case.

Here are my results, and they just don’t seem worth the cost:

NAME     SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
backup   896G   183G   713G         -     4%    20%  1.09x  ONLINE  -
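As a rough back-of-the-envelope reading of those numbers: treating DEDUP as the ratio of logical data to allocated space (an approximation, since the ratio is rounded and ALLOC also includes metadata), the 1.09x ratio saved on the order of 16G. The function name is hypothetical.

```python
def dedup_savings(alloc_gib: float, dedup_ratio: float) -> tuple[float, float]:
    """Estimate logical data size and space saved from a pool's DEDUP ratio.

    Assumes DEDUP ~= logical size / allocated size; this is only approximate.
    """
    logical = alloc_gib * dedup_ratio
    return logical, logical - alloc_gib


logical, saved = dedup_savings(183, 1.09)
print(f"~{logical:.0f}G of data stored in 183G, ~{saved:.0f}G saved")
```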

uroni · April 3, 2019, 5:58pm

As I said, you’ll probably get better results if you set the ZFS maximum block size (the `recordsize` property) equal to the NTFS cluster size (usually 4096 bytes). That makes ZFS dedup use a lot more RAM, though.
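A rough estimate of why a 4 KiB block size costs so much RAM: dedup keeps one table entry per unique block, so going from the default 128 KiB `recordsize` to 4 KiB multiplies the number of entries by 32. The ~320 bytes per entry below is a commonly quoted rule of thumb (an assumption here, not a measured value), and the function name is hypothetical.

```python
BYTES_PER_DDT_ENTRY = 320  # commonly quoted rule of thumb, not an exact figure


def ddt_ram_gib(unique_data_tib: float, block_size: int,
                bytes_per_entry: int = BYTES_PER_DDT_ENTRY) -> float:
    """Rough RAM needed to hold the dedup table for a given amount of unique data."""
    entries = unique_data_tib * 2**40 / block_size  # one entry per unique block
    return entries * bytes_per_entry / 2**30


for block_size in (4096, 128 * 1024):
    print(f"{block_size // 1024:>4} KiB blocks: "
          f"~{ddt_ram_gib(1, block_size):.1f} GiB of dedup table per TiB")
# At 4 KiB this works out to ~80 GiB per TiB; at 128 KiB, ~2.5 GiB per TiB.
```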

Even if you get good results, it is probably not worth it. The reason is that (at least with hard disks) IOPS don’t scale with capacity: nowadays you have 10 TB hard disks with at most about 200 IOPS. Even the normal COW backup method often causes too much random I/O.
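To put the IOPS point in numbers: once data is fragmented so that reads become small random I/Os, throughput is bounded by IOPS rather than by disk size. A back-of-the-envelope sketch, assuming every I/O is a random 4 KiB read and ignoring caching; the 100 GiB size and the function name are arbitrary, hypothetical choices.

```python
def random_read_hours(data_gib: float, io_size_kib: int, iops: int) -> float:
    """Hours to read data_gib of data using purely random I/Os of io_size_kib each."""
    ios = data_gib * 1024 * 1024 / io_size_kib  # number of I/O operations needed
    return ios / iops / 3600


# 100 GiB of 4 KiB random reads on a disk doing 200 IOPS (~0.8 MiB/s).
print(f"{random_read_hours(100, 4, 200):.1f} h")  # about 36.4 hours
```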