Already running into BTRFS fragmentation issues? (after 2 months with 4 clients)

I started using urbackup around 2 months ago and am quite fond of its image backup implementation and its great performance for incremental file backups.

Unfortunately I started noticing degraded performance, especially when doing raw copy-on-write image backups. Below you can find a screenshot of the “Activities” tab, where I’m running an incremental/full image backup for a new Windows client without CBT. (One of the 3 Windows clients actually has CBT tracking enabled.)

When I started using urbackup I would fully saturate my 100 MB/s connection to the NAS… Now, as you can see, performance varies hugely and image backups take forever. I’m using a headless Fedora 27 box, so I am not sure how I could best monitor what is going on on the disk (it is certainly being thrashed, from what I can hear). Using iotop -o is not very helpful, as the output changes a lot; attached you can find the top reported activities by iotop (in this case there are actually writes going on at 124 MB/s, but oftentimes it’s in the KB/s range).
iotop.txt (2.4 KB)

This is a single 8 TB DM-SMR drive (external USB: Seagate ST8000DM004); the server itself is also a very low-power Intel NUC (NUC6CAYH) with 4 GB of RAM. I don’t expect incredible performance out of this, but the disk thrashing seems a bit excessive.
I mounted the drive with the following parameters:
LABEL=8TB_BTRFS_BU /drives/backup2 btrfs defaults,compress-force=zstd 0 0
The system kernel is 4.18.19-100.fc27.x86_64 and uses btrfs-progs v4.17.1.

I wonder if these are already signs of fragmentation on the disk? What could be causing the thrashing? Let me know if you need more information to answer my question(s).

P.S.: I realize there is a similar topic here: BTRFS very high iowait (even if stop urbackup service) Please help!

But my scope is much, MUCH smaller. As I said, I have 4 clients: 3 Windows, 1 Linux, and I don’t see high io-wait.

I do infinite incremental image backups on all Windows clients (one daily, one every three days, and one weekly) and incremental file backups on Linux (every 2 hours).

Edit 2: Okay, nevermind. I do indeed run into high io-wait after a while:


Could it be an issue with the SMR drive’s cache filling up?

SMR disks are surprisingly slow, especially for random write I/O.

I’d use the latest FC/LTS kernel, 5.10.y. I don’t know if they even backport btrfs fixes to 4.18…

Ways to reduce btrfs random write I/O:

Increase the commit interval. If you have a lot of free space, maybe ssd_spread? (e.g. options defaults,commit=600,noatime,compress-force=zstd:7).
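Adapted to the fstab line from your first post, that would be something like this (untested on my side; the zstd:7 level needs a kernel newer than 4.18, and add ssd_spread only if you decide to try it):

LABEL=8TB_BTRFS_BU /drives/backup2 btrfs defaults,commit=600,noatime,compress-force=zstd:7 0 0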

Ideally btrfs metadata would be on an SSD. Btrfs can’t do that alone yet, but you could put bcache in front…
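Very rough sketch of the bcache route, with example device names; make-bcache formats the devices, so this only works when setting the filesystem up from scratch, not on the existing one:

make-bcache -B /dev/sdX    # backing device: the SMR disk (DESTROYS existing data)
make-bcache -C /dev/sdY    # cache device: the SSD (DESTROYS existing data)
echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach    # attach the cache set; UUID from bcache-super-show /dev/sdY
mkfs.btrfs -L 8TB_BTRFS_BU /dev/bcache0    # then create btrfs on top of the cached device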

Maybe the disk can do TRIM; if so, run fstrim /backups regularly, mount btrfs with discard enabled, or use the newer btrfs background trim.
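A quick way to check whether the USB bridge actually passes discard/TRIM through (device and mount point are just examples):

lsblk --discard /dev/sdx         # non-zero DISC-GRAN/DISC-MAX means discard is supported
fstrim -v /backups               # one-off manual trim of the mounted filesystem
systemctl enable --now fstrim.timer    # or just let the distro's periodic fstrim timer handle it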

You can defragment free btrfs space by regularly running a balance, e.g. btrfs balance start -dusage=10,limit=1 /backups and btrfs balance start -musage=10,limit=1 /backups (adjust the 10 …), but whether that helps depends on the SMR disk’s translation layer. In Debian there is the btrfs-heatmap tool, which can visualize btrfs allocation/free space.
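If you want to automate that, a small daily cron script along these lines would do it (mount point and usage threshold are just examples):

#!/bin/sh
# example /etc/cron.daily/btrfs-balance: repack lightly-used chunks, one data and one metadata chunk per run
btrfs balance start -dusage=10,limit=1 /backups
btrfs balance start -musage=10,limit=1 /backups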

Most relevant for diagnosis would be the output of e.g. iostat -x 5 /dev/sdx. Writes per second combined with %util and write wait time should give you a good picture of how slow your disk is.

Thanks for all the info!

Unfortunately there is no newer kernel available for Fedora 27, and I’m not sure how I would go about compiling a newer one from mainline for Fedora 27. I also can’t really upgrade to a more recent Fedora, as this NAS runs “Amahi 11”, which would break if I updated the Fedora release it is based on. I may consider switching to the Debian-based OMV in the near future, though.

Thanks for the hints with the mount options! I’ve now set defaults,noatime,space_cache=v2,compress-force=zstd (no compression level yet in 4.18). I did clear the free-space cache beforehand by mounting with clear_cache. If this already helps I’ll keep commit at its default, otherwise I’ll try your suggestion of commit=600.
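For reference, the fstab entry from my first post now reads:

LABEL=8TB_BTRFS_BU /drives/backup2 btrfs defaults,noatime,space_cache=v2,compress-force=zstd 0 0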

I will see if the drive supports TRIM and whether balancing helps with performance, too. First of all, I’ll monitor the next large backup with iostat -x 5 and report back with my findings.

Edit:
I waited for the HDD to quiet down, i.e. for it to stop shuffling data around for its SMR rewrites, and started another big backup task.
After around 25 GB had been backed up, the stats seem to settle into output like the following:

Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sdc1 25.20 149.00 12172.00 7557.60 30.00 21.20 54.35 12.46 214.18 514.70 68.44 483.02 50.72 5.74 100.00
iostat.txt (274 Bytes)

Performance seems to be generally better, but I’m still not able to saturate my predefined upload limit (750 Mbit/s).


@uroni: Why would you recommend limit=1 in btrfs balance start -dusage=10,limit=1 /backups and btrfs balance start -musage=10,limit=1 /backups? Doesn’t that mean that only 1 chunk is processed, even if multiple chunks are filled below 10%?

Edit again: I get it now, I think. That’s because you are suggesting doing this regularly, right? So if I did this every day, at some point all chunks would be defragged anyway, with the advantage of each individual balance run taking much less time.

OK, I let the box run for a couple of days with the aforementioned settings. I didn’t do a balance yet, nor did I try changing the “commit” time (I feel this is kind of risky, if I understand it correctly).

Either way, my Windows image changed by 30 GB and it has been backing this up for almost an hour… This is what iostat -x 5 shows:


Performance is abysmal indeed.

Is this really all down to the SMR nature of my drive, or is the oldish kernel responsible for this horrible performance? I really can’t imagine it would be any better even if I delayed the “commit”, as the process was slow to begin with. At this rate, the speed benefits of BTRFS are totally invalidated.

So I have now deleted all old backups so I can do a proper free-space rebalance. Even several minutes after the last file transfer I still get the following in iotop -o:

TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
514 be/4 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % [btrfs-transacti]
513 be/4 root 43.48 K/s 72.47 K/s 0.00 % 99.99 % [btrfs-cleaner]

and iostat shows near 100% utilization:

I guess I should not use SMR drives for the BTRFS/urbackup combination after all? This is, of course, without a rebalance running.

While I won’t experience this using SSDs & PMR drives, I too am on btrfs for both source and destination filesystems, so this caught my eye.

Are you aware that the latest versions have included zoned/SMR support in btrfs? Sometime over the last few months they released additions for these types of drives. It’s probably well worth considering upgrading your F27 to F33/34, as it will gain a fair number of btrfs fixes and improvements.

Thanks for the heads-up! I was not really aware that there were any updates specifically for SMR drives. Would that also concern DM-SMR drives? I got some advice from the BTRFS IRC channel and am testing how the drive holds up now. (Essentially I changed the compress-force flag to compress, deleted all snapshots, and then ran a defrag.)
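For anyone curious, the defrag was essentially a recursive run over the whole backup volume, something along the lines of:

btrfs filesystem defragment -r -v /drives/backup2    # note: defragmenting breaks reflinks between snapshots, which didn't matter here since they were all deleted first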

For now it is looking good, but I wonder if the issue before was that urbackup kept deleting subvolumes because it assumed the disk was close to full. Looking at the statistics, that might certainly be the case:

The sharp drop is when I deleted all subvolumes and did the defrag. See also the final upwards curve, which shows consumption after an incremental image backup of my laptop’s almost full 1 TB SSD. This makes me worry that at some point performance will come crashing down again, as every snapshot seems to count towards the total storage usage… Is this by design? I thought it would actually report the same as, or something similar to, what btrfs filesystem usage / compsize would report on the volume (i.e. the actual used data, not the potential data usage if all reflinks were removed).
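To be concrete, the numbers I would have expected the statistics page to roughly track are the ones reported by (run against the backup mount point, compsize installed separately):

btrfs filesystem usage /drives/backup2    # allocated vs. actually used space on the volume
compsize /drives/backup2                  # data usage with compression and reflinks/snapshots taken into account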

Either way, I’m just sticking with my running system for now, as I haven’t really found the time to set everything back up on a newer distro.

Good point on DM-SMR. I keep forgetting about the DM & HM types.
The info I mentioned is for HM-SMR, and there’s even a note on the btrfs wiki that it does not cover DM-SMR: Zoned - btrfs Wiki