BTRFS Slow image restore (~10MB/s-20MB/s) and slow image backup

clang · September 18, 2023, 12:28pm

I’ve been troubleshooting my slow image backups lately. I’ve determined that the main cause of the slow performance is the HDD I am using for backups: a “Western Digital Technologies, Inc. Elements Desktop” USB 3.0 enclosure containing a 10TB WDC WD101EMAZ-11G7DA0.

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD101EMAZ-11G7DA0
Serial Number:    VCGRK2RP
LU WWN Device Id: 5 000cca 0b0ca404f
Firmware Version: 81.00A81
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Sep 18 14:18:51 2023 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

I am running UrBackup 2.5.31 on a Debian host via Docker. The drive is formatted with BTRFS and I’m making heavy use of the BTRFS features in UrBackup. The drive is capable of around 160MB/s of throughput if I transfer a single large file via rsync; however, I did see some fluctuation in the speeds the drive delivered… Perhaps something the USB-SATA controller screws up?

Anyways, I managed to improve backup performance by deleting all old snapshots and defragging the client volume on the backup drive (btrfs filesystem defrag -r /backups/client). This allowed the backup to finish at a speed of around 400 Mbit/s. However, while trying to restore the backup to an SSD right now, the speed hovers around 10-20MB/s, so restoring 170GB of data would take around 5h.
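The arithmetic behind that estimate can be checked with a minimal sketch. It assumes decimal units (1 GB = 1000 MB) and a constant transfer rate. The function names are made up for illustration and are not part of UrBackup.

```python
def restore_hours(size_gb: float, rate_mb_s: float) -> float:
    """Hours needed to move size_gb gigabytes at rate_mb_s megabytes per second."""
    return size_gb * 1000 / rate_mb_s / 3600


def mbit_to_mbyte(rate_mbit_s: float) -> float:
    """Convert megabits per second to megabytes per second."""
    return rate_mbit_s / 8


# Rates mentioned in the post: restore speed (10-20 MB/s), backup speed
# (400 Mbit/s) and the drive's sequential rsync speed (160 MB/s).
for rate in (10, 20, mbit_to_mbyte(400), 160):
    print(f"{rate:6.1f} MB/s -> {restore_hours(170, rate):.1f} h for 170 GB")
```

At 10MB/s the restore takes roughly 4.7 hours, and at the drive's sequential speed it would be well under an hour, which is why the random-read behaviour matters so much here.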

Is there any way I can improve the speed of this process? It seems like the limiting factor is again the drive:

iostat -xdch -p /dev/sdf1

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.9%    0.1%   14.8%   16.4%    0.0%   67.8%

     r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz Device
  108.50     27.2M     2.00   1.8%   13.48   256.6k sdf1

     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz Device
    0.00      0.0k     0.00   0.0%    0.00     0.0k sdf1

     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz Device
    0.00      0.0k     0.00   0.0%    0.00     0.0k sdf1

     f/s f_await  aqu-sz  %util Device
    0.00    0.00    1.46  82.6% sdf1
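The numbers in this iostat output can be cross-checked against each other. The following is a rough sketch, assuming that iostat's -h suffixes are based on 1024 and that only reads are in flight (the write and discard rows are all zero). The function names are hypothetical.

```python
def throughput_mib_s(requests_per_s: float, avg_request_kib: float) -> float:
    """Throughput implied by request rate times average request size."""
    return requests_per_s * avg_request_kib / 1024


def avg_queue_depth(requests_per_s: float, await_ms: float) -> float:
    """Little's law: requests in flight = arrival rate * time per request."""
    return requests_per_s * await_ms / 1000


# Values copied from the read row of the iostat output above.
r_s, rareq_kib, r_await_ms = 108.50, 256.6, 13.48

print(f"implied throughput: {throughput_mib_s(r_s, rareq_kib):.1f} MiB/s")
print(f"implied queue depth: {avg_queue_depth(r_s, r_await_ms):.2f}")
```

Both values line up with the reported rkB/s (27.2M) and aqu-sz (1.46): the drive is serving only about 108 read requests per second at roughly 13 ms each, which looks like a seek-bound workload rather than a sequential one.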

This is how the drive is mounted: (rw,noatime,compress=zstd:3,space_cache=v2,subvolid=5,subvol=/)

anon16079863 · September 18, 2023, 10:16pm

Not sure why you are writing this in a urbackup forum.

Defrag is more or less never necessary. I would rather recommend against using it. (btrfs handles scrub, balance and defrag automatically in the background)
Make sure the SATA controller is of good quality, not some “random device I bought on amazon delivered from China”. I use ecybox cases since I know the chipsets in them are of good quality.
I actually got faster speeds using a higher compression level on my WD Red drives, but I think that could differ depending on your HDD, SATA controller quality, computer hardware, etc. It definitely has nothing to do with the slow speeds you have, but could be something to play around with later.

I suspect you have a “not good quality SATA controller”. I have read multiple times about the phenomenon you are having in combination with “bad hardware” (and it’s not broken, it is just low-quality firmware/hardware). I made a quick search but can’t find it now, but I recall reading a list of supported hardware for Linux somewhere, then checking the chips inside the cases I was choosing between, and then settling on ecybox…
I have heard people call it “overheating” or “cache overfilling”. In 90% of the cases it is because of bad hardware.

BatterPudding · September 21, 2023, 3:48pm

Trouble with China is they like saving on costs, which kills the quality.

Have a thread full of suggestions and reviews

I use those PI Hut cables mentioned down the thread: SSD to USB 3.0 Cable for Raspberry Pi | The Pi Hut

A similar question applies to the USB controller: is it really running at 3.0?

clang · September 21, 2023, 9:37pm

(btrfs handles scrub, balance and defrag automatically in the background)

I’m not sure about that. I believe you have to set up scrub and balance manually unless your distro/NAS software has automated scrub logic in place (as OMV does, for example). Defrag can be automated by mounting the drive with the autodefrag option, but I don’t do that, and balance definitely needs to be run manually or from an automated script. In fact, last I checked, if you remove a disk from a BTRFS RAID1 array and write something to the remaining disk, that data will exist ONLY on that single disk, even after you reconnect the second disk and the array is complete again. The data is only “balanced” to the second disk once you run balance. In essence there is no “automatic resilvering” with BTRFS as far as I know.

That said, you are right, this is going a bit off-topic. The reason I posted this here is that I wanted to hear some opinions on the reasons for the poorish performance. Thanks to both of you for your feedback!

I am building a new NAS which has direct SATA connections, so I’ll see if performance on that system is better.

anon16079863 · September 22, 2023, 4:03am

IIRC from when I last read about this, if you have the latest btrfs installed for your kernel, scrubbing and balancing should be done VERY carefully. btrfs should handle these things automatically without interfering with your workflow.
On a spinning disk it mostly just takes a lot of time, but on an SSD you should be VERY careful with them: they significantly shorten the lifespan of the drive since they rewrite lots of data, and as we all know, SSDs have a limit on rewrites, and it is VERY easy to rewrite every-single-bit-of-data on a drive by mistake.
I sometimes balance the disk, like once a year because I’m bored and have nothing to do, but be CAREFUL!!!
I do something like sudo btrfs balance start -musage=50 -dusage=50 / & sudo watch -n1 'btrfs balance status /' and when done increase to 70, and after that 90, but again, BE CAREFUL! This process can take a very long time.
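The stepwise approach described here (usage filter 50, then 70, then 90) could be scripted roughly as follows. This is only a sketch: the helper name balance_commands is made up, and the script just prints the commands instead of running them, since an actual balance needs root and can take a very long time.

```python
import shlex


def balance_commands(mount_point: str, thresholds=(50, 70, 90)):
    """Build one 'btrfs balance start' command per usage threshold.

    -musage / -dusage limit the balance to metadata / data block groups
    that are at most the given percentage full.
    """
    return [
        ["btrfs", "balance", "start", f"-musage={pct}", f"-dusage={pct}", mount_point]
        for pct in thresholds
    ]


# Print the commands in the order they would be run, one pass at a time.
for cmd in balance_commands("/"):
    print(shlex.join(cmd))
```

Each pass should finish before the next one starts. Starting with a low threshold means the first pass only touches nearly empty block groups, so it rewrites the least data.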

As for scrubbing, I actually never do it, since scrubbing won’t repair anything anyway; btrfs check might be able to repair. But now that I think of it, it’s not a bad idea to start running it every second week or so. This post made me look into it again, things might have changed.
sudo btrfs scrub start /
sudo btrfs scrub status / (or use scrub start with -B option to keep it in the foreground)

If we are talking about btrfs RAID, yeah, that is a little different, but to be honest, do not use btrfs RAID. THAT is pretty darn unstable tbh and you should NOT rely on RAID 1 to keep you safe.
More likely it is the opposite: you THINK you are protected, but when disaster strikes the RAID is not working, and you only realize this when you try to mount it and it refuses to mount other than read-only.

MrBates · September 23, 2023, 1:03am

It actually makes sense. Drives perform best on sequential R/W, but the speed drops significantly on random access, which is bound to happen with urbackup and btrfs’ copy-on-write feature as time passes. Defrag helps as you noticed, but only temporarily as new blocks are written out of sequence.
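A back-of-the-envelope model illustrates the point. It is a sketch under stated assumptions: every extent costs one seek plus its sequential transfer time, the 12 ms seek-plus-rotation latency is a guess for a 5400 rpm drive, the 160 MB/s sequential rate comes from the rsync figure earlier in the thread, and 256 KiB is roughly the average read size shown by iostat.

```python
def effective_mb_s(extent_kib: float, seek_ms: float, sequential_mb_s: float) -> float:
    """Effective read rate when each extent costs one seek plus its transfer time."""
    extent_mb = extent_kib * 1024 / 1_000_000
    transfer_s = extent_mb / sequential_mb_s
    return extent_mb / (seek_ms / 1000 + transfer_s)


# Assumed values: 12 ms per seek (hypothetical), 160 MB/s sequential (from the thread).
for extent_kib in (64, 256, 1024, 4096):
    rate = effective_mb_s(extent_kib, seek_ms=12, sequential_mb_s=160)
    print(f"{extent_kib:5d} KiB extents -> {rate:6.1f} MB/s")
```

With 256 KiB extents this model lands at about 19 MB/s, in the same range as the observed restore speed. Larger contiguous extents move the result back towards the sequential rate, which fits the temporary improvement after defragmenting.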

Regarding btrfs RAID1, it is considered safe nowadays (according to official publications).

anon16079863 · September 23, 2023, 9:24am

I would recommend AGAINST defrag if you have subvolumes stored on the disk OR if you have any VMs or containers running on any subvolume.
I.e., if you run UrBackup on btrfs, every-single-backup is a subvolume, and I have been told multiple times by multiple sources NOT to defrag that.

I have been running btrfs on my backup drive for a LONG time and have NEVER defragged. I have never seen any drop in performance.
With bad hardware, yeah, I saw what is described A LOT before buying proper hardware.
And on SSDs DO NOT DEFRAG OR BALANCE (unless you are FORCED to for some reason)!!! You shorten the lifespan of the drive significantly.

About RAID1, risk it if you want to, but looking through posts about it on the internet I find a lot of stories along the lines of “my volume is only mounting in ro now, what do I do”, where the poster finds out the data is now corrupted beyond repair.
It seems to me to be more the rule than the exception that the data becomes corrupted if one of the drives fails.
The risk is yours.

I, on the other hand, see zero advantage in RAID 1. I make selected backups-of-backups with btrfs send and store them on yet another drive as “worst case scenario backups”.

If “mostly ok” performance is enough for your backup filesystem, then use RAID. I would not risk it.
Note that this is with the Linux 6.5 kernel. Older kernels have older btrfs code baked into them.
https://btrfs.readthedocs.io/en/latest/Status.html