BTRFS Raw copy-on-write image backup is slow

Hi everyone.

I’m currently trying to back up my Windows machine to my urbackup server, which uses raw copy-on-write image backups to a BTRFS partition on an external disk encrypted with LUKS. It is extremely slow: it has been running for about 8 hours and has completed about 65GB out of 550GB.

The backup disk is a WD my passport 1TB: https://www.amazon.co.uk/Western-Digital-Passport-Portable-Type-C/dp/B0792DP87N/ref=psdc_430544031_t3_B01LQQH83I?th=1
The disk is connected to the USB3 slot.

The urbackup server is version 2.4.13, running in Docker. The host OS is Ubuntu Server 20.04, running kernel 5.4. The entire Linux OS, urbackup server included, is on an SSD.

The Windows client and the server are connected to the same LAN over Ethernet.

I have been using urbackup for a while and have noticed it being a little slow (I have 4 clients in total backing up to this disk, and there are ~300 subvolumes on the disk from backing up).
I only recently learnt about using raw COW for image backups so I nuked my previous windows image backups and am running the full raw COW image backup for the first time. It is this backup that is very slow.

Disk is mounted using defaults,nossd,noatime,compress-force=zstd:4,space_cache=v2,commit=300
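For reference, the matching /etc/fstab line would be something like this (the UUID is a placeholder, not my actual partition ID):

```
# /etc/fstab — UUID below is a placeholder
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /mnt/backupdrive  btrfs  defaults,nossd,noatime,compress-force=zstd:4,space_cache=v2,commit=300  0  0
```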

iowait seems high, varying from 30% up to 80% (iostat image included).

There are extended periods where the write speed seems to be almost 0 (see screenshot of activities page)

Things I’ve tried:

  • Changing scheduler to BFQ
  • Dropping autodefrag from the mount option
  • Lowering dirty ratio and background dirty ratio
  • Max clients backing up set to 1
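The dirty ratio change was via sysctl; for reference, it was along these lines (the exact values here are illustrative, not necessarily what I set):

```
# /etc/sysctl.d/99-dirty.conf — illustrative values
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
```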

Any help is greatly appreciated!

To me it looks like the issue is with the backup drive. It can process 6 requests per second with 6 kB/s read and 38 kB/s write, which is super low. I’m assuming sde is your backup drive, though.

While a 1TB WD Passport doesn’t look like it contains an SMR drive, I would still do a check:

  1. Find the model of the drive and check it against List of known SMR drives | TrueNAS Community
  2. Run a sustained write test of at least 100GB (some drives have a CMR area of around 60GB, after which the speed drops dramatically). A linear file write will do: either copy a 100GB file or, if you have a powerful CPU, run something like dd if=/dev/random of=/mnt/sde/test_file bs=1M count=100000
  3. Do a surface test for slow blocks. Not sure about analogs for Linux, but for Windows there is Victoria. You’re interested in read speeds across the whole drive.
  4. If all of that passes OK, try to look deeper into the activity using a tool called iotop - it will show detailed information on IO to the disk.
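A sketch of the step 2 test as a reusable function (the mount point and size are yours to adjust; /dev/urandom is used here because /dev/random can stall while gathering entropy on older kernels, and conv=fsync makes dd report the real disk speed rather than the page-cache speed):

```shell
# Sustained linear write test: writes size_mb megabytes of random data
# to the given path, syncing at the end so dd's reported rate is honest.
sustained_write_test() {
    local target=$1 size_mb=$2
    dd if=/dev/urandom of="$target" bs=1M count="$size_mb" conv=fsync status=progress
}

# e.g. sustained_write_test /mnt/sde/test_file 100000   # ~100 GB
```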

If the drive proves to be SMR - you won’t get any tolerable write speeds with COW.

Yeah it seems crazy low to me too.

For the SMR check: looking at SMART, the model ID for this device is WD20SDZW and I don’t see that on the website you linked.
I’ve attached a pic of the SMART data from my monitoring suite to show the ID, as well as the SMART data showing that the device is not on its last legs.

The sustained write test: I’ve got a backup running and I don’t really want to cancel it yet. But, doesn’t this backup count as a sustained write test?

For the surface test: I think SMART monitors this? Isn’t this the ‘offline uncorrectable’ smart value? I’ve issued an offline smartctl test to check, anyway.

FWIW, I have been using iotop, screenshot attached.

I don’t think the speed has always been like this; it’s never been great, but it definitely was not this bad earlier on.

I’m also attaching another iostat pic, as it varies a lot.

There are a few reasons for doing write tests:

  1. Drive manufacturers do not mark their devices as SMR, so detection of those drives is community-driven. Absence from the site is a good sign, but not a 100% guarantee that the drive is not SMR. I also assume such drives are rarely a focus of the NAS community, as they are typically not used in NAS servers, so they might never appear in the list.
  2. A clean SMR drive behaves the same way as a CMR one, until it is almost full and data starts being rewritten.
  3. SMR drives contain CMR areas on the platters, which may hold up to 60GB of data written at full speed. Consider it a write cache.
  4. You need to find which area to dig into. You have multiple now: urbackup, BTRFS, the HDD, the link between the HDD and the system, the OS. Any of those might be the cause. Try to remove them from the equation one by one.
  5. To see if your system is using USB3 speeds for the drive.
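For point 5: each USB device exposes its negotiated link speed in sysfs, in Mb/s. A small helper to interpret that value (the sysfs path in the usage line is an assumption - find yours with lsusb -t):

```shell
# Interpret the sysfs 'speed' attribute (in Mb/s) of a USB device.
usb_speed_class() {
    case "$1" in
        480)   echo "USB 2.0 High Speed" ;;
        5000)  echo "USB 3.0 SuperSpeed" ;;
        10000) echo "USB 3.1 SuperSpeed+" ;;
        *)     echo "unknown ($1 Mb/s)" ;;
    esac
}

# e.g. usb_speed_class "$(cat /sys/bus/usb/devices/2-1/speed)"
```

If it reports 480, the enclosure has fallen back to USB 2.0 and the link itself is the bottleneck.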

From what I see on the latest screenshot:

  1. A very typical picture of an SMR drive: super slow operations whenever you have disk writes. And you have them at the level of 300 kB/s.
  2. IO is hogged by BTRFS kernel processes. I would say urbackup has nothing to do with this.

My suggestion is to abort the backups, as you might need to destroy the FS during your tests. And at 300 kB/s it would take an eternity anyway.

As for SMART - SMART does not capture “slow blocks”. For example, if reading some sectors takes, say, 5 seconds, then unless the HDD firmware reports an error (NAS drives, for instance, have shorter timeouts), SMART will show no errors while your disk performance will be disgusting. Slow sectors might easily be the reason for the performance you’re seeing.

My suggestion is to abort the backups, as you might need to destroy the FS during your tests. And at 300 kB/s it would take an eternity anyway.

The backup has been running for about 17 hours and it’s backed up 140GB. By my reckoning, that’s about a 2.3MB/s write speed? Granted, it’s terrible, but it isn’t 600kB/s. I’m looking at a couple of days for this backup to complete, I think.
I’d really rather not destroy the FS if I can help it, as it’s already got a lot of backups on it. I guess I could completely capture the partition to an internal disk, maybe? But then I’d be reading ~1.5TB from this disk, and reads seem slow too…
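(Sanity-checking that arithmetic, in decimal units:)

```shell
# 140 GB written in 17 hours, as an average rate
bytes=$((140 * 1000 * 1000 * 1000))
seconds=$((17 * 3600))
echo "$((bytes / seconds / 1000)) kB/s"   # ~2287 kB/s, i.e. ~2.3 MB/s
```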

There are a few reasons for doing write tests:

Yeah, I get the need for such a test; I figured the ongoing image backup was a write test, but I take your point that a separate test eliminates a cause.

As for SMART - SMART does not capture “slow blocks”. For example, if reading some sectors takes, say, 5 seconds, then unless the HDD firmware reports an error (NAS drives, for instance, have shorter timeouts), SMART will show no errors while your disk performance will be disgusting. Slow sectors might easily be the reason for the performance you’re seeing.

I was taking the info from this stack exchange, where someone suggested running an offline smartctl test to find bad sectors. Are you talking about something else?

According to this list, your drive is DM-SMR: What WD and HGST hard drives are SMR?

And this is correct; however, as mentioned earlier, “slow sectors” are not considered bad sectors by the firmware. Running a surface test with a tool like Victoria will tell you the response speed of each sector on the drive.

This suggestion is not relevant though, due to the fact that your drive seems to be SMR.

Ah, the --full-- ID is WD20SDZW-11JJ8S0 and the second part of the ID doesn’t match anything on that list.

So, maybe it’s still not SMR?

And this is correct; however, as mentioned earlier, “slow sectors” are not considered bad sectors by the firmware. Running a surface test with a tool like Victoria will tell you the response speed of each sector on the drive.

OK, right, I understand what you mean. So far I’ve not seen any Linux equivalent of the Windows tool you mentioned. Do you know of one?

I wouldn’t hope for that.

Model Number Suffix
The model number suffix (characters to the right of the dash following the model number) is only for in-house use.

Reference: https://zedt.eu/storage/2012//2579-001028.pdf

Unfortunately I haven’t seen anything similar for Linux.

The model number PDF doesn’t list Z for the fifth character or W for the sixth, so I can’t even decode what my disk is.
Anywho, my point was that the website you linked also specifically refers to the characters to the right of the dash.

So, assuming this drive IS SMR, then am I stuck with this performance? Is there nothing to be done about it?

Are there any other tests I can do to check if it is indeed SMR?

Email WD support & ask them. 100% definitive answer.

Thanks, I have done exactly that and I’m waiting for them to get back to me.

In the meantime, it looks like hdparm reports that the HDD in question has TRIM support:

$ sudo hdparm -I /dev/sde | grep -i trim
           *    Data Set Management TRIM supported (limit 10 blocks)
           *    Deterministic read data after TRIM

I also asked on the BTRFS IRC and the helpful people there concurred with you that the drive is almost certainly SMR.

Obviously I will still wait for WD support, but it’s looking very likely that the My Passport drive is SMR…

Heyho…
BTRFS IRC folks suggested that running trim (discard=async [I’ll need to update the kernel]) --might-- help a bit.
Does anyone here have experience with that in relation to urbackup?

Just to update.
I’m still waiting to hear from WD, but the device having TRIM support was fairly telling IMO, given that WD has a page saying SMR HDDs do have TRIM support: TRIM Command Support for WD External Drives

The 500GB image completed after almost 3 days.

I’ve now updated my kernel so that I can enable discard=async, and I’ve run a trim on the disk:

$ sudo fstrim --verbose /mnt/backupdrive
/mnt/backupdrive: 464.6 GiB (498821480448 bytes) trimmed

I’ve also left the HDD idle for a while to let the SMR rearranging catch up.

I’m now getting decent write speeds again. Given that I’ll just be running incremental file/image backups, I think the drive is sufficient for what I need (fingers crossed).

Good shout on the disk being SMR @Dark_Angel ! And, thanks for all of the help!

PS: getting fstrim running was non-trivial as lsblk --discard claimed trim was not runnable, despite hdparm saying it was.
The TL;DR is that I had to make a udev rule: Enable Trim on an External SSD on Linux [Glump.net]
Maybe that will be useful for someone in the future.
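For reference, the rule from that article is of this shape (the vendor/product IDs here are placeholders - take the real ones from lsusb):

```
# /etc/udev/rules.d/10-usb-discard.rules — IDs below are placeholders
ACTION=="add|change", ATTRS{idVendor}=="xxxx", ATTRS{idProduct}=="yyyy", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
```

It flips the SCSI provisioning_mode of the USB bridge to unmap, which is what lets discard/TRIM requests actually reach the drive.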

PPS: the trim took about 15 minutes, it was nowhere near as snappy as an SSD trim.

The fact that the disk supports TRIM is good news, and the fact that it actually works is doubly great news. Congrats. Not all DM-SMR disks support TRIM, and using the ones that don’t becomes a nightmare as they get filled and rewritten.

Thanks for update.

Haha thanks,

So far, it ‘works’ in the sense that I can repeatedly run fstrim and it claims to do trimming AND I can still access the filesystem afterwards (hopefully meaning the trim wasn’t writing garbage to the disk).

I was a little worried that I had to create a udev rule to make fstrim run, but the resulting trim doesn’t seem to have broken anything…

Thanks for the help!

So I think I have a final update @Dark_Angel and @Bearded_Blunder

WD --cannot-- tell me whether the drive is SMR, despite my giving them the exact serial info. They also cannot provide me with any kind of technical assistance with the drive because I’m using Linux (why are companies still like this?)

See support snippet below:

We would like to inform you that, unfortunately, we do not have information about which disk is used in this device, you can find out more detailed information on this matter by reading this article:
https://support-en.wd.com/app/answers/detail/a_id/13652/initiator/user.

If you removed the disc from this unit and would like to know what recording method it uses, please provide us with a photo of this device from both sides.

So, even WD does not know if the drive is SMR.

It made me lol that the page they linked to said that I’d void my warranty if I opened the enclosure, but their direct email asked me for photos of the exposed drive…

Given:

  • the performance issues
  • the drive supports trim
  • the performance has improved after running fstrim on the drive
  • I can hear the drive churning while the drive is idle (from an OS standpoint)
  • performance improves after sitting idle

we can probably say that the drive is SMR.

Now that I’m back to doing incremental backups (+idle time + trim) I’ve found the drive to be actually pretty good for my purposes.

So, I’m probably just going to leave the situation as it is now and continue with the drive mounted with discard=async and let fstrim run as a weekly service on the drive.
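For the weekly run, a minimal systemd service/timer pair would look like this (the unit names are made up; the stock fstrim.timer that ships with util-linux is an alternative):

```
# /etc/systemd/system/backup-trim.service
[Unit]
Description=TRIM the backup drive

[Service]
Type=oneshot
ExecStart=/sbin/fstrim --verbose /mnt/backupdrive

# /etc/systemd/system/backup-trim.timer
[Unit]
Description=Weekly TRIM of the backup drive

[Timer]
OnCalendar=weekly
Persistent=true

[Install]
WantedBy=timers.target
```

Enabled with `systemctl enable --now backup-trim.timer`.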

Thank you for all of the help @Dark_Angel !!!

Thanks for update.

For those who find this thread later: remember that all the big drive manufacturers are like WD these days. Seagate makes DM-SMR disks that do not support TRIM at all, and hides information about which drives are which. WD just hides the information, but at least supports TRIM. You can also find ridiculous cases: for example, recent WD Red drives, which are meant for NAS use, are also SMR in the 2TB to 6TB range.

So be aware: check the drive against the lists mentioned in this thread, and do not get yourself into SMR unless you’re aware of the technology’s pros and cons and you know what you’re doing.