[Server 2.0.x BTRFS] High server disk IO while doing incremental file/image backups

I’ve searched for threads on a similar topic but didn’t find anything matching my case, so I’m starting a new thread; apologies if I’m creating a duplicate topic :wink:

I’ve noticed that when a client is doing an incremental image backup, the server experiences (very?) high disk write activity and IO, even though the client is backing up over a relatively slow internet connection. I suspect the issue may have something to do with BTRFS (with compression) + CoW + synthetic backups… This causes the disk to become a bottleneck.

Configuration:
ESXi host with 8x5200rpm drives in raid-6 on a hw raid controller & write cache enabled
Server 2.0.13 on Debian 8.3
BTRFS volume mounted at /data, fstab entry: /data btrfs defaults,sync,compress=zlib 0 2

Clients are all backing up (slowly) via internet connection:

During the upload, I’m seeing an average of ~20MB/s writing:

On the UrBackup server, iotop shows:

Total DISK READ : 47.49 K/s | Total DISK WRITE : 3.59 M/s
Actual DISK READ: 47.49 K/s | Actual DISK WRITE: 40.76 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
5964 idle root 47.49 K/s 3.96 K/s 0.00 % 78.48 % updatedb.mlocate
2479 be/4 urbackup 0.00 B/s 1872.00 K/s 0.00 % 57.41 % urbackupsrv run --config /etc/default/urbackupsrv --daemon --pidfile /var/run/urbackupsrv.pid [fbackup load]
2982 be/4 urbackup 0.00 B/s 23.75 K/s 0.00 % 0.50 % urbackupsrv run --config /etc/default/urbackupsrv --daemon --pidfile /var/run/urbackupsrv.pid [fbackup load]
5112 be/4 urbackup 0.00 B/s 35.62 K/s 0.00 % 0.02 % urbackupsrv run --config /etc/default/urbackupsrv --daemon --pidfile /var/run/urbackupsrv.pid [ibackup main]
5449 be/4 urbackup 0.00 B/s 3.96 K/s 0.00 % 0.01 % urbackupsrv run --config /etc/default/urbackupsrv --daemon --pidfile /var/run/urbackupsrv.pid [image backup wr]
6240 be/4 root 0.00 B/s 7.92 K/s 0.00 % 0.00 % [kworker/u2:4]
6535 be/4 root 0.00 B/s 1733.48 K/s 0.00 % 0.00 % [kworker/u2:1]

Are there any recommendations to help reduce disk load while doing image backups?

In your iotop output, most of the iowait is from the file backups (fbackup).

That updatedb run is probably going through all of your file backups, by the way, and since you have mounted the volume without noatime, every access updates the atime and copy-on-writes all the file entries.

Yep, updatedb was definitely consuming a lot of IO. After I stopped urbackupsrv, updatedb was still running:

Total DISK READ : 1081.92 K/s | Total DISK WRITE : 51.71 K/s
Actual DISK READ: 1081.92 K/s | Actual DISK WRITE: 361.97 K/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
5964 idle root 1081.92 K/s 51.71 K/s 0.00 % 99.64 % updatedb.mlocate
110 be/3 root 0.00 B/s 0.00 B/s 0.00 % 0.22 % [jbd2/sda1-8]
7326 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.01 % [kworker/0:1]

So, I:

killall updatedb.mlocate
rm /var/lib/mlocate/mlocate.db
vi /etc/updatedb.conf

PRUNEPATHS="/tmp /var/spool /media /data"

and then in fstab:

/data btrfs defaults,sync,noatime,nodiratime,compress=zlib 0 2

and check:

mount |grep btrfs
/dev/sdb on /data type btrfs (rw,noatime,nodiratime,sync,compress=zlib,space_cache)
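
For what it’s worth, a quick way to confirm the pruning works (a sketch, assuming the default mlocate database) is to rebuild the database and check that nothing under /data gets indexed:

updatedb
locate '/data/' | head   # should print nothing once /data is pruned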

I’ll keep an eye on it and report back if I’m still seeing a lot of disk IO. Thanks!

It looks like resuming a file backup is causing the high IO.

Total DISK READ : 337.23 K/s | Total DISK WRITE : 2.67 M/s
Actual DISK READ: 337.23 K/s | Actual DISK WRITE: 3.80 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
8460 be/4 urbackup 0.00 B/s 1218.01 K/s 0.00 % 55.87 % urbackupsrv run --config /etc/default/urbackupsrv --daemon --pidfile /var/run/urbackupsrv.pid [fbackup load]
110 be/3 root 0.00 B/s 0.00 B/s 0.00 % 51.70 % [jbd2/sda1-8]
10429 be/4 urbackup 166.63 K/s 745.88 K/s 0.00 % 50.60 % urbackupsrv run --config /etc/default/urbackupsrv --daemon --pidfile /var/run/urbackupsrv.pid [fbackup main]
11324 be/4 root 63.48 K/s 158.70 K/s 0.00 % 9.55 % [kworker/u2:1]
7880 be/4 urbackup 107.12 K/s 0.00 B/s 0.00 % 8.55 % urbackupsrv run --config /etc/default/urbackupsrv --daemon --pidfile /var/run/urbackupsrv.pid [fbackup hash]
7881 be/4 urbackup 0.00 B/s 376.91 K/s 0.00 % 0.20 % urbackupsrv run --config /etc/default/urbackupsrv --daemon --pidfile /var/run/urbackupsrv.pid [fbackup write]
11318 be/4 root 0.00 B/s 238.05 K/s 0.00 % 0.00 % [kworker/u2:0]

Total DISK READ : 845.03 K/s | Total DISK WRITE : 5.40 M/s
Actual DISK READ: 829.16 K/s | Actual DISK WRITE: 9.18 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
7474 be/4 root 587.15 K/s 968.01 K/s 0.00 % 97.16 % [btrfs-transacti]
10429 be/4 urbackup 257.87 K/s 2.12 M/s 0.00 % 44.81 % urbackupsrv run --config /etc/default/urbackupsrv --daemon --pidfile /var/run/urbackupsrv.pid [fbackup main]
8460 be/4 urbackup 0.00 B/s 2.24 M/s 0.00 % 31.01 % urbackupsrv run --config /etc/default/urbackupsrv --daemon --pidfile /var/run/urbackupsrv.pid [fbackup load]
110 be/3 root 0.00 B/s 0.00 B/s 0.00 % 30.75 % [jbd2/sda1-8]
11324 be/4 root 0.00 B/s 47.61 K/s 0.00 % 0.00 % [kworker/u2:1]
11332 be/4 root 0.00 B/s 47.61 K/s 0.00 % 0.00 % [kworker/u2:2]

Is this a normal level of disk IO or do I need to change something on the server?

It is the full backups, most likely. As you can see, they hard link files, which is pure random IO. You have a low-RPM RAID-6 configuration, so you probably get at most ~100 IOPS. iostat shows this (the tps column).
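
A minimal sketch of how to watch this while a backup runs (iostat is part of the sysstat package; /dev/sdb is the backup volume from the mount output above):

iostat -dm 5 /dev/sdb    # the "tps" column shows IO operations per second
iostat -dmx 5 /dev/sdb   # extended view with per-request stats and %util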

So, after experimenting with various caching settings, I’ve found that mounting the btrfs file system with the sync option was causing the high IO and write rates. IO went from ~100-300 tps with sync enabled to ~0-100 tps with sync disabled. The graph shows the significant difference in write rates:

So the lesson of the day is don’t use the sync option with btrfs and expect good performance :wink:
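
For reference, a minimal sketch of the resulting fstab entry with sync dropped (otherwise the same options as before):

/data btrfs defaults,noatime,nodiratime,compress=zlib 0 2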

Hi,

Our test environment is a Qnap TS 239 Pro II+ (Atom D525 @ 1.8 GHz dual core / 1 GB RAM / 2x Western Digital Re 2TB in soft RAID1) running Debian 8 in place of the Qnap firmware (/boot on the 512 MB DOM, other partitions on the disks), with backported btrfs 4.7 and kernel 4.7 installed, BTRFS backend:

root@backup-3:~# cat /proc/version
Linux version 4.7.0-0.bpo.1-amd64 (debian-kernel@lists.debian.org) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Debian 4.7.8-1~bpo8+1 (2016-10-19)

root@backup-3:~# df -h ; btrfs fi df /media/56af2dc3-28fa-4e96-bc1d-3561039ecc62/ ; du -hs /media/56af2dc3-28fa-4e96-bc1d-3561039ecc62/
Filesystem         Size  Used Avail Use% Mounted on
udev                10M       0   10M   0% /dev
tmpfs              198M     21M  177M  11% /run
/dev/md0           9,1G    2,6G  6,0G  31% /
tmpfs              494M       0  494M   0% /dev/shm
tmpfs              5,0M       0  5,0M   0% /run/lock
tmpfs              494M       0  494M   0% /sys/fs/cgroup
tmpfs              494M     60K  494M   1% /tmp
/dev/sdb1          475M     60M  391M  14% /boot
/dev/md2           1,9T    735G  1,1T  40% /media/56af2dc3-28fa-4e96-bc1d-3561039ecc62
tmpfs               99M       0   99M   0% /run/user/0
Data, single: total=721.00GiB, used=720.22GiB
System, DUP: total=32.00MiB, used=112.00KiB
Metadata, DUP: total=8.00GiB, used=6.82GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
9,1T    /media/56af2dc3-28fa-4e96-bc1d-3561039ecc62/
root@backup-3:~# btrfs version
btrfs-progs v4.7.3
root@backup-3:~#

We only do image backups, and this works fine.

Disk info:

root@backup-3:~# hdparm -i /dev/sda /dev/sdc

/dev/sda:

 Model=WDC WD2000FYYZ-01UL1B2, FwRev=01.01K03, SerialNo=WD-WCC1PN6ZYZ63
 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=0
 BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=3907029168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=yes: unknown setting WriteCache=enabled
 Drive conforms to: Unspecified:  ATA/ATAPI-1,2,3,4,5,6,7

 * signifies the current active mode


/dev/sdc:

 Model=WDC WD2000FYYZ-01UL1B2, FwRev=01.01K03, SerialNo=WD-WCC1PC6AFPJY
 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=0
 BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=3907029168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=yes: unknown setting WriteCache=enabled
 Drive conforms to: Unspecified:  ATA/ATAPI-1,2,3,4,5,6,7

 * signifies the current active mode

root@backup-3:~#

Partitioning info:

root@backup-3:~# fdisk -l /dev/sda /dev/sdc

Disk /dev/sda: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: DACDCC98-A4C5-4A37-A4F1-1FB81433EC6D

Device        Start        End    Sectors  Size Type
/dev/sda1      2048   19531775   19529728  9,3G Linux RAID
/dev/sda2  19531776   23437311    3905536  1,9G Linux RAID
/dev/sda3  23437312 3907028991 3883591680  1,8T Linux RAID

Disk /dev/sdc: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 264CAF6D-8F4C-462A-9116-2CDCA1678A28

Device        Start        End    Sectors  Size Type
/dev/sdc1      2048   19531775   19529728  9,3G Linux RAID
/dev/sdc2  19531776   23437311    3905536  1,9G Linux RAID
/dev/sdc3  23437312 3907028991 3883591680  1,8T Linux RAID

root@backup-3:~#

Mount options:

# >>> [openmediavault]
UUID=56af2dc3-28fa-4e96-bc1d-3561039ecc62 /media/56af2dc3-28fa-4e96-bc1d-3561039ecc62 btrfs defaults,nofail,compress-force=zlib,enospc_debug 0 2
# <<< [openmediavault]

Since we went into production on similar but more recent hardware (Qnap TS 251 / Celeron J1800 @ 2.4 GHz / 1 GB RAM / 2x Seagate 6TB ST6000VN0021 / soft RAID1 / BTRFS), we’ve noticed a very high CPU I/O wait compared to the test platform, i.e. about 90~95% IO wait and a load average of ~20!
And backups are much, much slower.
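
A minimal sketch of how the IO wait and per-disk load can be quantified (assuming the sysstat package is installed for iostat; sda and sdb are the RAID1 members on this box):

vmstat 5                          # the "wa" column is CPU time spent waiting on IO
iostat -dmx 5 /dev/sda /dev/sdb   # per-disk throughput, request sizes and %util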

Production platform:

root@backup-1:~# cat /proc/version
Linux version 4.7.0-0.bpo.1-amd64 (debian-kernel@lists.debian.org) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Debian 4.7.8-1~bpo8+1 (2016-10-19)
root@backup-1:~# btrfs version
btrfs-progs v4.7.3
root@backup-1:~# df -h ; btrfs fi df /media/259a6324-0c5f-4826-beb0-e77430ba0966/ ; du -hs /media/259a6324-0c5f-4826-beb0-e77430ba0966/
Filesystem         Size  Used Avail Use% Mounted on
udev                10M       0   10M   0% /dev
tmpfs              178M     22M  157M  12% /run
/dev/md0            19G    2,8G   15G  17% /
tmpfs              444M       0  444M   0% /dev/shm
tmpfs              5,0M       0  5,0M   0% /run/lock
tmpfs              444M       0  444M   0% /sys/fs/cgroup
tmpfs              444M     72K  444M   1% /tmp
/dev/sdc1          475M     60M  391M  14% /boot
/dev/md2           5,5T    1,3T  4,3T  23% /media/259a6324-0c5f-4826-beb0-e77430ba0966
tmpfs               89M       0   89M   0% /run/user/0
Data, single: total=1.24TiB, used=1.21TiB
System, DUP: total=8.00MiB, used=160.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=13.00GiB, used=11.16GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=33.58MiB
11T     /media/259a6324-0c5f-4826-beb0-e77430ba0966/
root@backup-1:~#

Disk info:

root@backup-1:~# hdparm -i /dev/sda /dev/sdb

/dev/sda:

 Model=ST6000VN0021-1Z811C, FwRev=SC60, SerialNo=ZA108LPR
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=0
 BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=11721045168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: unknown:  ATA/ATAPI-4,5,6,7

 * signifies the current active mode


/dev/sdb:

 Model=ST6000VN0021-1Z811C, FwRev=SC60, SerialNo=ZA107TF3
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=0
 BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=11721045168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: unknown:  ATA/ATAPI-4,5,6,7

 * signifies the current active mode

root@backup-1:~#

Partitioning info:

root@backup-1:~# fdisk -l /dev/sda /dev/sdb

Disk /dev/sda: 5.5 TiB, 6001175126016 bytes, 11721045168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 5011813B-C24C-4DCA-AD91-0B16D4309280

Device        Start         End     Sectors  Size Type
/dev/sda1      2048    39063551    39061504 18,6G Linux RAID
/dev/sda2  39063552    42969087     3905536  1,9G Linux RAID
/dev/sda3  42969088 11721043967 11678074880  5,4T Linux RAID

Disk /dev/sdb: 5.5 TiB, 6001175126016 bytes, 11721045168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 42AD109E-23A3-47D2-8D4D-0E5AC99D08F9

Device        Start         End     Sectors  Size Type
/dev/sdb1      2048    39063551    39061504 18,6G Linux RAID
/dev/sdb2  39063552    42969087     3905536  1,9G Linux RAID
/dev/sdb3  42969088 11721043967 11678074880  5,4T Linux RAID

root@backup-1:~#

Mount options:

# >>> [openmediavault]
UUID=aaaf8b05-4c27-4948-802b-acce40b9a67e /media/aaaf8b05-4c27-4948-802b-acce40b9a67e btrfs defaults,nofail,compress-force=zlib,enospc_debug 0 2
# <<< [openmediavault]

The RAID is clean:

root@backup-1:~# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda3[0] sdb3[1]
      5838906368 blocks super 1.2 [2/2] [UU]
      bitmap: 0/44 pages [0KB], 65536KB chunk

md1 : active raid1 sda2[0] sdb2[1]
      1951744 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sda1[0] sdb1[1]
      19514368 blocks super 1.2 [2/2] [UU]

unused devices: <none>
root@backup-1:~#

After a lot of testing (swapping disks between the NAS boxes, logging, googling, …), I came to the conclusion that this is a hardware-related problem: the disks.

The ST6000 drives are “Advanced Format” disks (physical sector size = 4096 bytes), while the WD2000 drives aren’t (sector size = 512 bytes).
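
For reference, a minimal sketch of how the reported sector sizes and partition alignment can be checked (standard sysfs entries):

cat /sys/block/sda/queue/logical_block_size    # 512
cat /sys/block/sda/queue/physical_block_size   # 4096 on the ST6000, 512 on the WD2000
# a partition is 4K-aligned when its start sector is divisible by 8 (8 x 512 bytes = 4096 bytes)
cat /sys/block/sda/sda1/start /sys/block/sda/sda2/start /sys/block/sda/sda3/start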

Has anybody ever had problems with this type of disk (Advanced Format)? How could we optimize this setup?

Thanks.

Regards,

Not sure if this will help, but you could try mounting your btrfs volume with commit=300 or 600 or some other value larger than 30 (which is the default).

I found that a longer commit time significantly reduces the IO on my btrfs volume. From what I understand, if the system crashes I could lose up to 5 minutes of data with commit=300. I’m not too concerned about losing a few minutes’ worth of data, because I think UrBackup will check for missing files and fill in any gaps during the next backup cycle after a crash (you may want to test/confirm this though).

I did a "mount -o remount" when I adjusted the commit value, so the graph represents the IO change with backups running uninterrupted through the settings change.
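
For reference, a minimal sketch of the change (using the /data mount and options from the earlier posts; adjust the value to taste):

mount -o remount,commit=300 /data

and the matching fstab entry:

/data btrfs defaults,noatime,nodiratime,compress=zlib,commit=300 0 2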


Should be completely safe. UrBackup syncs the file system before the backup is set to complete. 2.1.x improves this a bit (startup recovery).

You could also try space_cache=v2
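
For anyone trying it, a hedged sketch: the free space tree needs a kernel that supports it (4.5 or newer) and is created by mounting once with the option, after which it stays enabled:

# add space_cache=v2 to the /data options in /etc/fstab, then
umount /data
mount /data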
