BTRFS compression: massive performance hit

I’ve set up a new BTRFS partition for UrBackup to take advantage of the “Raw copy-on-write file” image format and compression.

I have it installed on Debian 11 / OMV 6 bare metal, not in Docker.

The mount options are:
rw,noatime,nodiratime,compress-force=zstd:1,ssd,space_cache=v2,subvolid=5,subvol=/
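To switch between the compressed and uncompressed runs I just remount with the compression option changed. A rough sketch (the mountpoint `/srv/backups` is a placeholder, not my actual path):

```shell
# Force zstd level-1 compression on every file (placeholder mountpoint)
mount -o remount,compress-force=zstd:1 /srv/backups

# Baseline run with compression disabled
mount -o remount,compress=no /srv/backups
```

Note that remounting only affects newly written data; existing extents keep whatever compression they were written with.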

Without compression the full image backup easily runs at 600-700 Mbit/s.
With compression, either lzo or zstd:1, the throughput drops to about half, 250-300 Mbit/s.

I wonder whether this is expected or I’m missing something.
I expected the same or only slightly lower performance.

Every other test shows similar or 10-20% lower performance with compression.
A sequential write of a 10 GB test file runs at the same speed, 126 MiB/s.

I tested with fio and got very similar results:
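For anyone who wants to reproduce the RWMIX70 numbers, the jobs were along these lines (the exact job file isn’t shown above, so filename, size, and iodepth here are my assumptions):

```shell
# 70/30 random read/write mix at 4k block size, 60 s time-based run.
# --filename should point at a file on the btrfs mount under test;
# size/iodepth are assumed values, not the exact job used above.
fio --name=rwmix70-4k --filename=/mnt/btrfs/fio-test \
    --size=2G --runtime=60 --time_based \
    --rw=randrw --rwmixread=70 --bs=4k \
    --ioengine=libaio --iodepth=32 --direct=1
```

The 1M variants are the same job with `--bs=1M`, and the pure write test uses `--rw=write`.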

zstd:1 compression forced

RWMIX70 4k
READ: bw=8972KiB/s (9188kB/s), 8972KiB/s-8972KiB/s (9188kB/s-9188kB/s), io=534MiB (560MB), run=60905-60905msec
WRITE: bw=3867KiB/s (3960kB/s), 3867KiB/s-3867KiB/s (3960kB/s-3960kB/s), io=230MiB (241MB), run=60905-60905msec

RWMIX70 1M
READ: bw=195MiB/s (204MB/s), 195MiB/s-195MiB/s (204MB/s-204MB/s), io=11.9GiB (12.8GB), run=62569-62569msec
WRITE: bw=85.6MiB/s (89.7MB/s), 85.6MiB/s-85.6MiB/s (89.7MB/s-89.7MB/s), io=5353MiB (5613MB), run=62569-62569msec

WRITE 1M
WRITE: bw=44.6MiB/s (46.8MB/s), 44.6MiB/s-44.6MiB/s (46.8MB/s-46.8MB/s), io=2685MiB (2815MB), run=60184-60184msec

no compression

RWMIX70 4k
READ: bw=9816KiB/s (10.1MB/s), 9816KiB/s-9816KiB/s (10.1MB/s-10.1MB/s), io=586MiB (614MB), run=61106-61106msec
WRITE: bw=4221KiB/s (4323kB/s), 4221KiB/s-4221KiB/s (4323kB/s-4323kB/s), io=252MiB (264MB), run=61106-61106msec

RWMIX70 1M
READ: bw=183MiB/s (192MB/s), 183MiB/s-183MiB/s (192MB/s-192MB/s), io=11.3GiB (12.2GB), run=63344-63344msec
WRITE: bw=80.5MiB/s (84.4MB/s), 80.5MiB/s-80.5MiB/s (84.4MB/s-84.4MB/s), io=5099MiB (5347MB), run=63344-63344msec

WRITE 1M
WRITE: bw=51.6MiB/s (54.1MB/s), 51.6MiB/s-51.6MiB/s (54.1MB/s-54.1MB/s), io=3107MiB (3258MB), run=60171-60171msec

The array runs behind a bcache SSD, so pure reads top out at 1.1 GB/s.

The filesystem is the only bottleneck: with bcache in writeback mode the full image backup runs mostly at 2.2 Gbit/s over 599 GB, dropping to 1.2-1.7 Gbit/s after 420 GB, which is the SSD cache size. Flushing the dirty data during the backup works really well, and it only rarely chokes.
This lets me complete a full image backup in 35 minutes instead of 140 minutes with bcache in writethrough.
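For reference, this is roughly how I toggle the bcache mode between runs, via the standard sysfs interface (the `bcache0` device name is a placeholder for whatever your backing device registered as):

```shell
# Show the current cache mode; the active one is printed in brackets
cat /sys/block/bcache0/bcache/cache_mode

# Switch to writeback for the backup window, then back afterwards
echo writeback    > /sys/block/bcache0/bcache/cache_mode
echo writethrough > /sys/block/bcache0/bcache/cache_mode
```

Writeback is the risky mode (dirty data sits on the SSD until flushed), which is why I only compare it against writethrough rather than running it unconditionally.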