Receive Side Scaling on Debian 13 - with image backup

Hi everyone, this is my first post, so it's a bit of a shot in the dark as far as etiquette goes.

Firstly: this is a herculean effort for one man, even over a ten-year time span, so I take my hat off to you, Martin, on a job well done.

I am using UrBackup to provide a better backup solution for a fairly narrow use case where I think the UrBackup codebase excels, namely Hyper-V VM backups.

The reason for posting on this forum is to discuss UrBackup performance. Time is money, time is life and time is short, so performance matters to us. Always has, always will. One critical piece of context is the environmental, power-consumption angle: we do not run all our servers all the time, especially backup servers. We bring them up, we back up, and we turn them off. So we want, no, we need, backup performance! The whole notion that all servers run all the time is old news.

So you don't get bored: where I am going with this is that I want to know whether RSS (receive side scaling) could theoretically be leveraged by UrBackup image backups. My testing shows the OS (Debian 13) has RSS enabled on all the NICs in the box, across all 8 cores, but it does not look like it is being exploited by UrBackup during a backup, judging both by the copy speed and by the eth0/1/2/3 interrupt distribution across the processors during the copy operation.

The rest is just some context on the question and some thoughts on the matter as it has unfolded for us.

Coming from the way we backed up Hyper-V disks before, this feature is key. We used the PowerShell function below in a loop over each VM hard disk to make a straight copy of the disk. This was actually a lot faster than Export-VM can do it, but it required the server to be off. (Not so hot.)

Why? Because Explorer will utilise RSS if it is enabled on the client and, obviously, on the receiving side as well. This meant we got around 1 GB of data every two seconds over the backup network from each Hyper-V host OS to the backup server. Then it is just simple maths to calculate the job run time from the image sizes.

So for a freshly installed server, say Server 2019, a complete backup used to take us about 80 seconds (at roughly 500 MB/s that works out to an image in the region of 40 GB).

function Copy-File
{
    param([Collections.ArrayList]$from, [string]$to)

    # Shell.Application copy flags. Note these are integer values; the VBScript-style
    # "&H10&" strings often seen in examples do not convert reliably from PowerShell.
    $FOF_CREATEPROGRESSDLG = 0x0000   # default behaviour (progress dialog shown)
    $FOF_NOCONFIRMATION    = 0x0010   # answer "Yes to All" to any prompts

    $objShell      = New-Object -ComObject "Shell.Application"
    $objdestFolder = $objShell.NameSpace($to)

    if ($from.Count -gt 1)
    {
        # Work out which source folder each file lives in
        [Collections.ArrayList]$folderNames = @()
        $from | ForEach-Object {
            [void]$folderNames.Add([IO.Path]::GetDirectoryName($_))
        }

        $folderNames | Group-Object | ForEach-Object {
            $return        = Get-Location
            Set-Location $_.Name
            $objorigFolder = $objShell.NameSpace($_.Name)

            # Stage the selected files in a temporary GUID-named subfolder so that
            # Explorer copies them as a single job (which is what engages the
            # multi-NIC transfer behaviour we rely on)
            $guid          = [guid]::NewGuid()
            $newFolderPath = Join-Path $_.Name $guid
            New-Item $newFolderPath -ItemType Directory | Out-Null
            $objcopyFolder = $objShell.NameSpace($newFolderPath)

            ($objorigFolder.Items() | Where-Object { $_.Path -in $from }).Path | ForEach-Object {
                $objcopyFolder.MoveHere($_, $FOF_NOCONFIRMATION)
            }

            # Copy the staging folder to the destination, then flatten it out again
            $objdestFolder.CopyHere($objcopyFolder, $FOF_CREATEPROGRESSDLG)
            $objmoveFolder = $objShell.NameSpace((Join-Path $to $guid))
            $objmoveFolder.Items() | ForEach-Object {
                $objdestFolder.MoveHere($_.Path, $FOF_NOCONFIRMATION)
            }

            # Move the originals back out of the staging folder and tidy up
            $objcopyFolder.Items() | ForEach-Object {
                $objorigFolder.MoveHere($_.Path, $FOF_NOCONFIRMATION)
            }
            Remove-Item $guid
            Remove-Item (Join-Path $to $guid)
            Set-Location $return
        }
    }
    else
    {
        [string]$file = $from[0]
        $objdestFolder.CopyHere($file, $FOF_CREATEPROGRESSDLG)
    }
}
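
For context, a typical call looked something like the following (the paths and share name are made up for illustration):

# Hypothetical example: copy both disks of one VM as a single Explorer-driven job
[Collections.ArrayList]$disks = @(
    'D:\Hyper-V\SRV01\SRV01-os.vhdx',
    'D:\Hyper-V\SRV01\SRV01-data.vhdx'
)
Copy-File -from $disks -to '\\BACKUP01\hyperv-images'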

Now that we are switching to UrBackup, RCT (Hyper-V resilient change tracking) means our differential image backups are SO much faster once the first disk image is completed. This is simply a killer feature, and we can back the VMs up while they are online, bonus. But the initial backups are now taking around 5 minutes for the same size of image as before. :anguished_face:

Obviously, to avoid flooding the network with contending traffic and to keep performance as high as possible, we would never dream of allowing multiple hosts on the same network to be backed up concurrently. "Max simultaneous backups" therefore has to be 1. We do not have additional VLANs for other backup networks at this site.

Long question short: I seem to have lost RSS, which is a pity, since I am now limited to a mere 100 MB/s as opposed to 400 MB/s with RSS utilising all the NICs simultaneously. I did try increasing the number of threads a client can use for a copy, but I am pretty sure this only affects file backups, not image operations.

IMHO the network IS the bottleneck here, so far as I can tell. I am taking CPU and RAM out of the equation, since in a server environment I don't think these resources should even be a discussion. Disk performance is fairly simple to fix these days, for instance with a RAID 0 (or RAID 10 if the backups are that critical) PCIe M.2 adapter card and a few NVMe disks. Then the limiting factors could be (but should not be) RAM and CPU.

So we are down to network performance. Obviously we could go and stick in a 10G network, but that is an expense for a future, more stable economy and not for now.

With the following command on the server I can watch the Ethernet interrupts spike the CPUs, and I would expect at least four cores to be in use. No dice.

watch -n 1 'cat /proc/interrupts | grep -E "(enp|eth|MSI)" | head -20'

Using ethtool I can see that RSS is definitely enabled and active, so what gives?

The Hyper-V hosts are Windows Server boxes, and I am going to assume the UrBackup client does not replace the Windows network stack, so we already know RSS works fine on the client (sending) side. But our copy speeds indicate it is not being utilised any more.
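
For completeness, the client-side RSS state is easy to double-check with the built-in NetAdapter cmdlets (standard Windows PowerShell on Server 2012 and later, nothing UrBackup-specific):

# Confirm RSS is enabled per adapter and how many receive queues each one exposes
Get-NetAdapterRss | Format-Table Name, InterfaceDescription, Enabled, NumberOfReceiveQueues -AutoSize

# RSS can also be toggled per adapter if required
# Enable-NetAdapterRss -Name "Ethernet 2"   # adapter name is just an example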

Obviously this stuff is really complicated and I am trying to say a huge amount in one post which is always tricky when so many factors are in play.

Thanks in advance.

Update:

If I open a CIFS share (Explorer window) on the client (Windows Server 2019) and initiate a drag-and-drop copy of a 50 GB VHD file directly into the mounted UrBackup storage target file system, I can see the client NICs are all fully utilised sending data (they are all maxed), and the disk on the receiving side (a pass-through disk to a Debian VM) is showing writes of around 400 MB/s, i.e. the speed of the network. All good.

Also, using the two watch commands below, I can see all NICs on the Debian server receiving that data, with packet counts climbing on all NICs.

watch -n 1 'cat /proc/net/dev | grep -E "(eth|enp)"'
watch -n 1 'cat /proc/softirqs | grep NET_RX'

So the upshot is: RSS is working when Explorer is pushing a large volume of data to the Debian 13 OS over CIFS.
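
While one of these drag-and-drop copies is in flight, the parallel paths can also be observed from the Windows side with the standard SMB cmdlets (just an observation aid, nothing UrBackup-specific):

# Show the SMB session(s) to the backup server and the dialect negotiated
Get-SmbConnection | Select-Object ServerName, ShareName, Dialect, NumOpens

# Show the individual network connections SMB has opened for those sessions,
# including which client and server interfaces each one is using
Get-SmbMultichannelConnection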

However, when I then request a backup of the same VHD file via a UrBackup-initiated backup, the transfer goes over just a single NIC. The client side shows only one NIC fully saturated, and the watch command shows only the eth0 counters climbing, whereas the packet counts on the other NICs are static and not increasing.

I have tried changing the "Maximum number of simultaneous jobs per client" value from 2 up to 8 and back down to 1, without any change in behaviour. As it happens, I think this only affects file backups.

So at this point I believe this limitation is, unfortunately, most likely by design. It could of course also be a limitation of the Hyper-V backup architecture; that is lower-level than my knowledge goes. However, I would be delighted if anyone could add any information they might have.

Is the client or the server the bottleneck?

No idea about RSS, but maybe you can experiment with the advanced settings? Make sure encryption is disabled, maybe also set the full file backup transfer mode to “raw” just to see if the checksums during transfers are the bottleneck.

In the advanced settings I am already using RAW with no encryption enabled.

The penny has finally dropped as to what is going on here. Multiple paths between endpoints mean, by definition, multiple TCP streams. Spreading a transfer across NICs (and their RSS queues) in Windows is a function of SMB 3.0, which sits a layer above TCP; on Linux, SMB 3.0 is implemented by Samba. So it is SMB, at the application layer, that creates, utilises and manages the multiple TCP streams between client and server; TCP itself does not. TCP is just the transport SMB employs.

A plain TCP connection therefore sits below SMB and will always be constrained to the maximum speed of a single NIC.
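
This is easy to see from the Windows client while a UrBackup image backup is running: the transfer is just ordinary TCP connections, each bound to a single local interface. A quick check, with the backup server's address swapped in for your own:

# Hypothetical address - replace with your backup server's IP
$backupServer = '192.168.10.50'

# List the established TCP connections to the backup server and the local
# address (and therefore the single NIC) each one is bound to
Get-NetTCPConnection -RemoteAddress $backupServer -State Established |
    Select-Object LocalAddress, LocalPort, RemotePort, OwningProcess |
    Format-Table -AutoSize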

In order to make this work, the UrBackup "Hyper-V commercial client" (plus an additional server connector for this specific case) would need to be added to the codebase to allow Windows Hyper-V backups to run over SMB 3.0. For the Linux, Mac, Windows and Internet clients, even though Samba is available on Mac and Linux and SMB is obviously there on Windows anyway, the chances of multiple NICs being present are considerably lower, so it is probably not worth the effort.

But for the Hyper-V Windows commercial case, I would be surprised if any server did not have at least two NICs, so this is for sure a removable bottleneck.

For more information google “SMB3.0 Linux Debian 13”

Regards Anthony.

See above edited post.