Incremental backup fails to VHDX (2.5.23/2.5.17)

Hi. Again, wanted to thank you for the great tool - enjoying it full time. To not rise a spare topic - question: is there a way to donate/support project with reoccuring payments? Like via Paypal, Patreon, whatsover, so I can donate every month without remembering it? Haven’t seen such option on the site or forum topics.

Now about the issue I’m facing:

Description:
After setting backup destination for images to VDHX and remaking full backups for some Windows 10 clients I’ve started to get failures. Failure is consistent. Previously, same client and server were working without issues with VHD container type. LAN. Wired 1Gb links. No losses or any other network issues observed. Other Windows 10 clients on the same network backup without issues.

Steps to reproduce:

  1. Setup server 2.5.23 (Centos 7) and client (Windows 10, x64, 1903). Destination folder for server backups is ZFS. Client holds NTFS. Backup partition C. Partition occupies whole disk + Standard EFI partition + Windows system partitions on the drive.
  2. Set image backups VDHX (no compression).
  3. Perform full backup.
  4. Mark full backup as archived on server.
  5. Perform incremental backup by schedule (in my case 24 hours) and make sure that incremental backup is more than 50Gb.

Expected behavior:
Backup succeeds.

Actual behavior:
Backup fails with error:

Apr 13 23:11:06 BACKUP_SERVER urbackupsrv[3210]: ERROR: Unknown bat entry state 6
Apr 13 23:11:06 BACKUP_SERVER urbackupsrv[3210]: ERROR: Error setting unused area (from byte 485331980288 to byte 485331984384). No such file or directory (code: 2)
Apr 13 23:11:06 BACKUP_SERVER urbackupsrv[3210]: ERROR: FATAL ERROR: Could not write to VHD-File
Apr 13 23:11:06 BACKUP_SERVER urbackupsrv[3210]: * About to connect() to MAILSERVER port 25 (#5)
Apr 13 23:11:06 BACKUP_SERVER urbackupsrv[3210]: *   Trying 49.12.220.39...
...................[MAIL SENDING HERE, WHICH IS SUCCESSFUL]
Apr 13 23:11:07 BACKUP_SERVER urbackupsrv[3210]: * Connection #5 to host MAILSERVER left intact
Apr 13 23:11:07 BACKUP_SERVER urbackupsrv[3210]: Transferred 15.6446 GB - Average speed: 135.652 MBit/s
Apr 13 23:11:07 BACKUP_SERVER urbackupsrv[3210]: Script does not exist urbackup/post_incr_imagebackup
Apr 13 23:11:10 BACKUP_SERVER urbackupsrv[3210]: Time taken for backing up client BACKUP_CLIENT: 17m 1s
Apr 13 23:11:10 BACKUP_SERVER urbackupsrv[3210]: ERROR: Backup failed
Apr 13 23:11:10 BACKUP_SERVER urbackupsrv[3210]: Updating statistics...
Apr 13 23:11:11 BACKUP_SERVER urbackupsrv[3210]: Updating image stats...
Apr 13 23:11:11 BACKUP_SERVER urbackupsrv[3210]: Updating file statistics...
Apr 13 23:11:11 BACKUP_SERVER urbackupsrv[3210]: Done updating statistics.
Apr 13 23:11:11 BACKUP_SERVER urbackupsrv[3210]: msg=WAKEUP
Apr 13 23:11:11 BACKUP_SERVER urbackupsrv[3210]: WARNING: Exponential backoff: Waiting at least 2h 40m before next image backup

On the server, meanwhile, file exists. Here is the content of directory:
-rwxr-x— 1 urbackup urbackup 47G Apr 13 23:11 Image_C_220413-2254.vhdx
-rwxr-x— 1 urbackup urbackup 15M Apr 13 22:54 Image_C_220413-2254.vhdx.cbitmap
-rwxr-x— 1 urbackup urbackup 29M Apr 13 23:11 Image_C_220413-2254.vhdx.hash
-rwxr-x— 1 urbackup urbackup 34K Apr 13 22:54 Image_C_220413-2254.vhdx.mbr

File system errors (as well as ZFS status), Linux kernel errors or segmentation faults are absent, i.e. all good.

Errors from mail:
ERROR: Error setting unused area (from byte 485331980288 to byte 485331984384). Code 0
ERROR: FATAL ERROR: Could not write to VHD-File

2 hours later:
ERROR: Error setting unused area (from byte 485331980288 to byte 485331984384). No such file or directory (code: 2)
ERROR: FATAL ERROR: Could not write to VHD-File

It seems that increment is failing on the same file increment request. Probably other clients are not failing yet, because their incremental images are smaller.

I hope this is helpful.

Sounds like a hairy issue (difficult to reproduce), so I looked over the code how this could happen.

The only way I found if a sector bitmap entry is mistakenly a payload block entry. So perhaps this would fix it:

(Untested; This would mean the vhdx files it created so far are corrupted. I would really appreciate if you tested them as described in the release thread :slight_smile: )

You mean to test with check sum validation after backup is made?

Regarding corrupted VHDX files - before running on production, I’ve tested in test environment - at least full image backup recovered successfully. Tested with Windows 10 Client. I do not remember if I’ve tested with incremental that time, but will do now.

Yeah, and preferable with some other VHDX-reading code (like the Windows one).
The create_md5sums_imagebackup file should also work on Windows clients (Create at C:\Program files\UrBackup\create_md5sums_imagebackup). You could be running into timeouts though (if the md5sums creation takes too long), if that happens please tell and we can look for work-arounds for that.