Incremental backup fails to VHDX (2.5.23/2.5.17)

Hi. Again, wanted to thank you for the great tool - enjoying it full time. To not rise a spare topic - question: is there a way to donate/support project with reoccuring payments? Like via Paypal, Patreon, whatsover, so I can donate every month without remembering it? Haven’t seen such option on the site or forum topics.

Now about the issue I’m facing:

Description:
After setting backup destination for images to VDHX and remaking full backups for some Windows 10 clients I’ve started to get failures. Failure is consistent. Previously, same client and server were working without issues with VHD container type. LAN. Wired 1Gb links. No losses or any other network issues observed. Other Windows 10 clients on the same network backup without issues.

Steps to reproduce:

  1. Setup server 2.5.23 (Centos 7) and client (Windows 10, x64, 1903). Destination folder for server backups is ZFS. Client holds NTFS. Backup partition C. Partition occupies whole disk + Standard EFI partition + Windows system partitions on the drive.
  2. Set image backups VDHX (no compression).
  3. Perform full backup.
  4. Mark full backup as archived on server.
  5. Perform incremental backup by schedule (in my case 24 hours) and make sure that incremental backup is more than 50Gb.

Expected behavior:
Backup succeeds.

Actual behavior:
Backup fails with error:

Apr 13 23:11:06 BACKUP_SERVER urbackupsrv[3210]: ERROR: Unknown bat entry state 6
Apr 13 23:11:06 BACKUP_SERVER urbackupsrv[3210]: ERROR: Error setting unused area (from byte 485331980288 to byte 485331984384). No such file or directory (code: 2)
Apr 13 23:11:06 BACKUP_SERVER urbackupsrv[3210]: ERROR: FATAL ERROR: Could not write to VHD-File
Apr 13 23:11:06 BACKUP_SERVER urbackupsrv[3210]: * About to connect() to MAILSERVER port 25 (#5)
Apr 13 23:11:06 BACKUP_SERVER urbackupsrv[3210]: *   Trying 49.12.220.39...
...................[MAIL SENDING HERE, WHICH IS SUCCESSFUL]
Apr 13 23:11:07 BACKUP_SERVER urbackupsrv[3210]: * Connection #5 to host MAILSERVER left intact
Apr 13 23:11:07 BACKUP_SERVER urbackupsrv[3210]: Transferred 15.6446 GB - Average speed: 135.652 MBit/s
Apr 13 23:11:07 BACKUP_SERVER urbackupsrv[3210]: Script does not exist urbackup/post_incr_imagebackup
Apr 13 23:11:10 BACKUP_SERVER urbackupsrv[3210]: Time taken for backing up client BACKUP_CLIENT: 17m 1s
Apr 13 23:11:10 BACKUP_SERVER urbackupsrv[3210]: ERROR: Backup failed
Apr 13 23:11:10 BACKUP_SERVER urbackupsrv[3210]: Updating statistics...
Apr 13 23:11:11 BACKUP_SERVER urbackupsrv[3210]: Updating image stats...
Apr 13 23:11:11 BACKUP_SERVER urbackupsrv[3210]: Updating file statistics...
Apr 13 23:11:11 BACKUP_SERVER urbackupsrv[3210]: Done updating statistics.
Apr 13 23:11:11 BACKUP_SERVER urbackupsrv[3210]: msg=WAKEUP
Apr 13 23:11:11 BACKUP_SERVER urbackupsrv[3210]: WARNING: Exponential backoff: Waiting at least 2h 40m before next image backup

On the server, meanwhile, file exists. Here is the content of directory:
-rwxr-x— 1 urbackup urbackup 47G Apr 13 23:11 Image_C_220413-2254.vhdx
-rwxr-x— 1 urbackup urbackup 15M Apr 13 22:54 Image_C_220413-2254.vhdx.cbitmap
-rwxr-x— 1 urbackup urbackup 29M Apr 13 23:11 Image_C_220413-2254.vhdx.hash
-rwxr-x— 1 urbackup urbackup 34K Apr 13 22:54 Image_C_220413-2254.vhdx.mbr

File system errors (as well as ZFS status), Linux kernel errors or segmentation faults are absent, i.e. all good.

Errors from mail:
ERROR: Error setting unused area (from byte 485331980288 to byte 485331984384). Code 0
ERROR: FATAL ERROR: Could not write to VHD-File

2 hours later:
ERROR: Error setting unused area (from byte 485331980288 to byte 485331984384). No such file or directory (code: 2)
ERROR: FATAL ERROR: Could not write to VHD-File

It seems that increment is failing on the same file increment request. Probably other clients are not failing yet, because their incremental images are smaller.

I hope this is helpful.

Sounds like a hairy issue (difficult to reproduce), so I looked over the code how this could happen.

The only way I found if a sector bitmap entry is mistakenly a payload block entry. So perhaps this would fix it:

(Untested; This would mean the vhdx files it created so far are corrupted. I would really appreciate if you tested them as described in the release thread :slight_smile: )

You mean to test with check sum validation after backup is made?

Regarding corrupted VHDX files - before running on production, I’ve tested in test environment - at least full image backup recovered successfully. Tested with Windows 10 Client. I do not remember if I’ve tested with incremental that time, but will do now.

Yeah, and preferable with some other VHDX-reading code (like the Windows one).
The create_md5sums_imagebackup file should also work on Windows clients (Create at C:\Program files\UrBackup\create_md5sums_imagebackup). You could be running into timeouts though (if the md5sums creation takes too long), if that happens please tell and we can look for work-arounds for that.

I’ve added patch, recompiled server, restarted server and set one of the clients to make image backups as VHDX (via server web interface). After that I’m starting full image backup from the server. Client starts backup, however resulting file is VHD on the server. Same result if I start backup from the client.

Seems like settings on client level for backup type are not taken into account.

I was able to overcome this by making changes to general settings.

Probably worth fixing.

During testing of mounting VHDX images encountered an issue.

Context:

  • I have CLIENT, with VHD backups and new VHDX backups. Full VHDX backups is from 03.06.2022 (3rd of June). This is confirmed via interface and via mounting full image in Windows 10 - everything works as expected.
  • I’m trying to mount incemental image on server with command:
urbackupsrv mount-vhd --file /mnt/safecopy/urbackup/CLIENT/220621-2145_Image_C/Image_C_220621-2145.vhdx --mountpoint /mnt/c

Command fails with:

Loading FUSE kernel module...
Starting VHD background process...
Waiting for background process to become available...
Timeout while waiting for background process. Please see logfile (/var/log/urbackup-fuse.log) for error details.

Inside /var/log/urbackup-fuse.log:

2022-02-20 23:30:03: Mounting VHD via fuse...
2022-02-20 23:30:03: VHD-Parent: "/mnt/safecopy/urbackup/CLIENT2/220207-1647_Image_C/Image_C_220207-1647.vhd"
2022-02-20 23:30:03: Volume offset is 524288 bytes. Configure via --offset
2022-06-22 23:05:39: Mounting VHD via fuse...
2022-06-22 23:05:39: ERROR: Footer checksum wrong. Switching to header
2022-06-22 23:05:39: ERROR: Header and footer checksum wrong
2022-06-22 23:05:39: ERROR: Error opening VHD file

Please note that:

  1. Urbackup server tries to pickup backup from CLIENT2, i.e. completely different client than incremental image backup.
  2. It’s picking up a VHD file for some reason.

Please let me know should you need any additional info on this one.

Patch has been tested and works well. I’m unable to reproduce issue with size anymore. Content of containers matches files on actual system (I’ve used restore + WinMerge in order to make sure it is). Full image VHDX container mounts in Windows without issues. Incremental one I cannot check - due to an issue above.

So patch looks good.