Problem creating a backup: hash verification failed

I’m just starting to use urbackup and when testing I came across the following problem.
I have a server running Ubuntu 22.04 and a 2.5.25 client that makes file backups without snapshots.
Everything worked well until I became curious what the server would do if a backup stored on the server itself were damaged, say by a failure of the backup server's disk.
I went and modified one of the text files in a copy on the server without breaking its hard links. My change showed up in all the backups on the server; clearly the files share their data through hard links, and this was expected.
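To illustrate the behaviour (a minimal sketch, not UrBackup-specific; the file names are invented): writing through one hard link modifies the data shared by every link, whereas deleting the entry and writing a new file does not touch the other links.

```python
import os

# Two directory entries pointing at the same inode, i.e. the same data.
with open("original.txt", "w") as f:
    f.write("clean\n")
os.link("original.txt", "backup_copy.txt")

# Editing in place, while the hard link is intact, changes both names at once.
with open("backup_copy.txt", "w") as f:
    f.write("corrupted\n")
print(open("original.txt").read())   # prints "corrupted"

# Removing the entry and writing a new file creates a separate inode,
# so the other name keeps whatever data it currently has.
os.remove("backup_copy.txt")
with open("backup_copy.txt", "w") as f:
    f.write("independent\n")
print(open("original.txt").read())   # still "corrupted", not "independent"
```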
After that I ran an incremental backup and then a full one, but both failed with an error.
Error in incremental backup
Hashes for “/media/BACKUP/urbackup/WebUI/240627-1046/www/roundcube/INSTALL” differ. Verification failed.
Backup verification failed
Backup failed
Error in full backup
Hashes for “/media/BACKUP/urbackup/WebUI/240627-1058/www/roundcube/INSTALL” differ. Verification failed.
Errors
06.27.24 11:00
Hashes for “/media/BACKUP/urbackup/WebUI/240627-1058/www/roundcube/temp/RCMTEMPattmnt667d1a9190130185726797” differ. Verification failed.
Hashes for “/media/BACKUP/urbackup/WebUI/240627-1058/www/roundcube/temp/RCMTEMPattmnt667d1b069b609521931338” differ. Verification failed.
Backup verification failed
Backup failed
And no matter how many times I try to make a copy, the result is the same.
The question is what to do in such a situation. Why doesn’t the server just make a new copy of this file and send a notification that my old copy is corrupted or that it simply doesn’t match the original?
And how do I recover from such a situation if it really happens? Do I have to delete all copies completely and start the process all over again?

How would the server know that its copy is the one that is corrupted? It might just as well be the stored hash that has changed.

If you want to prevent bitrot on the server you need to use a filesystem (like ZFS) that supports it.

I agree with you that the server cannot determine whether the copy or the original is corrupted. However, as I understand it, the server saves file hashes when copying and can verify them.

But that’s not even the question. The question is why it doesn’t make a copy.

As I understand it now, the server sees that the file in the backup and the original file do not match, but it also sees that the original file has not changed. What does the server do? Nothing. It just reports the error, marks the backup as failed, and that's it.

Why doesn't it just make a new copy of this file, or even a whole new copy of the entire data set? That would be wasteful, but fine, let it at least work that way. As it stands, there is a problem and I am left with no usable backups.

The main question is how to resume copying without deleting all backups or the entire client?

Urbackup keeps a database on the server with hashes of the files it has backed up, and those hashes always match the files as they were backed up. When you corrupted the file you changed its internal state, but urbackup doesn't know why its state is inconsistent. It could be any of a variety of reasons, including hardware failure, so it plays it safe and aborts.

Suppose there is a network problem that causes database corruption because packets are being lost. On a full backup, you are suggesting that urbackup replace all the files on the server (which are fine) with corrupt client files.

Since there is a database of file hashes, the server should at least be able to compare the hash of the file it needs to copy from the client, the hash recorded in the database, and the hash of the file already sitting in the backup. It could then determine that, in this case, the hash of the file on the client matches the hash in the database but does not match the hash of the file actually in storage, which suggests that the problem lies with the copy in storage.
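For illustration, a sketch of the decision logic being proposed here; this is an assumption about what such a check could look like, not what urbackup actually does, and the function and argument names are invented:

```python
def diagnose(client_hash: str, db_hash: str, stored_hash: str) -> str:
    """Guess which side is corrupted by comparing the three hashes."""
    if client_hash == db_hash == stored_hash:
        return "consistent: nothing to do"
    if client_hash == db_hash and stored_hash != db_hash:
        # The client still matches what was recorded at backup time,
        # so the copy in backup storage is the suspect.
        return "storage copy corrupted: re-fetch the file from the client"
    if stored_hash == db_hash and client_hash != db_hash:
        # Storage matches the record; the client file simply changed.
        return "client file changed: back it up as a normal change"
    # The database disagrees with both sides, or all three differ.
    return "ambiguous: abort and alert the administrator"
```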

However, this is not my primary concern right now. I understand that what I did was not a standard procedure, but the same kind of damage is possible, for instance, because of a bad block on the disk where the copy is stored.

The question is, how can I restore the backup of this client?

Should I completely delete it from the server, erasing all copies and creating a new one, or is there a mechanism to explain to the server that the correct file is now on the client and that I want the server to copy it and continue making backups?

I don’t know if there is a mechanism. You could try undoing what you have done to the file on the server.

I understand that I can manually delete or recreate the backup, but what if such an error occurs in a real-world scenario due to a bad block, administrator negligence, or other unforeseen circumstances? I would like to have a mechanism in place to handle these situations gracefully and simply copy the problematic file into a new backup, allowing the backup process to continue without interruption.

I have told you that ZFS provides the functionality to deal with bad blocks on the server. I have also told you that providing this functionality in urbackup could make the situation much worse. You could end up populating all backup versions on the server with corrupt files. No reputable backup software provides the functionality to change individual files in old backups.

I suggest you use something like FreeFileSync if you want to control individual files on the server and the client.

Yes, I understand what you’re saying, but I have a feeling we’re talking about different things.

Most backup systems create a full copy of the entire data volume when creating a full backup, and then only add changed data to it when creating new incremental backups. This can then be assembled into a full copy and so on.

This system does not do that. It makes an initial copy, and then all subsequent copies are built from it by linking to unchanged files and adding only new data.
By the way, I still don't quite understand the difference between a full and an incremental backup, other than that for a full backup the server supposedly downloads all the files from the client (as the documentation says, although in practice my second and subsequent full backups are much faster than the first), while for an incremental backup it downloads only the changed files; the underlying mechanism is the same. That is the beauty of this system (a big saving of disk space). The downside, as I see it so far, is that if a file on the server's disk gets corrupted and has not changed in a long time, it will be corrupted in every copy. I am more or less prepared for that, but the question is how to fix it.
So far I see only one option: completely remove the client from the server, remove ALL previously created copies, and start copying again. In my opinion that is a very bad solution, so I would like to find another option, for example making a full copy without creating links, as is done the first time, even if it means losing a lot of storage space, and then building all subsequent copies from that new copy, if the system cannot simply re-copy the one file that changed.
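To make the point concrete, here is a deliberately simplified model of such a link-based scheme (an illustration only, with flat directories and SHA-256; it is not urbackup's real code): unchanged files are hard-linked from the previous backup, which is exactly why one damaged on-disk copy is visible through every backup that links to it.

```python
import hashlib
import os
import shutil

def file_hash(path: str) -> str:
    """SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def backup(source_dir: str, prev_backup: str | None, new_backup: str) -> None:
    """Create new_backup from source_dir, hard-linking unchanged files."""
    os.makedirs(new_backup, exist_ok=True)
    for name in os.listdir(source_dir):
        src = os.path.join(source_dir, name)
        dst = os.path.join(new_backup, name)
        prev = os.path.join(prev_backup, name) if prev_backup else None
        if prev and os.path.isfile(prev) and file_hash(prev) == file_hash(src):
            os.link(prev, dst)       # unchanged: share the existing data
        else:
            shutil.copy2(src, dst)   # new or changed: store a fresh copy
```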

In essence, the system sees that a single file differs and throws an error saying that the file hash in storage does not match.

Hashes for “/media/BACKUP/urbackup/WebUI/240715-1007/www/roundcube/INSTALL” differ. Verification failed.

Why can’t I just ignore this error and make a new copy of this file or the entire job and continue copying?

The problem is that my server has stopped creating copies of this partition, regardless of whether other files are changing or not. Or rather, it does create copies, but it does not show them in the interface because they all finish with errors. If you disable the “End-to-end verification of all file backups” option in the settings, the server stops seeing this error and continues to create copies, but in my opinion that is not a solution to the problem, just a way of hushing it up.

It’s not designed for this. It simply assumes this problem is taken care of by ZFS/btrfs/ReFS/Ceph …
The “Debugging: End-to-end verification of all file backups” option is only there to check the program itself for bugs. It is not optimized at all, for example.

If you want to recover from such a problem given that e.g. btrfs has notified you about a file corruption … I have a script that deletes/renames all files that reference the corrupted data (on btrfs). Then it'll re-download the file (since it cannot find it anymore). Obviously the better option might be to retire the whole backup server and (slowly) replace it with a new one at that point.
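For illustration, here is a rough sketch of that idea, not the actual script mentioned above, and it assumes the backups share data through plain hard links rather than btrfs reflinks; the paths are just the examples from this thread. It finds every backup entry that points at the same inode as the known-bad file and renames it, so the next backup no longer finds it:

```python
import os

STORAGE = "/media/BACKUP/urbackup"   # backup storage root (example path)
CORRUPTED = os.path.join(
    STORAGE, "WebUI/240627-1046/www/roundcube/INSTALL")  # known-bad file

bad = os.stat(CORRUPTED)
for root, dirs, files in os.walk(STORAGE):
    for name in files:
        path = os.path.join(root, name)
        st = os.lstat(path)
        # Same device and inode number means the same underlying data.
        if st.st_dev == bad.st_dev and st.st_ino == bad.st_ino:
            os.rename(path, path + ".corrupt")
            print("renamed", path)
```

Whether the server then cleanly re-downloads the file or just flags another inconsistency depends on its database state, so this would need to be tested on a disposable client first.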

And if the file system is EXT4, will your script work?
Is it possible to somehow delete such a damaged file, for example from the last copy, and force the system to make a copy of it again?

Is there some kind of mechanism to delete specific files and folders from a backup manually, with a script, or in some other way, so that the system can then copy them again? For example, if I find out that a file is corrupted due to damage during transmission, malicious sabotage on the server, or any other reason.

In my view if there is damage in transmission, disk corruption, or sabotage then the best thing to do is to stop all backups right there and then and start afresh.

Ultimately, it depends on how much you value your data. If you are going to the trouble to do versioned backups it suggests caution is a good strategy.

You are absolutely right, and I am not saying that after detecting such actions I would simply continue copying. I just want to understand what exactly the system offers me while it is still in test mode and I have not yet switched off my old backup system, so perhaps my questions seem a bit strange.
I am learning and studying.

If I simply manually go through all or some of the copies and delete the corrupted file without editing the server’s database, will I be able to get a new copy of this file from the server, or will I simply kill the entire backup server?

For example, 10 minutes ago I was sure that archiving mode was a way to force the server to create archives from my copies, which would be a second layer of protection against failures and, to some extent, against sabotage, and that these archives could be stored on a different partition or even on a different server. But it seems I was wrong: enabling archiving does not create an archive of the data on disk, it only marks the data logically, which is a pity.