Is it really safe to compare hashes for incremental backups?

According to the documentation: “If the backup is incremental the client calculates a hash of 256 kbyte chunks and compares it to the previous image backup.”

  1. What kind of hash is used?

  2. Can we really be sure that no two different chunks have the same hash, and so that no changed chunk goes missing from the backup?

  1. For images: SHA256

  2. The goal is to make it extremely unlikely. The chance of a SHA256 collision between two given chunks is p = 1/2^256. In the worst case you have 2TB/256 kbyte = 8388608 different blocks in an incremental image backup. The chance of at least one of those 8388608 blocks colliding is then 1-(1-1/2^256)^8388608 ≈ 8388608/2^256 ≈ 7.2e-71. For comparison, the chance of getting hit by lightning per year is about 1e-7. A hash collision in this case is unimaginably less likely than getting hit by lightning, or by a comet or meteorite, in the next month.
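As a quick sanity check on those numbers, here is a back-of-envelope calculation in Python (a sketch of the arithmetic above, not UrBackup code):

```python
from math import log2

# 2 TB image split into 256 KiB chunks
n_blocks = (2 * 1024**4) // (256 * 1024)
assert n_blocks == 8388608  # = 2^23

# Probability that at least one of n chunks collides with a given
# SHA-256 value: 1 - (1 - 2^-256)^n, which the union bound makes
# indistinguishable from n * 2^-256 at these scales.
p_collision = n_blocks / 2**256
print(f"p ≈ 2^{log2(p_collision):.0f}")  # p ≈ 2^-233, about 7.2e-71
```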

OK thanks for the information.

If you do the calculation the other way around, you see that there is a “good” chance to miss a changed block:

There are 2^2097152 possible 256 KiB blocks (256 KiB = 2097152 bits) but “only” 2^256 different SHA256 values, so there is a huge number of different 256 KiB blocks with the same hash value. I agree that the overall chance of missing a changed block in “real life” is low enough, but the user should be aware of the risk and be advised not to do “endless” chains of incremental backups.
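The counting argument itself can be checked directly (a small sketch, not part of any backup tool):

```python
# Pigeonhole counting: far more possible 256 KiB chunks than SHA-256 values.
chunk_bits = 256 * 1024 * 8   # bits in one 256 KiB chunk
hash_bits = 256               # bits in a SHA-256 digest

assert chunk_bits == 2097152
# On average 2^(2097152 - 256) distinct chunks map to each hash value, so
# collisions must exist in principle, even though hitting one by chance
# is astronomically unlikely.
print(chunk_bits - hash_bits)  # 2096896 "excess" bits per hash bucket
```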

PS.: I suggest that someone adds the information from this topic to the FAQ.

Yes. It’s the pigeonhole principle. Only there aren’t enough pigeons.

I’ll see what I can write. This is not the only place where something like this happens. Files are deduplicated via SHA512 hashes (you have to look at for that). The file transfer via internet mode uses a combination of CRC32 and MD5.
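For readers who want to see the chunk-compare idea from the quoted documentation in concrete form, here is a minimal file-based sketch using Python’s `hashlib` (an illustration under assumed file inputs, not UrBackup’s actual implementation):

```python
import hashlib

CHUNK = 256 * 1024  # 256 KiB, as in the documentation quote above


def chunk_hashes(path):
    """Return the SHA-256 hex digest of each 256 KiB chunk of a file."""
    hashes = []
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            hashes.append(hashlib.sha256(chunk).hexdigest())
    return hashes


def changed_chunks(old_hashes, new_hashes):
    """Indices of chunks whose hash differs from the previous backup."""
    return [i for i, (a, b) in enumerate(zip(old_hashes, new_hashes))
            if a != b]
```

Only the chunks reported by `changed_chunks` would need to be transferred into the incremental backup; unchanged chunks are assumed identical because their hashes match.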