Sorry - I’ve scanned the documentation and the forum - but I can’t find an answer to this:
I’m curious about how the client detects newly modified files to back up. From looking at the system logs, it appears that even (some?) unchanged files get re-hashed during an Internet incremental backup - e.g.
I also see high CPU usage from the backend process during a backup.
Does this imply that the client is entirely rehashing the backup set during each incremental backup? If so, is there a reliable way to reduce the resource impact (CPU usage and disk I/O usage) within Urbackup?
The file is too small, and doesn’t get hashed at all.
A good first step in analysing the CPU/IO impact would be to look at the thread names (e.g. in htop - enable it in the settings; iotop shows them by default). That way you’d know whether the hashing or the transfer is causing it.
The ultimate way is to install debug symbols (install the -dbg package), then run perf top -g.
Ok, sorry - didn’t read your username. Idk how one would do it on macOS. You might also want to make sure that Crypto++ is compiled with all the CPU acceleration active (AES, SHA, etc.).
So how does the software determine that an existing file needs to be backed up afresh? Given a file of sufficient size, is it always rehashed entirely on each incremental backup pass, and compared with the stored hashes?
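This is not UrBackup’s actual code, but the “rehash everything and compare” strategy the question describes can be sketched in a few lines of Python (function names are mine, and I’m using SHA-256 purely for illustration):

```python
import hashlib
from pathlib import Path

def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the whole file in fixed-size chunks; I/O cost is
    proportional to the file's size, regardless of whether it changed."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def needs_backup(path: Path, stored_digest: str) -> bool:
    """Rehash-and-compare: every call re-reads the entire file."""
    return file_digest(path) != stored_digest
```

The point of the sketch is the cost model: if this is what happens on every incremental pass, the whole backup set gets re-read each time, which is exactly the multi-TB concern raised below.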
Haha! I’ll check on the compile flags for that - thanks - but my concern is as much about disk I/O and time required for backup, as it is about CPU usage. My deployment target would be multi-TB production workstations, and the system would not cope with entirely re-reading and hashing 12TB of data on each backup cycle.
My existing backup software relies on fsevents (Time Machine), or on changes in mtime or size; I realise that’s not a completely foolproof way to catch modifications - but it seems to catch the vast majority of changes, at a vastly reduced resource cost.
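For contrast with full rehashing, the cheap mtime/size check described here can be sketched like this (again an illustration of the strategy, not anyone’s actual implementation):

```python
import os
from typing import NamedTuple

class Fingerprint(NamedTuple):
    """Cheap per-file metadata captured at the last backup."""
    mtime_ns: int
    size: int

def fingerprint(path: str) -> Fingerprint:
    st = os.stat(path)
    return Fingerprint(st.st_mtime_ns, st.st_size)

def probably_changed(path: str, last: Fingerprint) -> bool:
    """One stat() call instead of re-reading the file.
    Misses in-place writes that preserve both mtime and size,
    which is the 'not foolproof' caveat above."""
    return fingerprint(path) != last
```

A 12 TB backup set then costs one metadata lookup per file per pass, rather than 12 TB of reads - hashing is only needed for the files this check flags.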
EDIT: I realise that fsevents are - of course - not supported by UrBackup on macOS. It’s on my wishlist.
This would require an implementation of ChangeJournalWatcher.cpp for macOS. Maybe there is something better in APFS for listing the changes relative to a certain point in time/transaction (the metadata is copy-on-write, after all…).
So - please forgive me if I’m just looking too closely at the log files and leaping to wrong conclusions from them - but on a system doing Internet incremental backups, where a full backup completed yesterday and the machine has hardly been touched since (other than running backups), is it expected that I would see a lot of system log entries like this? (Because that’s what I am seeing…)
Safari hasn’t been touched since the previous backups (running at hourly intervals) - but it looks like the files are being rehashed each time.
Of course the other possibility is that this is a quirk of the macOS build I’m running, and that this is not expected behaviour on a supported platform - so presumably something about how macOS is reporting the timestamps is off?
Indeed - I’m building up to that…
Unfortunately APFS is not yet well documented for developers - so I’m not sure such things are accessible from a public API.
Indeed - maybe you can track this down further by checking whether the “change indicator” for Safari has changed between backups in the respective *.ub files (i.e. grep 'Safari"' *.ub on the server and on the client). For unchanged files, the client puts the hash directly into the .ub file if parallel hashing is enabled.
Also perhaps try disabling parallel hashing, to check if it’s only a problem with that…