Last version is slower after running remove_unknown

On one server, I ran remove_unknown (it took 3 days), and it then became slow.
As in too slow to cope with the backup rhythm: backups went from about 1-2 minutes to 20-30 minutes (x264 servers that were backed up 4x per day before).

I then ran defrag-database (it took 4 hours), thinking there had been too many changes and it needed some maintenance.
After the defrag, the database file was twice as large as before (15 GB to 30 GB).

I think it is now doing startup maintenance or a schema upgrade (it shows the number of entries processed plus a percentage, currently at 6.5M files / 48%; it started 3 hours ago).

The other server, which was also updated but on which remove_unknown wasn’t run, has normal backup speed. (So maybe the problem isn’t actually with remove_unknown.)

One of the things that is slow is:
Referencing snapshot on “xxx” for path “yyy” failed: FAILED
About 1-2 lines get printed every second, but there are a lot of lines to print.
Maybe it is one line per path stored in the .directory_pool?

So this one is very noticeable, because the actual backup is often <1 min, but that last step runs for 2-3 minutes.

So this was the percentage thing; apparently it was rebuilding the index:

2018-03-15 14:44:52: Deleting database journal…
2018-03-15 14:50:50: Sending file “/usr/share/urbackup/www”
2018-03-15 14:50:50: Sending file: /usr/share/urbackup/www/index.htm
2018-03-15 14:50:50: Sending file: /usr/share/urbackup/www/index.htm done
2018-03-15 14:50:50: Sending file “/usr/share/urbackup/www/js/vs/loader.chash-7bbdd9ad3da370f14fe85315b79133b3.js”
2018-03-15 14:50:50: Sending file “/usr/share/urbackup/www/images/urbackup.png”
2018-03-15 14:50:50: Sending file: /usr/share/urbackup/www/images/urbackup.png
2018-03-15 14:50:50: Sending file: /usr/share/urbackup/www/images/urbackup.png done
2018-03-15 14:50:50: Sending file “/usr/share/urbackup/www/js/vs/loader.chash-7bbdd9ad3da370f14fe85315b79133b3.js”
2018-03-15 15:05:38: Copying/reflinking database…
2018-03-15 15:05:38: Reflink ioctl failed. errno=95
2018-03-15 15:05:38: Reflinking failed. Falling back to copying…
2018-03-15 15:07:27: WARNING: Creating file entry index. This might take a while…
2018-03-15 15:07:27: Getting number of files…
2018-03-15 15:07:28: Dropping index…
2018-03-15 15:07:29: Starting creating files index…
2018-03-15 15:07:29: Creating files index: 0% finished
2018-03-15 15:08:33: File entry index contains 1000 entries now.
2018-03-15 15:08:33: File entry index contains 2000 entries now.

Then it did a single backup, and now all backups are stuck. The log shows:

2018-03-15 19:14:49: ERROR: DATABASE BUSY in CDatabase::Prepare
2018-03-15 19:14:59: ERROR: DATABASE BUSY in CDatabase::Prepare
2018-03-15 19:15:09: ERROR: DATABASE BUSY in CDatabase::Prepare
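For background (this is my assumption about the internals, not something from the UrBackup docs): the server keeps its bookkeeping in an SQLite database, and SQLite only allows one writer at a time, so “DATABASE BUSY” is what you get when one connection tries to write while another still holds the write lock. A minimal sketch reproducing the situation with two plain SQLite connections (nothing here is UrBackup’s actual code):

```python
import os
import sqlite3
import tempfile

# Sketch only: one connection holds an open write transaction, a second
# connection tries to start its own write and gets "database is locked"
# (the condition a server would log as DATABASE BUSY).
path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None: autocommit mode, we manage transactions explicitly.
writer = sqlite3.connect(path, timeout=0, isolation_level=None)
writer.execute("CREATE TABLE files (id INTEGER)")
writer.execute("BEGIN IMMEDIATE")             # take the write lock
writer.execute("INSERT INTO files VALUES (1)")

reader = sqlite3.connect(path, timeout=0, isolation_level=None)  # timeout=0: fail fast
try:
    reader.execute("BEGIN IMMEDIATE")         # blocked by the open transaction
except sqlite3.OperationalError as exc:
    print(exc)                                # "database is locked"

writer.execute("COMMIT")                      # releasing the lock unblocks others
reader.execute("BEGIN IMMEDIATE")
reader.execute("COMMIT")
```

With a busy timeout (the `timeout` argument here, `sqlite3_busy_timeout` in C), the second connection would retry for a while instead of failing immediately, which would match the errors repeating every ~10 seconds in the log above.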

I’ll wait some more, then try restarting the service again.

OK, I restarted the services and things seem to be going back to normal.

I still think this action slows things down a lot, but right now the backup queue isn’t full (it’s doing about 5 backups at a time and the maximum is set to 15):

This can go on for 5 minutes even when 0 KB get backed up:
Referencing snapshot on “xxx” for path “.symlink_yyy” failed: FAILED

The question is whether that is really the cause, or whether it simply has to link a lot of files / create a lot of symlinks in those directories. You should see that in e.g. iotop.

If you want to get rid of all those .symlink_xx paths, either include the directories the symlinks point to in the backup, or disable following symlinks outside of the backed-up path.
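To see which symlinks are the culprits before deciding between those two options, something like this could help. It is a hypothetical helper I am sketching here (`external_symlinks` is my own name, not anything shipped with UrBackup): it lists symlinks under a backup path whose targets resolve outside of it.

```python
import os

def external_symlinks(root):
    """List (link, target) pairs for symlinks under `root` whose targets
    resolve outside of `root` -- the ones a backup tool would have to
    follow out of the backed-up path."""
    root = os.path.realpath(root)
    found = []
    # followlinks defaults to False, so we never descend through a symlink.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            p = os.path.join(dirpath, name)
            if os.path.islink(p):
                target = os.path.realpath(p)
                if not (target + os.sep).startswith(root + os.sep):
                    found.append((p, target))
    return found

for link, target in external_symlinks("/path/to/backup/root"):
    print(link, "->", target)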

And the new version does properly reference single-file symlinks. That could cause the backup delay…