Server rebuilding indexes: 10 days left, can't run backups, can I do anything?

orogor · July 8, 2020, 8:07am

Hello

I defragmented the database using urbackupsrv, which took about a day.
Now it’s rebuilding the whole file index from scratch, and it got through about 10% in 24h.
I think it rebuilds everything after every defrag. But 10 days without backups is not very reasonable these days.

Is it possible to skip that part, even if more files have to be re-transferred, and then resume it later?
That way I could do something like 1 day of backups, then 1 day of reindexing.

Can the database be split by date, or by client plus common files (partitioned tables), so that a live reindex becomes feasible?
My guess is that, as it is, the file index is too big to be manageable, and it somehow needs to be split, even at some cost, like not allowing a few specific clients to back up for a few days, or having these clients re-transfer full files while the reindex is progressing.

orogor · July 8, 2020, 8:40am

I just applied the migration recommendation for urbackup v1, hoping it would help.

The idea of defragmenting the database comes from a long discussion with the btrfs people on IRC, which concluded that I was getting bad performance because the database was completely fragmented.

The idea of splitting the database comes from using postgres/zabbix, where you get noticeably better performance if you partition your data by date and drop the data when the retention period is over. You just insert all the new data into a table named by date, then drop that table once it has expired.

This avoids creating any bloat inside the database, and avoids having to search for the data in order to delete it. And because it’s a partitioned table, you simply query the main table whatever the date; it’s transparent. I’m not sure you can easily replicate this with sqlite.
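
For what it’s worth, a rough approximation in sqlite could be one table per day plus a view that unions them. The sketch below is only an illustration under that assumption: the table prefix `files_`, the view name `files_all`, and the two-column schema are all made up here, and this is not how urbackup stores its index.

```python
import sqlite3
from datetime import date

VIEW = "files_all"  # hypothetical name for the "main table" that hides the per-day tables


def partition_name(day: str) -> str:
    # Validate the ISO date (raises ValueError otherwise) before using it in SQL.
    parsed = date.fromisoformat(day)
    return "files_" + parsed.strftime("%Y_%m_%d")


def list_partitions(conn: sqlite3.Connection) -> list:
    # GLOB treats "_" literally, so this matches files_2020_07_08 but not files_all.
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name GLOB 'files_[0-9]*' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def rebuild_view(conn: sqlite3.Connection) -> None:
    # sqlite has no native partitioning, so the view is recreated by hand.
    conn.execute(f"DROP VIEW IF EXISTS {VIEW}")
    parts = list_partitions(conn)
    if parts:
        union = " UNION ALL ".join(f'SELECT path, size FROM "{p}"' for p in parts)
        conn.execute(f"CREATE VIEW {VIEW} AS {union}")


def add_partition(conn: sqlite3.Connection, day: str) -> None:
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{partition_name(day)}" (path TEXT, size INTEGER)'
    )
    rebuild_view(conn)


def drop_partition(conn: sqlite3.Connection, day: str) -> None:
    # Expiring a day is a DROP TABLE instead of a large DELETE.
    conn.execute(f"DROP VIEW IF EXISTS {VIEW}")
    conn.execute(f'DROP TABLE IF EXISTS "{partition_name(day)}"')
    rebuild_view(conn)
```

Reads go through `files_all` whatever the date, but unlike postgres, inserts have to target the right day table explicitly and the view has to be rebuilt whenever a table is added or dropped.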

uroni · July 8, 2020, 10:56am

Yeah, theoretically it could be partitioned by client. I have just never seen the performance bottleneck in this area (at least not one that couldn’t be solved by an SSD). Sorry, but the rebuild isn’t the normal use case…

Did the btrfs fragmentation cause a performance bottleneck?

I guess deleting the index file during defrag might be bad… but rebuilding it was always fast for me when applying the upgrade recommendation (I run it with this recommendation by default, btw; this might also increase the normal database performance with btrfs).

orogor · July 8, 2020, 11:35am

I think I have always seen a full index rebuild after a urbackupsrv defrag-database. At the beginning it only took a few hours, then later a few days. Also, that server isn’t very fast compared to other servers (boss didn’t allow flashing to IT mode).

At least that’s the conclusion the btrfs guys reached: with rotational disks (it’s an R410, so 4 disks, and I need 16TB usable, so rotational disks are required), the index ended up being split into a few hundred thousand or a few million pieces. It is then read in 4k blocks scattered all over the disk, so the read speed ends up being 3MB/sec for a 60GB file. I was supposed to run the sqlite defrag via urbackup (done), then a btrfs defrag (not run yet; it might not be useful as the whole file is being rebuilt).
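
To put those numbers in perspective, here is a quick back-of-the-envelope calculation. It assumes binary units (1 GB = 1024 MB, 4k = 4 KiB) and a constant read rate, neither of which was measured precisely; the function names are just for illustration.

```python
def full_read_hours(size_gb: float, mb_per_s: float) -> float:
    # Time to read the whole file once at a constant rate.
    return size_gb * 1024 / mb_per_s / 3600


def reads_per_second(mb_per_s: float, block_kb: float = 4) -> float:
    # Number of block-sized reads per second implied by a given throughput.
    return mb_per_s * 1024 / block_kb


def block_count(size_gb: float, block_kb: float = 4) -> int:
    # Number of blocks of the given size in the file.
    return int(size_gb * 1024 * 1024 / block_kb)


print(round(full_read_hours(60, 3), 1))  # hours for one full pass over the file
print(reads_per_second(3))               # 4k reads per second at 3MB/sec
print(block_count(60))                   # 4k blocks in a 60GB file
```

So a single sequential pass over the file at that rate is already several hours, and anything that touches the blocks more than once gets much worse.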

Yes, a separate SSD for the db would help a lot here; unfortunately it’s not possible to fit an additional drive in this server.

Yes, using the migration settings seems to speed things up a lot (5% in 4h, vs 10% in 24h).
But committing only once an hour is worrying. Even without it, I wonder about the integrity of btrfs if the server crashes. I need to see tomorrow whether the speed is due to the first hour of data being uncommitted, or whether it’s faster on average.
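
Extrapolating both rates linearly gives a rough idea of the difference. This assumes the progress rate stays constant, which is exactly what I’m not sure about yet; the helper name is just for illustration.

```python
def hours_for_full_rebuild(percent_done: float, hours_elapsed: float) -> float:
    # Linear extrapolation: total time if the observed rate holds until 100%.
    return hours_elapsed * 100 / percent_done


with_migration_settings = hours_for_full_rebuild(5, 4)   # 5% in 4h
default_settings = hours_for_full_rebuild(10, 24)        # 10% in 24h

print(with_migration_settings, default_settings)
print(default_settings / 24)  # the default-settings estimate, in days
```

If the faster rate holds, that is roughly 80 hours instead of 240 hours (the 10 days from the title).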

If you can partition by client, I think ideally:

Partition by client.
Allow every maintenance operation to be run per client.
Only forbid backups for a specific client during its maintenance, and keep the rest of urbackup running.
Allow scheduling maintenance every x backups or every x days.
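
The list above could be sketched roughly like this, assuming one sqlite file per client. Everything here is hypothetical (the `<client>.db` file layout, the function names, the default thresholds) and is only meant to illustrate the idea, not how urbackup works internally.

```python
import os
import sqlite3
from datetime import datetime, timedelta


def needs_maintenance(backups_since: int, last_maintenance: datetime, now: datetime,
                      every_n_backups: int = 30, every_days: int = 30) -> bool:
    # Maintenance is due after x backups or after x days, whichever comes first.
    return (backups_since >= every_n_backups
            or now - last_maintenance >= timedelta(days=every_days))


def can_backup(client: str, locked: set) -> bool:
    # Only clients currently under maintenance are refused; the rest keep running.
    return client not in locked


def run_client_maintenance(db_dir: str, client: str, locked: set) -> None:
    # Defragment a single client's database file while it is marked as locked.
    locked.add(client)
    try:
        conn = sqlite3.connect(os.path.join(db_dir, f"{client}.db"))
        try:
            conn.execute("VACUUM")  # rebuilds only this client's file
        finally:
            conn.close()
    finally:
        locked.discard(client)
```

With a layout like this, a slow `VACUUM` would only block the one client whose file is being rebuilt. The common-files data shared between clients would still need its own handling, which this sketch ignores.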

Anyhow, thanks for your work.