Remove-unknown seems to delete the entire database

Hi,

I wanted to stop a full file backup that kept resuming, so I stopped the server, deleted the partial backup from the files dataset, and ran urbackupsrv remove-unknown -u root (I’m using ZFS).
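For reference, the first run was roughly this sequence (the client and backup names are placeholders, and the service name may differ on other installs):

systemctl stop urbackupsrv
zfs destroy tank/urbackup/files/[clientname]/[partial-backup]
urbackupsrv remove-unknown -u root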
Looking through the backup directory, I found old datasets (way over the max number of file backups) for one of the clients, so I ran remove-unknown a second time. The server began deleting all backups of all clients, which I canceled.
I restored the server database from last night’s backup, started the server, and all the backups that had not yet been deleted show up correctly.

So it seems to me that the first run of remove-unknown for some reason deleted all the database entries, and the second run then started to delete all the files and images.
Is there a way to do the opposite of remove-unknown, filling a fresh database with what is in the directories? I suppose my file index is broken.

root@backup:~# urbackupsrv --version
UrBackup Server v2.3.7.0
[...]
root@backup:~# uname -a
Linux backup 4.4.0-142-generic #168-Ubuntu SMP Wed Jan 16 21:00:45 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
root@backup:~# cat /etc/urbackup/*
/tank/urbackup/backupfolder
tank/urbackup/images
tank/urbackup/files

Sorry about that. Unfortunately there is currently no way to fill the database from backup storage.

It probably has something to do with the ZFS setup. The remove-unknown logs would be interesting.

I manually made a log of the second run, where everything was getting deleted. Since remove-unknown doesn’t log to /var/log/urbackup (maybe something you could fix?), I don’t have a log of the first run.
I boiled it down from about 1.5 million lines. Looks to me like everything works ‘as designed’…
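There is nothing clever about how I captured it, I just redirected the output to a file by hand, along the lines of:

urbackupsrv remove-unknown -u root 2>&1 | tee remove-unknown.log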

2019-02-22 14:58:19: Going to remove all unknown files and directories in the urbackup storage directory. Waiting 20 seconds...
2019-02-22 14:58:39: Shutting down all database instances...
2019-02-22 14:58:39: Opening urbackup server database...
2019-02-22 14:58:39: SQLite: recovered 3054 frames from WAL file /var/urbackup/backup_server.db-wal code: 283
2019-02-22 14:58:39: SQLite: recovered 195841 frames from WAL file /var/urbackup/backup_server_files.db-wal code: 283
2019-02-22 14:58:39: SQLite: recovered 2721 frames from WAL file /var/urbackup/backup_server_link_journal.db-wal code: 283
2019-02-22 14:58:39: SQLite: recovered 31004 frames from WAL file /var/urbackup/backup_server_links.db-wal code: 283
2019-02-22 14:58:39: SQLite: recovered 37 frames from WAL file /var/urbackup/backup_server_settings.db-wal code: 283
2019-02-22 14:58:39: SQLite: recovered 3054 frames from WAL file /var/urbackup/backup_server.db-wal code: 283
2019-02-22 14:58:39: SQLite: recovered 37 frames from WAL file /var/urbackup/backup_server_settings.db-wal code: 283
2019-02-22 14:58:40: SQLite: recovered 195841 frames from WAL file /var/urbackup/backup_server_files.db-wal code: 283
2019-02-22 14:58:40: Testing if backup destination can handle subvolumes and snapshots...
Testing for btrfs...
TEST FAILED: Creating test btrfs subvolume failed
Testing for zfs...
ZFS TEST OK
2019-02-22 14:58:42: Backup destination does handle subvolumes and snapshots. Snapshots enabled for image and file backups.
2019-02-22 14:58:42: Emulating reflinks via copying
2019-02-22 14:58:42: Cleaning up 0 percent
2019-02-22 14:58:42: Cleaning up 0 bytes on backup storage
2019-02-22 14:58:42: Database cache size is 1000 MB
2019-02-22 14:58:42: Starting cleanup...
2019-02-22 14:58:42: Freeing database connections...
2019-02-22 14:58:42: SQLite: recovered 3054 frames from WAL file /var/urbackup/backup_server.db-wal code: 283
2019-02-22 14:58:42: SQLite: recovered 39 frames from WAL file /var/urbackup/backup_server_settings.db-wal code: 283
2019-02-22 14:58:42: SQLite: recovered 195841 frames from WAL file /var/urbackup/backup_server_files.db-wal code: 283
2019-02-22 14:58:42: Enough free space now.
2019-02-22 14:58:42: Enough free space now.
2019-02-22 14:58:42: Client with id=5 has 2 full image backups max=5
2019-02-22 14:58:42: Client with id=5 has 6 incremental image backups max=30
2019-02-22 14:58:42: Client with id=21 has 2 full image backups max=5
2019-02-22 14:58:42: Client with id=21 has 6 incremental image backups max=30
[...]
2019-02-22 14:58:42: Deleting old logs...
2019-02-22 14:58:42: Done deleting old logs
2019-02-22 14:58:42: Cleaning history...
2019-02-22 14:58:42: Rewriting daily history...
2019-02-22 14:58:42: Reading history...
2019-02-22 14:58:42: 1096 history items read
2019-02-22 14:58:42: Deleting history...
2019-02-22 14:58:42: Writing history...
2019-02-22 14:58:42: SQLite: recovered 39 frames from WAL file /var/urbackup/backup_server_settings.db-wal code: 283
2019-02-22 14:58:42: Rewriting monthly history...
2019-02-22 14:58:42: Reading history...
2019-02-22 14:58:42: 198 history items read
2019-02-22 14:58:42: Deleting history...
2019-02-22 14:58:42: Writing history...
2019-02-22 14:58:42: SQLite: recovered 39 frames from WAL file /var/urbackup/backup_server_settings.db-wal code: 283
2019-02-22 14:58:42: Rewriting yearly history...
2019-02-22 14:58:42: Reading history...
2019-02-22 14:58:42: 0 history items read
2019-02-22 14:58:42: Deleting history...
2019-02-22 14:58:42: Writing history...
2019-02-22 14:58:42: SQLite: recovered 39 frames from WAL file /var/urbackup/backup_server_settings.db-wal code: 283
2019-02-22 14:58:42: Done cleaning history
2019-02-22 14:58:42: Cleaning deleted backups history...
2019-02-22 14:58:42: Done cleaning deleted backups history.
2019-02-22 14:58:42: Cleaning up client lists...
2019-02-22 14:58:42: Done cleaning up client lists.
2019-02-22 14:58:42: Updating statistics...
2019-02-22 14:58:43: Updating image stats...

2019-02-22 14:58:43: Repairing image failed
2019-02-22 14:58:43: ERROR: Error opening file '/tank/urbackup/backupfolder/**censored**/180927-2009_Image_SYSVOL/Image_SYSVOL_180927-2009.raw'
[one of these for each existing image backup]

2019-02-22 14:58:43: Updating file statistics...
2019-02-22 14:58:43: Updating file statistics: 53%
2019-02-22 14:58:43: Done updating statistics.
2019-02-22 14:58:43: Enough free space now.
2019-02-22 14:58:43: Cleanup successfull.
2019-02-22 14:58:43: SQLite: recovered 3187 frames from WAL file /var/urbackup/backup_server.db-wal code: 283
2019-02-22 14:58:43: SQLite: recovered 39 frames from WAL file /var/urbackup/backup_server_settings.db-wal code: 283
2019-02-22 14:58:43: SQLite: recovered 195853 frames from WAL file /var/urbackup/backup_server_files.db-wal code: 283
2019-02-22 14:58:43: SQLite: recovered 2721 frames from WAL file /var/urbackup/backup_server_link_journal.db-wal code: 283
2019-02-22 14:58:43: SQLite: recovered 31004 frames from WAL file /var/urbackup/backup_server_links.db-wal code: 283
2019-02-22 14:58:43: Removing unknown for client "**censored**"

2019-02-22 14:58:43: WARNING: Path for file backup [id=5911 path=190207-0610 clientname=**censored**] does not exist. Deleting it from the database.
[one of these for each file backup]

2019-02-22 14:58:43: WARNING: File entry with id 255088402 with filesize=428725962 hash=HrOPd8S8rhyWExK9pBSv0A== not found in entry index while deleting, but should be there. The file entry index may be damaged.
[about 1.5 million lines of this]

Edit: I forgot to mention that, because of “The file entry index may be damaged”, I ran repair-database (which looked like it found no problems) and tried again, but got the same behavior from remove-unknown.
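For completeness, that was just the built-in repair run with the server stopped:

urbackupsrv repair-database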

@uroni I need help recovering from this. A lot of image backups were deleted that are still present in the database, and we can’t get rid of them. That prevents us from creating new image backups, even full ones.
Starting a “Full image backup #1” fails with this:

Could not create snapshot of previous image backup at 190220-1902_Image_C "cannot open 'tank/urbackup/images/[clientname]/190220-1902_Image_C@ro': dataset does not exist"
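Checking directly on the pool gives the same “dataset does not exist” error (placeholder client name):

zfs list tank/urbackup/images/[clientname]/190220-1902_Image_C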

The server thinks there is a volume 190220-1902_Image_C, but there isn’t. Marking all the missing volumes for deletion and running cleanup looks like this:

2019-03-14 10:29:22: Deleting image file "/tank/urbackup/backupfolder/[clientname]/190220-1902_Image_C/Image_C_190220-1902.raw" because it was manually set to be deleted...
2019-03-14 10:29:22: Deleting image backup ( id=1184, path=/tank/urbackup/backupfolder/[clientname]/190220-1902_Image_SYSVOL/Image_SYSVOL_190220-1902.raw ) ...
Destroying subvol tank/urbackup/images/[clientname]/190220-1902_Image_SYSVOL failed. Promoting dependencies...
Searching for origin tank/urbackup/images/[clientname]/190220-1902_Image_SYSVOL@190220-1902_Image_SYSVOL
2019-03-14 10:29:22: Deleting image backup failed.
2019-03-14 10:29:22: Deleting image backup ( id=1185, path=/tank/urbackup/backupfolder/[clientname]/190220-1902_Image_C/Image_C_190220-1902.raw ) ...
Destroying subvol tank/urbackup/images/[clientname]/190220-1902_Image_C failed. Promoting dependencies...
Searching for origin tank/urbackup/images/[clientname]/190220-1902_Image_C@190220-1902_Image_C
2019-03-14 10:29:23: Deleting image backup failed.
2019-03-14 10:29:23: WARNING: Deleting image "/tank/urbackup/backupfolder/[clientname]/190220-1902_Image_C/Image_C_190220-1902.raw" (manually set to be deleted) failed.
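For reference, the snapshot and clone relationships behind those “Promoting dependencies” messages can be inspected with standard ZFS commands (paths are placeholders):

zfs list -t all -r tank/urbackup/images/[clientname]
zfs get -r origin tank/urbackup/images/[clientname]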

IMHO if a full backup can’t be based on a zfs snapshot of the previous one, the snapshot chain should be restarted. Essentially the same logic as when you create the first image backup.
Do you think this is salvageable or should we throw away the database and data and start from scratch?

If you run UPDATE backup_images SET version=100 on backup_server.db (with sqlite3), it restarts all backup chains.
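In concrete terms, with the server stopped and using the database path from the logs above (the service name may differ):

# stop the server so sqlite3 is the only writer
systemctl stop urbackupsrv
sqlite3 /var/urbackup/backup_server.db "UPDATE backup_images SET version=100;"
systemctl start urbackupsrv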

So the problem is that, if you run remove-unknown now, it would delete the database entries for image backups that exist, in addition to removing the database entries for image backups that are missing?

Yes, I think so. I’m going to try the version bump. Is 100 a magic number in the code, or does it just have to be higher than the current number? How will this affect future updates of the server?
Edit: I was hoping that at some point all the affected datasets and volumes would fall outside our retention window of 3 months and the problem would solve itself. Right now everything works except for the image backups of 3 specific clients that had all their image backups deleted by remove-unknown.
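For what it’s worth, the current values can be checked before bumping (table and column names taken from the statement above):

sqlite3 /var/urbackup/backup_server.db "SELECT DISTINCT version FROM backup_images;"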

I manually removed the entries for all the missing datasets from the database and got image backups running that way.
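Roughly what that looked like, again with sqlite3 against backup_server.db and with the server stopped. The path column name is an assumption based on the log output above, so treat this as a sketch and back up the database file first:

# list the image backups whose datasets no longer exist (path column assumed)
sqlite3 /var/urbackup/backup_server.db "SELECT id, path FROM backup_images WHERE path LIKE '%190220-1902_Image_%';"
# then delete those rows by id (the ids here are the ones from the cleanup log above)
sqlite3 /var/urbackup/backup_server.db "DELETE FROM backup_images WHERE id IN (1184, 1185);"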