Improper cleaning of .directory_pool


#1

Hello

I had a client that was badly configured at the beginning and was backing up too much.
A few months ago I added an exclusion for about 1.4TB of data, and I was expecting to regain that space over time as backups expired.
This did not go fast enough, so I removed the oldest backup and, today, all the backups for that client.
So now the client has 0 backups, but the directory pool still uses 1.4TB.

Can I just delete all the files in that client's folder, or would that mess up inter-client deduplication?
Do you need some data to help fix this issue?
I ran remove-unknown and cleanup -a 1G a few times; it didn't help.

The server is the stable release, so 2.1.20.


#2

It would be great if you could do some analysis to find out what is wrong and why remove-unknown is not fixing it.

  1. Find out the id of the client (SELECT id, name FROM clients on sqlite3 backup_server.db)
  2. Attach the active directory links of that client (SELECT * FROM directory_links WHERE clientid=CLIENTID on sqlite3 backup_server_links.db)
  3. Attach list of directory pool directories (e.g. find /media/backup/CLIENT/.directory_pool -maxdepth 2 -name '???*')

The directory pool is per client, so it can be deleted without problems if all the backups of that client are already deleted.
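
The three steps above could be combined into a small script. This is only a sketch, not part of UrBackup: it assumes the directory_links table has `clientid` and `name` columns and that pool entries live at `.directory_pool/<two-character prefix>/<name>`. The function names are made up for illustration.

```python
import os
import sqlite3


def pool_names_on_disk(pool_root):
    """Return the entry names found under pool_root/<prefix>/<name>."""
    names = set()
    if not os.path.isdir(pool_root):
        return names
    for prefix in os.listdir(pool_root):
        prefix_dir = os.path.join(pool_root, prefix)
        if not os.path.isdir(prefix_dir):
            continue
        for entry in os.listdir(prefix_dir):
            # Roughly the same filter as: find ... -name '???*'
            if len(entry) >= 3:
                names.add(entry)
    return names


def pool_names_in_db(links_db, client_id):
    """Return the pool names referenced by directory_links for one client.

    Assumption: the table has columns named clientid and name.
    """
    con = sqlite3.connect(links_db)
    try:
        rows = con.execute(
            "SELECT DISTINCT name FROM directory_links WHERE clientid = ?",
            (client_id,),
        )
        return {row[0] for row in rows}
    finally:
        con.close()


def unreferenced_pool_entries(pool_root, links_db, client_id):
    """Pool entries that exist on disk but have no row in directory_links."""
    on_disk = pool_names_on_disk(pool_root)
    in_db = pool_names_in_db(links_db, client_id)
    return sorted(on_disk - in_db)
```

For example, `unreferenced_pool_entries("/media/backup/CLIENT/.directory_pool", "backup_server_links.db", CLIENTID)` would list the candidates. It only reads; nothing is deleted.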


#3

Now that I think about it, that's a virtual client, if that changes anything.
Running the queries now…


#4

The 1.4TB client is ct01.dev.sss.com[gitlab]

sqlite> .open backup_server.db
sqlite> SELECT id, name FROM clients;
2|ct01.dev.sss.com
3|ct02.dev.sss.com
6|backup01.aaa.net
7|backup02.aaa.net
9|ct01.dev.sss.com[gitlab]
10|ct01.dev.sss.com[jenkins]
11|ct03.dev.sss.com

sqlite> .open backup_server_links.db
sqlite> SELECT distinct clientid FROM directory_links ;
2
3
10
11

find ./.directory_pool -maxdepth 2 -name '???*'|wc -l
3766
Example:
./.directory_pool/Hx/Hx3sZBTcXW150782175360583825
./.directory_pool/0d/0dxCMbkX0O1500329768472709296
./.directory_pool/0d/0d0Z1fRcOA1497391690400391529
./.directory_pool/7c/7cIwrRloXP15027913081220996331
./.directory_pool/7c/7cUqNd8tOB1497391690400392049
./.directory_pool/EB/EBBCA3qwg5150782175760587766
./.directory_pool/Q8/Q8hySlUuQp15064885101934847240
./.directory_pool/Q8/Q8ap53TDbm1509275270266529665
./.directory_pool/Q8/Q8APuYZVyp1507957350196180056
./.directory_pool/Q8/Q8iwbxyjAd150782175360583608
./.directory_pool/Q8/Q8CBE1CJ0i1500761093904033509
./.directory_pool/YT/YTw6mAHOB515065929612039298129
./.directory_pool/Si/SiUJgNgpmp150700759632560896
./.directory_pool/Si/SiWp7npG7G15040695802499268298
./.directory_pool/52/52HmlX3nZp15042422522671939935
./.directory_pool/52/52CSoVXDnf15040695612499249361
./.directory_pool/LA/LALIpTZhqV150700759632561336
./.directory_pool/KF/KFDMWyU8yd1505211143657480491

ls -1ad .//.director
./ct01.dev.sss.com/.directory_pool
'./ct01.dev.sss.com[gitlab]/.directory_pool'
'./ct01.dev.sss.com[jenkins]/.directory_pool'
./ct02.dev.sss.com/.directory_pool
./ct03.dev.sss.com/.directory_pool

ls -1ad ./*/curr
./backup02.aaa.net/current
./ct01.dev.sss.com/current
./ct02.dev.sss.com/current
./ct03.dev.sss.com/current

So it looks like only this server is buggy
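
The manual comparison above (clients table, distinct clientid in directory_links, and which client folders have a .directory_pool) could be automated roughly like this. It is a sketch under assumptions: it uses the same two queries as above, assumes each client folder is named exactly after the client, and the function name is hypothetical.

```python
import os
import sqlite3


def clients_with_unlinked_pool(backup_root, server_db, links_db):
    """Clients that have a .directory_pool folder but no directory_links rows."""
    con = sqlite3.connect(server_db)
    try:
        # Same query as in the sqlite session: id -> client name
        clients = dict(con.execute("SELECT id, name FROM clients"))
    finally:
        con.close()

    con = sqlite3.connect(links_db)
    try:
        linked_ids = {
            row[0]
            for row in con.execute("SELECT DISTINCT clientid FROM directory_links")
        }
    finally:
        con.close()

    suspects = []
    for client_id, name in sorted(clients.items()):
        pool_dir = os.path.join(backup_root, name, ".directory_pool")
        if os.path.isdir(pool_dir) and client_id not in linked_ids:
            suspects.append(name)
    return suspects
```

With the data shown in this post it should report only ct01.dev.sss.com[gitlab], which matches what was found by hand.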


#5

Looking at a client whose backups were not deleted:

ct01.dev.sss.com[jenkins]/.directory_pool> find ./ -maxdepth 2 -name '???*'|wc -l
27797

sqlite> SELECT count (*) FROM directory_links WHERE clientid=10 ;
85521

Examples:
sqlite> 3040766|10|F0wkNlQYH11507612206637171460|bbb/ct01.dev.sss.com[jenkins]/171029-0700/.hashes/bbb/.git/objects/09
3040767|10|DfRVBSPVns1507735051760015731|bbb/ct01.dev.sss.com[jenkins]/171029-0700/.hashes/bbb/.git/objects/0c
3040768|10|JCceA17byD1507735052760016730|bbb/ct01.dev.sss.com[jenkins]/171029-0700/.hashes/bbb/.git/objects/10
3040769|10|vxnVrZAUsH150780345942289952|bbb/ct01.dev.sss.com[jenkins]/171029-0700/.hashes/bbb/.git/objects/11
3040770|10|5GvUQHJU03150780346142291001|bbb/ct01.dev.sss.com[jenkins]/171029-0700/.hashes/bbb/.git/objects/12
3040771|10|tSkLi1YfFM1508475871379729058|bbb/ct01.dev.sss.com[jenkins]/171029-0700/.hashes/bbb/.git/objects/13
3040772|10|XiTKIKHp9K1508858367762224540|bbb/ct01.dev.sss.com[jenkins]/171029-0700/.hashes/bbb/.git/objects/14


#6

11am
@uroni do you need more info, or can I delete the folder?

2pm
Deleting the folder, keeping the db.


#7

I manually deleted a few backups from another virtual client, then ran remove-unknown.
As I always get this message when I run remove-unknown, I wasn't paying much attention to it. But maybe the link count stays unchanged, so the real data folder is never actually deleted.

It's actually about 1500 lines, not just the 4 shown here as an example:
2018-01-15 15:26:43: WARNING: Directory link “/var/docker/data/urbackup-server/datas//ct01.dev.sss.com[jenkins]/171030-1710/.hashes/jenkins.dev.sss.com/data/jenkins/workspace/build/resalys/01_create_version/7.8/.svn/pristine/80” with pool path “/var/docker/data/urbackup-server/datas//ct01.dev.sss.com[jenkins]/.directory_pool/Bn/BnYQYGu1w51509448359439618489” not found in database. Deleting symlink only.
2018-01-15 15:26:45: WARNING: Directory link “/var/docker/data/urbackup-server/datas//ct01.dev.sss.com[jenkins]/171030-1710/.hashes/jenkins.dev.sss.com/data/jenkins/workspace/build/resalys/01_create_version/7.8/.svn/pristine/4c” with pool path “/var/docker/data/urbackup-server/datas//ct01.dev.sss.com[jenkins]/.directory_pool/Si/SizJxIPd8x1509448312439570756” not found in database. Deleting symlink only.
2018-01-15 15:26:45: WARNING: Directory link “/var/docker/data/urbackup-server/datas//ct01.dev.sss.com[jenkins]/171030-1710/.hashes/jenkins.dev.sss.com/data/jenkins/workspace/build/resalys/01_create_version/7.8/.svn/pristine/ea” with pool path “/var/docker/data/urbackup-server/datas//ct01.dev.sss.com[jenkins]/.directory_pool/E8/E8HlDulQuD1509448454439713035” not found in database. Deleting symlink only.
2018-01-15 15:26:45: WARNING: Directory link “/var/docker/data/urbackup-server/datas//ct01.dev.sss.com[jenkins]/171030-1710/.hashes/jenkins.dev.sss.com/data/jenkins/workspace/build/resalys/01_create_version/7.8/.svn/pristine/43” with pool path “/var/docker/data/urbackup-server/datas//ct01.dev.sss.com[jenkins]/.directory_pool/P3/P3qblaZjA51509448304439563201” not found in database. Deleting symlink only.
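
With about 1500 such lines, a quick summary per backup can help. Below is a sketch that parses only the warning format shown above; the regular expression, the function name and the way the backup name is derived from the paths are my own assumptions, not anything UrBackup provides. It accepts both straight and typographic quotes, since the forum may have converted them.

```python
import re
from collections import Counter

# Matches the "not found in database" warning in the format quoted above.
WARNING_RE = re.compile(
    r'WARNING: Directory link ["“](?P<link>.+?)["”] '
    r'with pool path ["“](?P<pool>.+?)["”] not found in database'
)


def summarize_missing_links(log_lines):
    """Count warnings per (client folder, backup) and collect the pool paths."""
    per_backup = Counter()
    pool_paths = set()
    for line in log_lines:
        match = WARNING_RE.search(line)
        if not match:
            continue
        link = match.group("link")
        pool = match.group("pool")
        pool_paths.add(pool)
        # The client folder is everything before "/.directory_pool/".
        client_dir = pool.partition("/.directory_pool/")[0]
        client = client_dir.rsplit("/", 1)[-1]
        if link.startswith(client_dir):
            # First path component after the client folder is the backup name.
            backup = link[len(client_dir):].lstrip("/").split("/", 1)[0]
        else:
            backup = "?"
        per_backup[(client, backup)] += 1
    return per_backup, pool_paths
```

Feeding it the log file, e.g. `summarize_missing_links(open("remove_unknown.log", encoding="utf-8"))` (the file name is just an example), gives the number of warnings per backup and the set of pool paths that were left in place.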


#8

Did you restore a backup of the UrBackup database at some point btw?

By manual deletion you mean deletion from the web interface? (not simply deleting the folder)

The warning can also be caused by an interrupted delete (e.g. a power failure), in which case it is expected behaviour.


#9

Did you restore a backup of the UrBackup database at some point btw?

  • no

By manual deletion you mean deletion from the web interface? (not simply deleting the folder)

  • I had to delete the remaining .directory_pool folder today. All the backups were already deleted and only the client configuration was left. The backups were deleted via the UI.

The warning can also be caused by an interrupted delete (e.g. a power failure), in which case it is expected behaviour.

  • It wouldn't surprise me if a cleanup operation was interrupted, or the server was shut down too fast, at some point in the past.
    But I am more than 90% sure that the last sequence (stop, cleanup, start, delete backup, stop, cleanup, give you the logs) was done without interruption. Could that be the source of the orphaned .directory_pool entries?

#10

Hello @orogor @uroni.

I'm having the same kind of issue. I recently changed the server machine and transferred everything to the new one.

All clients can see the new server and back up properly. But since the move, the clients suddenly take about 25% more space than before (I think), and the drive is filling up.

Some clients show only one last backup of 10GB (for example), but when I go to the .directory_pool for that client it shows 50GB (for example). When I go to the web control panel for that client, it shows all the history and only that one last backup (which I can delete from the web control panel).

I tried running remove_unknown a couple of times, but the data size doesn't seem to go down.

My question is: how can I reduce the backup size?
Is it okay if I remove the .directory_pool folder manually?

It's a Windows environment.
server 20.1.20
clients 20.1.17

Many thanks


#11

Hi

If you have a single backup, do not delete it... yet.
How large do you estimate a single backup should be? (On Linux ncdu will help, on Windows WinDirStat.)
How large is the full client folder? (The folder one level above the .directory_pool, named after the client.)

As you have a single backup, both should be of similar size. The UrBackup size may be smaller if there are duplicate files, but not bigger. (Well, actually, if you have a lot of small files it may be bigger because of the .hashes folder, but you'd really need to have a lot of files.)
If the UrBackup folder is bigger, then I guess you also have some kind of cleanup issue.
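
If ncdu or WinDirStat is not at hand, the two sizes can also be measured with a few lines of Python. This is a rough sketch with a made-up function name: it sums apparent file sizes (not allocated blocks), does not follow symlinks, and counts hard-linked files only once, on the assumption that the same file may be hard-linked into several backups.

```python
import os


def tree_size_bytes(root):
    """Sum file sizes under root, counting each hard-linked file only once."""
    seen = set()
    total = 0
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable, skip it
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another hard link to a file already counted
            seen.add(key)
            total += st.st_size
    return total
```

Running it once on the client folder and once on the single backup folder inside it, then comparing the two numbers, gives the same kind of answer as the tools above.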


#12

Thanks for your reply

  1. A single full file backup should be around 25GB.
  2. The whole client folder is about 250GB.

Yes, it looks like there is some kind of cleanup issue. I noticed it after moving to the new machine.

Any suggestions on what I can do to resolve this issue?

Many thanks


#13

It is improbable that this affects so many of them. But then I have seen ZFS volumes that have like an hour of (unsynced) updates.

Best would be a series of steps to reproduce the problem. I must admit that I don't use the directory link method at all currently (I use btrfs instead).
I’ll do some tests with remove_unknown.

So this is a docker volume, but does it have the same path inside the docker instance as outside of it? Where do you run urbackupsrv (with or without remove-unknown)?


#14

Hi

From inside the docker container.
No, the paths inside and outside are different.
These are the paths from inside the UrBackup server container.

We had some issues with btrfs, which is why we aren't running it. But I guess we will try again soon.
We try btrfs about once a year, but we always run into some issue. Sometimes it is not the btrfs team's fault, but it's mostly btrfs that is affected.
That is why we use a lot of ZFS/ext4, with which we have never had problems.


#15

Hi Martin,

Do you have any suggestions on how to clear or clean the .directory_pool? It takes up so much space. I went through all the posts about .directory_pool but couldn't find an answer.

Many thanks


#16

Hi

Yes, this is a “new” issue, as in it was first spotted in this forum thread, so investigation is needed.
One of the things needed is the smallest test case that reproduces the issue. Then I guess a fix will follow.

Maybe it happens when backups are deleted manually, or when they expire, or only with virtual clients.
Or maybe under more special conditions, like if the server crashes at some point during processing, or if the server runs out of space and doesn't finish a previous backup.
Or maybe it comes from the folders having specific properties, like being hard- or soft-linked.
Or a combination.

If I have some energy left after the office day, I'll try to write a script to reproduce all these cases in a VM.
If you have some scripting skills, you can also try to do so.


#17

Testing shows that remove_unknown does not remove directory links in .directory_pool with no references in the database. This can happen if you e.g. use a database backup while having new backups in backup storage. This will be fixed.

Unresolved is why those unreferenced directories exist after normal operation (if they do – obviously they should not). They should be removed during normal file backup removal.


#18

Thanks for the fix.
Maybe that’s because the server was rebooted during a cleanup.


#19

2.2.6 has this cleanup (in remove unknown) now: Server 2.2.6 beta

If you can, please snapshot the backup storage and check whether it corrupts any backups (e.g. with urbackupsrv verify-hashes), and revert the db and backup storage if it does.


#20

remove-unknown ran for about 10 hours.
It removed a lot of directory pool entries, maybe 1,000 to 10,000.

Now running urbackupsrv verify-hashes.
It has been running for about 10 hours and is at 15%. It is I/O-bound, at about 20% iowait and maybe 50MB/s.
The saved backups should be something like 150-200GB.
That may not be significant, as this server is very slow in general.

verify-hashes is showing errors like:

2018-02-03 21:20:16: ERROR: Error opening file “/data/urbackup2/pascalou[home]/170911-0850/home/orogor/.config/google-chrome/Default/Pepper Data/Shockwave Flash/WritableRoot/#SharedObjects/L98PDDXW/macromedia.com/support/flashplayer/sys/#www.turbo.fr/settings.sol

It's not there:
sudo ls -l "/data/urbackup2/pascalou[home]/170911-0850/home/orogor/.config/google-chrome/Default/Pepper Data/Shockwave Flash/WritableRoot/#SharedObjects/L98PDDXW/macromedia.com"
ls: cannot access '/data/urbackup2/pascalou[home]/170911-0850/home/orogor/.config/google-chrome/Default/Pepper Data/Shockwave Flash/WritableRoot/#SharedObjects/L98PDDXW/macromedia.com': No such file or directory

But the whole backup isn’t missing.
sudo ls -ld "/data/urbackup2/pascalou[home]/170911-0850/home/orogor/.config/google-chrome/Default"
drwxr-x--- 23 urbackup urbackup 79 sept. 15 20:20 /data/urbackup2/pascalou[home]/170911-0850/home/orogor/.config/google-chrome/Default

I forgot to take this more seriously and run it in a screen session with the output redirected. Could these be run from the GUI, e.g. from the advanced tab, with a progress bar? Or could a progress bar at least be added to remove-unknown?
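
Until something like that exists, the output of a long run can at least be kept in a file. A minimal sketch, assuming nothing about urbackupsrv beyond the two subcommands mentioned in this thread; `run_and_log` is a made-up helper. It does not replace screen: the command still stops if the terminal session ends.

```python
import subprocess
import sys


def run_and_log(cmd, log_path):
    """Run cmd, echo each output line to stdout and append it to log_path."""
    with open(log_path, "a", encoding="utf-8") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep errors and normal output together
            text=True,
            errors="replace",
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
            log.flush()  # so the log is useful even if the run is interrupted
        return proc.wait()
```

For example, `run_and_log(["urbackupsrv", "verify-hashes"], "verify-hashes.log")` would keep all the error lines for later inspection.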