Internet backups from containers on the same machine stop working after host reboot

I have UrBackup Server v2.4.13 from this docker image running on a Linux virtual machine.
Alongside the server, on the same machine, I run many other docker-compose stacks of apps. Each of these stacks includes a client container from this docker image, which backs up the important volumes associated with that app. These clients then perform internet backups (the containers are on separate networks) to the main UrBackup Server.

On initial setup, everything worked perfectly. (I had to leave ports 35621-35623 unexposed for each of the clients, which disables restores, but that didn’t seem to affect backups at all.)
I was able to rebuild both the clients and the server and they would reconnect like they were supposed to. This has been working very well for me so far.
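
For reference, each app stack looked roughly like this. This compose file is a simplified sketch, not my exact setup; the client image name and volume layout are placeholders:

version: "3"
services:
  verdaccio:
    image: verdaccio/verdaccio
    volumes:
      - verdaccio-data:/verdaccio/storage

  # UrBackup client sidecar that backs up the app's volume via an
  # internet connection to the main server. The restore ports
  # (35621-35623) are deliberately not published.
  backup-client:
    image: uroni/urbackup-client  # placeholder; substitute the client image you use
    volumes:
      - verdaccio-data:/backup:ro
      - backup-client-var:/var/urbackup

volumes:
  verdaccio-data:
  backup-client-var: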

However, I recently had to restart my virtual machine, and now, no matter what order I restart the clients/server in, and no matter how long I wait, the clients simply fail to connect to the server unless I recreate them from scratch (and lose all my backup data).

This is the log from the clients:

tail: warning: --retry only effective for the initial open
tail: cannot open '/var/log/urbackupclient.log' for reading: No such file or directory
2021-05-23 10:22:18: urbackupserver: Server started up successfully!
2021-05-23 10:22:18: Started UrBackupClient Backend...
2021-05-23 10:22:18: ERROR: Error joining ipv6 multicast group ff12::f894:d:dd00:ef91
2021-05-23 10:22:18: FileSrv: Servername: -cd1ad5559b3c-tail: '/var/log/urbackupclient.log' has appeared; following new file
2021-05-23 10:22:19: Looking for old Sessions... 2 sessions
2021-05-23 10:22:28: ERROR: Error joining ipv6 multicast group ff12::f894:d:dd00:ef91
2021-05-23 10:22:28: FileSrv: Servername: -verdaccio-
2021-05-23 10:22:28: Final path: /backup
2021-05-23 10:22:31: Error receiving challenge packet
2021-05-23 10:22:31: InternetClient: Had an auth error
2021-05-23 10:22:41: Error receiving challenge packet
2021-05-23 10:22:41: InternetClient: Had an auth error

After that, it pretty much repeats the challenge and auth errors every so often.
The ‘multicast group’ errors always occur, and I don’t think they’re related.
The server, meanwhile, seems to start up fine. However, it gives these two errors once (it doesn’t repeat them):

2021-05-23 10:56:24: Authentication failed in InternetServiceConnector::ReceivePackets: Token not found
2021-05-23 10:56:24: Authentication failed in InternetServiceConnector::ReceivePackets: Token not found

I have checked and double-checked that the tokens are exactly the same as the tokens listed in the settings, but the clients still fail to connect. This is NOT an issue for backup clients on separate machines; those clients reconnect just fine.
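
One way to see what a client itself thinks is to query its status from inside the container (assuming your client image ships urbackupclientctl, as the standard client install does):

# "backup-client" is the container name from my stacks; yours will differ.
docker exec backup-client urbackupclientctl status

Among other things, the status output reports whether the internet connection to the server is currently established.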

If you could help me figure out how to reconnect my clients in a situation like this, I would greatly appreciate it. The backup pipeline built on this app is absolutely fantastic, but this bug is frustrating.

For now, I guess I’ll just recreate the backups and hope that this gets fixed in a future version. Some advice on what information I could have provided to better debug this issue would be appreciated, though.

The issue likely had to do with a broken network configuration of some sort on the VM itself, as newly created containers stopped working as well. I ended up restoring the VM from a backup taken before they stopped working.

Hi,
I would check your Linux VM’s firewall settings. For docker containers to talk to each other when they are connected to separate user-defined bridges, you need IP forwarding enabled. Maybe you did not set the permanent flag in your policy, so it stopped working after the reboot?
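
For example, to check it and make it persistent (assuming firewalld; adapt if you use plain iptables):

# Should print "net.ipv4.ip_forward = 1"
sysctl net.ipv4.ip_forward

# Persist the setting across reboots
echo "net.ipv4.ip_forward = 1" > /etc/sysctl.d/99-ip-forward.conf
sysctl --system

# With firewalld, rules added without --permanent are lost on reboot
firewall-cmd --permanent --zone=public --add-masquerade
firewall-cmd --reload
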
regards
Robert

I was able to replicate the issue; it seems to occur when re-creating the container from scratch.
I store the following directories in persistent volumes:
/var/urbackup
/var/log
/backup (where the backups are stored)

Is there another directory I should be keeping persistent?
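
In compose terms, the persistent mounts look roughly like this (the service name, image, and volume names are placeholders):

  backup-client:
    image: uroni/urbackup-client  # placeholder image name
    volumes:
      - urbackup-var:/var/urbackup  # client identity, tokens, settings database
      - urbackup-log:/var/log       # urbackupclient.log lives here
      - backup-data:/backup         # the backup data itself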

Edit: Scratch that, they came back online after the recreation. It just took a while.
I’m not sure what was causing my issue back then.