Restart of urbackup_srv fails to bind to 55414

Firstly, thanks for creating this software, I was very happy to discover and start using it! I would also like to help as much as I can as I really want to see this project develop.

So to begin with…

I installed the server on a Debian Squeeze amd64 server and after a restart of urbackup_srv the following is logged in /var/log/urbackup.log and the server fails to bind to the port 55414. This does not always happen so is random.

01/31/12 01:07:08: WARNING: Shutting down (Signal 15)
01/31/12 01:07:10: ERROR: HTTP: Failed binding SOCKET to Port 55414

If I stop and start urbackup_srv it seems fine.

I will take a look at the init script and report back if I discover anything and perhaps log it on your bug tracker.

I also did a Full image backup of a Windows 7 Machine and after it had completed I noticed the urbackup process on the server was hogging 100% of one CPU core. I left it for a few hours in case it was doing something but it stayed at 100% until I stopped and started urbackup_srv. Let me know what I can do to debug that.

Thanks.

Thanks for being interested.

I’ll look into that restart problem. It’s probably the port not being freed correctly.

With regard to the 100% CPU problem:

  1. If that happens again, can you look at: http://127.0.0.1:55414/x?a=server_status and tell me the server_status there
  2. Does it have 100% CPU usage during backup as well?
  3. Can you try 0.25.1. I optimized the VHD writing there

UrBackup first copies the image data to temporary files (/tmp/urbackup_tmp by default) and writes that to the VHD later. Thus it really may take a while after it says on the client it is finished. Was the image finished (listed in activities?)
How good is your CPU?

RE: 100% CPU. When the CPU was at 100% I could not access the web interface as it was unresponsive. When I stopped and started urbackup_srv I was able to log in and the following was listed in Activities so it appears to have completed ok.

4 3MMEL-TP01 Full image backup 2012-01-30 20:25 101 min 55.7 GB

When the backup was running it was using a medium to high amount of CPU (approx 70-100%), when I monitored it via htop I could see it fluctuating across both CPU cores. When the 100% core issue happened it was locked to one core and was 100% outright with not fluctuation or core swapping.

I am running 0.25.1-1. CPU is AMD Athlon™ 64 X2 Dual Core Processor 4800+

Another thing… When the full image backup was running I had a strange networking issue - I am not sure if this was due to my switch etc. Basically any existing network connections I had were fine and continued to work such as an open ssh connection to my Debian Server, Windows Explorer shares via samba to the server or Gmail inbox via Firefox. However if I tried to initiate an new connection (in Firefox) to a new website it would time out, likewise I could not access the web interface for urbackup. Once the backup completed everything was fine again.

I will run urbackup on another Windows 7 computer tonight and will use wireshark to look at the networking issue. How can I enable debugging on the urbackup server to have closer look there?

Thanks

That’s definetely not normal then.

You can change the loglevel to “debug” in /etc/default/urbackup_srv

It will log to /var/log/urbackup.log

That network issue might just be a saturated connection. I don’t really know what to do about that. I mean the operating system should keep one connection from using up more than its fair share… I might have tuned it too much though. That is certainly reversible.

Ok so for the port binding issue I can reproduce this now and it’s not limited to restart itself. The issue seems to be tcp connections that remain from computers connecting to the web interface after the daemon is stopped.

Steps to reproduce:

Go to the urbackup interface from any computer browser.
Restart the urbackup_srv “invoke-rc.d urbackup_srv restart” or stop and start.
urbackup_srv fails to bind to port 55414
Leave it it for a few seconds until the tcp connections from the client computer expire.
Start urbackup_srv and it binds to 55414.

See below output - this is long sorry as I had some unexpected errors when I went to create this thread and have included them.

I stop the server daemon and check no tcp sessions on 55414.

myriad> invoke-rc.d urbackup_srv stop
myriad> netstat -tap | grep 55414
myriad>

I then start it and connect to the web interface.

myriad> invoke-rc.d urbackup_srv start
myriad> netstat -tap | grep 55414
tcp        0      0 *:55414                 *:*                     LISTEN      10281/urbackup_srv
myriad> netstat -tap | grep 55414
tcp        0      0 *:55414                 *:*                     LISTEN      10281/urbackup_srv
tcp      592      0 myriad.happytreas:55414 10.39.41.107:51830      ESTABLISHED -

At this point the interface is unresponsive and the browser reports it is waiting…
I wait about 20 seconds and try to refresh the page and hit enter on the address and netstat shows this:

myriad> netstat -tap | grep 55414
tcp        0      0 *:55414                 *:*                     LISTEN      10281/urbackup_srv
tcp      593      0 myriad.happytreas:55414 10.39.41.107:51830      CLOSE_WAIT  -
tcp      592      0 myriad.happytreas:55414 10.39.41.107:51837      ESTABLISHED -
tcp      593      0 myriad.happytreas:55414 10.39.41.107:51836      CLOSE_WAIT  -

I try again a few seconds later…

myriad> netstat -tap | grep 55414
tcp        0      0 *:55414                 *:*                     LISTEN      10281/urbackup_srv
tcp      593      0 myriad.happytreas:55414 10.39.41.107:51830      CLOSE_WAIT  -
tcp      593      0 myriad.happytreas:55414 10.39.41.107:51837      CLOSE_WAIT  -
tcp      592      0 myriad.happytreas:55414 10.39.41.107:51853      ESTABLISHED -
tcp      593      0 myriad.happytreas:55414 10.39.41.107:51836      CLOSE_WAIT  -

I stop the daemon and check for tcp sessions

myriad> invoke-rc.d urbackup_srv stop
myriad> netstat -tap | grep 55414
myriad>

I start the daemon again and check that it’s listening:

myriad> invoke-rc.d urbackup_srv start
myriad> netstat -tap | grep 55414
tcp        0      0 *:55414                 *:*                     LISTEN      10352/urbackup_srv

I connect to the web interface about 20 seconds later and it works. netstat shows this:

myriad> netstat -tap | grep 55414
tcp        0      0 *:55414                 *:*                     LISTEN      10352/urbackup_srv
tcp        0      0 myriad.happytreas:55414 10.39.41.107:51871      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:51873      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:51872      TIME_WAIT   -

I stop the daemon, check netstat, start the daemon and check netstat.

myriad> invoke-rc.d urbackup_srv stop
myriad> netstat -tap | grep 55414
tcp        0      0 myriad.happytreas:55414 10.39.41.107:51871      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:51873      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:51872      TIME_WAIT   -
myriad> invoke-rc.d urbackup_srv start
myriad> netstat -tap | grep 55414
tcp        0      0 myriad.happytreas:55414 10.39.41.107:51871      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:51873      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:51872      TIME_WAIT   -

The server fails to bind on 55414 as you can see but the old sessions are still there. I wait about 30 seconds and then check netstat.

myriad> netstat -tap | grep 55414
myriad>

No tcp sessions on port 55414 so I start the daemon… OK at this point something unexpected happens the daemon made one CPU core run at 100% and even though there were no tcp sessions on 55414 and still failed to bind to the port.

the log output is as follows:

01/31/12 22:01:23: WARNING: Shutting down (Signal 15)
01/31/12 22:01:23: Exited Loop
01/31/12 22:01:23: Deleting at..
01/31/12 22:01:23: Deleting SelectThreads..
01/31/12 22:01:23: Deleting lbs...
01/31/12 22:01:23: Shutting down plugins...
01/31/12 22:01:23: Stopping channel...
01/31/12 22:01:23: Lost channel connection to client. has_error=true
01/31/12 22:01:24: server_prepare_hash Thread finished
01/31/12 22:01:24: server_hash Thread finished - exitnow
01/31/12 22:01:24: server_get Thread for client 3MMEL-TP01 finished
01/31/12 22:01:25: Deleting server...
01/31/12 22:01:25: deleting cached settings...
01/31/12 22:01:25: deleting stream services...
01/31/12 22:01:25: ServiceAcceptor finished
01/31/12 22:01:25: HTTP: Waking up...92438
01/31/12 22:01:25: ServiceWorker finished
01/31/12 22:01:25: c=0/2
01/31/12 22:01:25: c=1/2
01/31/12 22:01:25: c=2/2
01/31/12 22:01:25: deleting plugins...
01/31/12 22:01:25: Deleting threadsafe plugins...
01/31/12 22:01:25: Deleting pluginmanagers...
01/31/12 22:01:25: Deleting actions...
01/31/12 22:01:25: Deleting sessmgr...
01/31/12 22:01:25: removing sessions...
01/31/12 22:01:25: done.
01/31/12 22:01:25: waiting for sessionmgr...
01/31/12 22:01:25: done.
01/31/12 22:01:25: Shutting down ThreadPool...
01/31/12 22:01:25: destroying databases...
01/31/12 22:01:25: deleting database factories...
01/31/12 22:01:25: unloading dlls...
01/31/12 22:01:25: Destroying mutexes
01/31/12 22:01:31: Changing user...
01/31/12 22:01:31: done.
01/31/12 22:01:31: Loaded -pychart- plugin
01/31/12 22:01:31: Loaded -url- plugin
01/31/12 22:01:31: Loaded -download- plugin
01/31/12 22:01:31: Loaded -fsimageplugin- plugin
01/31/12 22:01:31: Starting HTTP-Server on port 55414
01/31/12 22:01:31: ERROR: HTTP: Failed binding SOCKET to Port 55414
01/31/12 22:01:31: Starting as server...
01/31/12 22:01:31: Started UrBackup...
01/31/12 22:01:32: Creating SOCKET...
01/31/12 22:01:32: done.
01/31/12 22:01:32: Server started up sucessfully!
01/31/12 22:01:32: Looking for old Sessions... 0 sessions
01/31/12 22:01:32: New Backupclient: 3MMEL-TP01
01/31/12 22:01:32: Getting client settings...
01/31/12 22:01:32: Sending backup incr intervall...

I stop the daemon which stops the CPU going mad and start the daemon again. This time its happy and is not thrashing the CPU and binds to 55414.

myriad> invoke-rc.d urbackup_srv stop
myriad> netstat -tap | grep 55414
myriad> netstat -tap | grep 55414
myriad> invoke-rc.d urbackup_srv start
myriad> netstat -tap | grep 55414
tcp        0      0 *:55414                 *:*                     LISTEN      10545/urbackup_srv

I connect to the web interface ok again.

I restart the daemon and it fails again:

myriad> netstat -tap | grep 55414
tcp        0      0 *:55414                 *:*                     LISTEN      10545/urbackup_srv
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52008      TIME_WAIT   -
myriad> netstat -tap | grep 55414
tcp        0      0 *:55414                 *:*                     LISTEN      10545/urbackup_srv
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52008      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52013      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52012      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52011      TIME_WAIT   -
myriad> invoke-rc.d urbackup_srv restart ; sleep 3 ; netstat -tap | grep 55414
Restarting UrBackup server deamon: urbackup_srv.
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52021      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52015      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52008      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52018      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52013      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52014      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52012      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52011      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52020      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52017      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52016      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52019      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52022      TIME_WAIT   -
myriad> netstat -tap | grep 55414
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52021      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52015      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52008      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52018      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52013      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52014      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52012      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52011      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52020      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52017      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52016      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52019      TIME_WAIT   -
tcp        0      0 myriad.happytreas:55414 10.39.41.107:52022      TIME_WAIT   -