If a filename is too long, it is truncated, but the last character of the new filename (before the hash) is corrupted

tan · January 14, 2022, 7:10am

The bug appears in this configuration:
Client 2.4.11 on Windows 10 x64 (latest updates installed)
Server 2.4.14 on Debian 9 (latest updates installed)

Looks like some changes were made to the code earlier, but they still don’t work (I reported the bug a year ago).

Files with corrupted names are not visible over the network (when the backup folder is shared via Samba).
In Midnight Commander, the filename displayed will contain a diamond with a question mark inside. The program hints that one UTF-8 character is corrupted.

Criticality of the bug: I think “high”. This makes it impossible to completely copy the archive over the network to another computer. Files with corrupted names will not be copied.
It will definitely be a very bad surprise.
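
For illustration, here is a minimal Python sketch of why a byte-based cut can corrupt the last character, and one way to cut on a character boundary instead. This is not UrBackup's code (the server is written in C++); the function name `truncate_utf8` and the 9-byte limit are made up for the example.

```python
def truncate_utf8(name: str, max_bytes: int) -> str:
    """Shorten name so its UTF-8 encoding fits in max_bytes without splitting a character."""
    raw = name.encode("utf-8")[:max_bytes]
    # The cut may land inside a multi-byte character; errors="ignore"
    # drops that incomplete trailing sequence instead of keeping stray bytes.
    return raw.decode("utf-8", errors="ignore")


name = "Ö" * 8                      # each "Ö" is 2 bytes in UTF-8 (0xC3 0x96)
naive = name.encode("utf-8")[:9]    # byte-based cut: ends with a lone 0xC3
safe = truncate_utf8(name, 9)       # character-safe cut: four whole "Ö"
```

A real implementation would also have to leave room for whatever suffix (such as the hash) is appended to the shortened name.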

Bearded_Blunder · January 17, 2022, 2:53am

Have you considered testing using Debian stable for the server?
I suspect there have been changes and improvements to SAMBA between oldoldstable (9) and stable (11).

Might be nothing to do with your problem, but worth testing.

tan · January 17, 2022, 6:07am

I haven’t tried the new stable version of Debian. In any case, the invalid Unicode character in the names of the backed-up files comes from UrBackup. The root cause needs to be fixed, not the consequence.

tan · January 20, 2022, 12:51pm

One more note. When trying to copy a file with invalid unicode characters, the system reports that the file was not found.

(invalid or incomplete multibyte or wide character)
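
To find every affected file rather than stumbling over them one by one, something like the following Python sketch could be run on the server. It walks a directory using byte paths and reports names that are not valid UTF-8. The helper names are hypothetical, and the backup root passed to it is whatever path the server actually uses.

```python
import os
from typing import List


def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw filename bytes decode cleanly as UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_invalid_names(root: str) -> List[bytes]:
    """Walk root with byte paths and collect entries whose names are not valid UTF-8."""
    bad = []
    # Passing bytes to os.walk makes it yield raw byte names,
    # so broken names are reported as-is instead of being re-encoded.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            if not is_valid_utf8(name):
                bad.append(os.path.join(dirpath, name))
    return bad
```

Calling `find_invalid_names` on the backup storage folder would return the full byte paths of the files and directories that tools like cp or Samba are likely to choke on.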

Bearded_Blunder · January 20, 2022, 8:24pm

What’s the output of locale on that server?
If the output doesn’t have a list of stuff ending .utf8 that could be the issue.

Since you’re on Stretch you could try the Debian UTF-8 migration wizard

This wizard upgrades legacy system locales to their UTF-8 equivalent. It also informs users whenever files in their home directory still utilize legacy encodings.

Available in Stretch.

It solved the problem for people in the cases I found when searching for that specific error.

tan · January 21, 2022, 6:21am

$ locale
LANG=ru_RU.UTF-8
LANGUAGE=
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC="ru_RU.UTF-8"
LC_TIME="ru_RU.UTF-8"
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES="ru_RU.UTF-8"
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
LC_ALL=

tan · January 21, 2022, 6:27am

I seem to have found the cause. In the “dev” branch the problem was fixed long ago, at line 1030 of
https://github.com/uroni/urbackup_backend/blob/dev/urbackupserver/FileBackup.cpp.
The 2.4.14 server source code (from the urbackup.org download page) does not contain this block of code.

UPDATE:
I compiled the program (2.4.14) with this fix manually added. This code doesn’t work. It looks like that’s why it was not added to the release…

tan · February 28, 2023, 1:01pm

Some time ago the problem with a broken character when shortening long filenames was solved. Now the problem appears again:

Such files become inaccessible both via SMB and locally from the folder where all backups are stored.
Shortening long filenames during backup does not work correctly. Such files are common, because each non-Latin character in a filename takes up two bytes in UTF-8, so the filename length allowed on Linux, which is counted in bytes, is cut to about half as many characters.
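
To put rough numbers on this, here is a small Python calculation. It assumes the usual 255-byte limit per filename found on common Linux filesystems such as ext4; the server's actual filesystem may differ, and `max_chars` is just an illustrative helper.

```python
NAME_MAX_BYTES = 255  # assumed per-name limit; typical for ext4 and similar filesystems


def max_chars(sample_char: str, limit: int = NAME_MAX_BYTES) -> int:
    """How many copies of sample_char fit into limit bytes of UTF-8."""
    return limit // len(sample_char.encode("utf-8"))


latin = max_chars("a")      # 1 byte per character
cyrillic = max_chars("я")   # 2 bytes per character
umlaut = max_chars("Ö")     # 2 bytes per character
```

Under that assumption a Latin-only name can be 255 characters long, while a Cyrillic-only name hits the limit at 127 characters.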

tan · March 17, 2023, 6:07am

Can this error be fixed? It gets in the way a lot when working with the backup directory via Samba, which is the easiest way to find the desired file in a backup.

tan · March 17, 2023, 9:15am

In the UrBackup Server web interface, the problem looks like this (invalid character with a question mark):

Of course, such a file is not copied either via SAMBA or locally via cp or Midnight Commander. You can download the file in the web interface, but its size will always be zero bytes.
Thus, we can say that this file is missing from the backup!

tan · March 22, 2023, 9:01am

Here is another example to get developers’ attention. I created a file with a name length of 256 bytes. Here is what happened in the UrBackup web interface:

And here is how the file name shortened by the server looks in the HEX editor:

Where did these three bytes come from? They were not in the original name, which contains only the letter Ö repeated and the .txt extension.
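
For anyone who wants to pinpoint such bytes without a hex editor, the following Python sketch lists the offset and hex value of every byte sequence in a raw filename that is not valid UTF-8. The function name and the sample bytes are invented for the example; they are not the bytes from the report above.

```python
from typing import List, Tuple


def invalid_spans(raw: bytes) -> List[Tuple[int, str]]:
    """Return (offset, hex) for each byte run in raw that is not valid UTF-8."""
    spans = []
    pos = 0
    while pos < len(raw):
        try:
            raw[pos:].decode("utf-8")
            break  # the rest of the name is clean
        except UnicodeDecodeError as err:
            # err.start/err.end are relative to the slice, so shift them by pos.
            start, end = pos + err.start, pos + err.end
            spans.append((start, raw[start:end].hex()))
            pos = end  # continue scanning after the bad bytes
    return spans


# Made-up sample: two whole "Ö" (0xC3 0x96), a dangling 0xC3, then ".txt".
sample = b"\xc3\x96\xc3\x96\xc3.txt"
report = invalid_spans(sample)
```

For the sample above, `report` contains a single entry pointing at the lone 0xC3 at offset 4.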

@uroni we need your help!