Handling a large amount of data with UrBackup?

I am looking for a solution to back up a large amount of data off-site; I'm talking about 20 TB. The source consists primarily of thousands and thousands of small files of 10-20 MB, hosted on a single Windows client.
One of my first options is to use Amazon S3 as storage. I am also looking at others like Backblaze, which has better prices.
The question is whether UrBackup can handle this amount of data reliably and safely, and what the best approach would be to accomplish this task.
Thank you so much.

Yes, UrBackup can handle this. The question is whether the backup storage can handle it (usually the backup storage IOPS are the bottleneck).

See also this recent discussion: UrBackup compatible with Wasabi (S3 compatible endpoint)

If I understood it correctly, I could do this with the UrBackup appliance (Infscape). The appliance acts as a gateway/cache to Amazon S3 (or a compatible) cloud.
The data is first stored on the appliance's disks and then (once a day by default) uploaded to the cloud storage?

Is operating backups similar to standalone UrBackup?
I mean, do I need to make full and incremental backups?
Are different snapshots (or versions) of the files stored in S3?
Is the restore process handled entirely by UrBackup?
For backup operations, are the S3 costs for retrieving and putting data high?
Is the data encrypted in S3 by UrBackup, or can I use native S3 encryption?
Can I monitor the sync status between the UrBackup appliance and S3?
How is it handled if the local storage runs out of space?
The local DB on the client and on the UrBackup server for file hashing will be very big for 20 TB of data; is there any problem with SQLite managing this? I have had problems in some cases with DB corruption on local UrBackup servers with much less data.
Thank you.

Yeah, to a local cache, then to S3 (or compatible)/Azure. Amazon has its own gateway appliance for that as well; there is a thread/report about it here in the forums. But it is relatively expensive.
If you compare AWS S3 (or GCP/Azure) to something else, keep in mind that S3 stores to two different data centers by default with fast access times. B2 would be more comparable to, e.g., S3 One Zone – Infrequent Access.
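To make that comparison concrete, here is a rough cost sketch. The per-GB prices below are assumptions for illustration only, not quoted rates; check the providers' current pricing pages before deciding.

```python
# Rough monthly storage cost for ~20 TB at assumed, illustrative
# per-GB prices (NOT current published rates; check the providers).
data_gb = 20 * 1000  # 20 TB in GB (decimal, as cloud providers bill)

assumed_prices = {          # USD per GB per month (assumptions)
    "S3 Standard": 0.023,
    "S3 One Zone-IA": 0.010,
    "Backblaze B2": 0.005,
}

for name, per_gb in assumed_prices.items():
    print(f"{name}: ~${data_gb * per_gb:,.0f}/month")
```

Retrieval and request fees come on top of this and differ per provider, which is why the appliance's caching behaviour (below) matters for the total bill.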


It stores the whole backup storage file system on S3, including all backups and a database backup for recovery from a broken system disk.

Can you be more specific? Restore of a client (that would work like UrBackup) or the appliance (e.g. in case the system disk fails)?

It tries to minimize that cost. In general, the larger the local cache, the less data it has to retrieve during normal operation. If you wait for the upload to the cloud (e.g. the 24 h) and have a large enough cache, it might be able to bundle up multiple backups before uploading.

It’s encrypted before upload. In my opinion the gain from native S3 encryption is pretty limited (after all, Amazon can still decrypt it).

If you want details, see “Settings -> System -> Access server statistics (netdata)”, then go to storage -> clouddrive. Basic progress information is also shown on the progress screen.

Local storage is only used as cache, so its size doesn’t really matter beyond performance (and the already mentioned retrieve/put cost trade-off).

SQLite is one of the most widely used and well tested software components out there, so I’d guess the problem lies outside of SQLite (e.g. storage corruption or non-ECC RAM). As long as the system disk/dbs are on an SSD/NVMe, I also haven’t seen performance issues, though it depends a bit on the workload (many small files and full backups would be more problematic). The appliance puts the db on btrfs. That costs a bit of performance, but that way corruption (caused by the SSD) can be detected and the db can be restored from backup storage (a db backup runs automatically every time it commits to backup storage).
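If you suspect database corruption on a standalone server, SQLite's built-in integrity check can at least tell you whether the file itself is damaged. A minimal sketch; the commented-out path is an example only and depends on your installation:

```python
import sqlite3

def check_db(path: str) -> bool:
    """Run SQLite's built-in integrity check; True means the file is ok."""
    con = sqlite3.connect(path)
    try:
        return con.execute("PRAGMA integrity_check;").fetchall() == [("ok",)]
    finally:
        con.close()

# Example path only; on a Linux server the UrBackup databases usually
# live under /var/urbackup/ -- adjust for your installation, and run
# the check while the server service is stopped.
# print(check_db("/var/urbackup/backup_server.db"))
```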


You may want to try image-based backups if you really have a lot of files (a mail server may need that, for example, when you have something like 500 GB of 1 KB files).

With UrBackup, if you go the file-level way, you may want to try virtual clients. That is, if you have 15 GB of archive data that is rarely modified and 5 GB of current data, set up 2 different virtual clients with different backup frequencies (or even 4-5 virtual clients if that makes sense, e.g.: doc, photo, video, whatever).
Backup resume, completion and so on would then be at the virtual-client level, and that helps.

Some great suggestions from Orogor for managing your large backups. I’ll just add that with UrBackup’s ability to mount image backups as virtual disk volumes it is possible to locate and restore files individually, not just an entire image. I’ve done this when I needed a file from the Program Files directory that wasn’t included in the regular C:\Users daily backups. It may not be as straightforward as the built-in file restore and image restore tools of UrBackup, but you don’t give up access to individual files.

I’m going to try virtual clients for file backups. I don’t know if image backups would be a good choice because the source disk will be very big… the source volume could be 20 TB…
95% of the data will never be modified, so I don’t need versioning of those files, only one copy of each. My primary doubt is whether I can have an “incremental forever” backup scheme, because of the time it would take to upload 15-20 TB of data to the cloud over a standard fiber line whenever a full backup is made.
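To put a number on that concern, a quick back-of-the-envelope calculation; the uplink speeds below are assumptions for illustration, not a statement about any particular fiber plan:

```python
# Back-of-the-envelope: how long a full upload of 20 TB would take
# over different uplink speeds (the speeds below are assumptions).
data_bits = 20 * 1000**4 * 8  # 20 TB (decimal) expressed in bits

for mbps in (100, 300, 1000):
    seconds = data_bits / (mbps * 1_000_000)
    print(f"{mbps} Mbit/s: ~{seconds / 86400:.1f} days of continuous upload")
```

Even at 1 Gbit/s the initial seed takes on the order of days, which is why avoiding repeated full uploads matters so much here.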

For info: “virtual clients” do not work on ZFS storage…