We are gearing up to back up ~4,500 Windows workstations.
- How much load can the marketplace appliance take? The default for concurrent backups is 100. Can that be raised to 500? 1,000? 4,000?
- If I raise the concurrent backups to 4,000, is there any harm in running incremental backups every hour, so that many smaller backups happen throughout the day?
- Can the product support some type of load balancing, i.e., multiple instances where clients fail over to another server but all instances share an S3 bucket?
- For disaster recovery, I assume I can just take weekly/monthly snapshots of the instance, restore to another zone as needed, and then run the cleanup commands to remove unknown?
- Any guidance on setting up S3 lifecycle rules for migrating data to cheaper/slower S3 storage?
It largely depends on the amount of data backed up per client, how often and how much that data changes, and how often it is backed up.
Unfortunately, if you have one client with a fast NVMe SSD and a large amount of changes (e.g. a database), that alone can saturate your backup server, since the server has to replay all those changes.
There is a large IOPS overhead for backup creation and removal. In addition, if you back up less frequently, the chance increases that a data change has already aged out (the client deleted or replaced it with a newer version before the next backup), which means fewer IOPS over a long time frame. So less frequent backups allow you to back up more clients.
Each appliance uses a separate S3 bucket. You can move clients between appliances (via replication + move), but this is currently not done automatically.
It stores all data in S3, so for disaster recovery you can simply start a new appliance and point it at the existing S3 bucket.
I think “Intelligent-Tiering” is best; otherwise you have to pay per-object fees for transitions between storage classes.
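For reference, Intelligent-Tiering can be applied bucket-wide with a single lifecycle rule, so existing and future objects get tiered automatically. A minimal sketch using the AWS CLI (the bucket name is a placeholder, not from this thread):

```shell
# Transition all objects to INTELLIGENT_TIERING immediately (Days: 0).
# S3 then moves objects between frequent/infrequent access tiers on its
# own, with no retrieval fees and no further transition charges.
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-backup-appliance-bucket \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "tier-backup-data",
      "Status": "Enabled",
      "Filter": {},
      "Transitions": [{
        "Days": 0,
        "StorageClass": "INTELLIGENT_TIERING"
      }]
    }]
  }'
```

Alternatively, new appliances can simply write objects with the `INTELLIGENT_TIERING` storage class from the start, which avoids even the one-time lifecycle transition request charges.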
In general I’d say use a machine with a moderate amount of CPU and RAM but with a fast, as large as possible, local NVMe SSD (e.g. 1 TB). From experience, storage IOPS will be the limiting factor on the number of simultaneous backups.
Unfortunately that will be really expensive on AWS. Alternatively, with an EBS cache you can experiment with the cache size: start with a small EBS cache in front of S3, then increase it until performance is acceptable.
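One reason the grow-until-acceptable approach works well on AWS is that gp3 EBS volumes can be resized online. A sketch of one growth step, assuming hypothetical volume and device identifiers and an ext4 filesystem on the cache volume:

```shell
# Grow the cache volume from its current size to 500 GiB (online, no detach).
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 500

# Once the modification reaches the "optimizing" state, extend the
# partition and filesystem to use the new capacity:
sudo growpart /dev/nvme1n1 1      # only needed if the cache is on a partition
sudo resize2fs /dev/nvme1n1p1     # ext4; use xfs_growfs for XFS
```

gp3 also lets you provision IOPS and throughput independently of size (`--iops`, `--throughput` on `modify-volume`), which fits the observation above that IOPS, not capacity, tends to be the bottleneck.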