Creating a more refined alert about missed backup interval

BrainWaveCC · March 31, 2021, 10:57pm

The current default alert that warns when backups are not running treats full and incremental backups alike, for the most part.

I have an environment where large file backups (>500GB) are occurring over the WAN, and so full backups can exceed the default interval for alerting.

Incremental backups don’t generate an error, as they usually complete in 5-20 minutes. But full backups across the WAN can take anywhere from 8 to 28 hours for specific systems. With backups scheduled every 4 hours, and the default of 3 missed intervals generating an alert email (a 12-hour window), some servers generate an alert every time they run their full backup.

It would be nice to not have an error generated if a backup is actually in progress when the interval criterion is met. Coupled with this, it would be great to have a separate warning that could be triggered if a backup JOB was taking too long.
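To make the request concrete, here is a rough sketch of the logic in plain Python. It is not UrBackup's actual alert script API; the function name, its parameters, and the alert labels are all hypothetical and only illustrate the two rules described above.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def evaluate_alerts(
    now: datetime,
    last_success: datetime,
    interval: timedelta,
    missed_intervals: int,
    running_since: Optional[datetime],
    max_job_duration: timedelta,
) -> List[str]:
    """Return the alerts that should fire for one client (hypothetical logic)."""
    alerts: List[str] = []
    if running_since is not None:
        # A backup job is in progress: skip the missed-interval alert,
        # but warn if the job itself has been running too long.
        if now - running_since > max_job_duration:
            alerts.append("job_too_long")
        return alerts
    # No job running: apply the usual missed-interval rule.
    if now - last_success > interval * missed_intervals:
        alerts.append("missed_backup")
    return alerts
```

With a 4-hour interval, 3 missed intervals, and a 30-hour job limit (an assumed value), a full backup that has been running for 10 hours would raise nothing, while a client with no running job and no success in 20 hours would still raise the missed-backup alert.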

I could increase the number of intervals, but that would create a gap in visibility for normal scenarios, which is more likely to bite me. For example, changing the setting to 8 intervals would let 32 hours pass before an alert, which means up to a day and a half of failed incremental backups in a non-full-backup situation. That is too much.
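The trade-off is simple arithmetic, sketched here in Python using the numbers from this post (4-hour schedule, 28-hour worst-case full backup). The variable and function names are only illustrative.

```python
from datetime import timedelta

backup_interval = timedelta(hours=4)       # backups scheduled every 4 hours
longest_full_backup = timedelta(hours=28)  # worst-case full backup over the WAN


def alert_window(missed_intervals: int) -> timedelta:
    """Time that can pass without a finished backup before an alert fires."""
    return backup_interval * missed_intervals


for n in (3, 8):
    window = alert_window(n)
    hours = window.total_seconds() / 3600
    covers = window >= longest_full_backup
    print(f"{n} intervals -> {hours:.0f}h window, covers 28h full backup: {covers}")
# 3 intervals -> 12h window, covers 28h full backup: False
# 8 intervals -> 32h window, covers 28h full backup: True
```

The same 32-hour window that avoids the false alert during a full backup is also how long broken incremental backups would go unreported.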

Any suggestions on ways that I can refine the alerting? I looked at customizing the alerts, but without a way to suppress the alert while a full backup job is running, it doesn’t look promising.

uroni · April 1, 2021, 5:19pm

Sounds like a good thing to fix. Will look into it.

BrainWaveCC · April 2, 2021, 2:47pm

Thanks @uroni