Having to reconstruct how far a failed cron job had gotten and what exact parameters it was run with can be error prone and time consuming. There is a better way.
A blog on monitoring, scale and operational Sanity
Alerting is an art. One must be sure to alert just enough to be aware of all problems arising in the monitored system while at the same time not drown out the signal with excess noise. In this blogpost we'll explain some of the best practices to use when alerting with Prometheus.
So you have just discovered Prometheus and want to try it out or use it to replace your old monitoring system but have run into a part of your stack that you cannot instrument with a client library and for which there are no officially supported exporters. What do you do?