Reliable Insights

A blog on monitoring, scale and operational Sanity

December 7, 2020

Choosing your Pushgateway grouping key

What does and doesn't make a good grouping key?

August 6, 2018

Aggregating across batch job runs with push_time_seconds

For counting how many times a thing has happened you can use a counter and rate(), but that doesn't work across batch jobs.


February 19, 2018

Common pitfalls when using the Pushgateway

Ephemeral jobs are often not around long enough for Prometheus to scrape their metrics. The Pushgateway was developed to remedy this: such jobs push their metrics to a metrics cache, which Prometheus can scrape long after the original jobs have gone away. This blog post discusses some of the common pitfalls users fall into when adding the Pushgateway to their monitoring stack.

October 9, 2015

Monitoring Batch Jobs in Python

Prometheus monitoring is usually used on long-lived daemons, but what if you've a batch job that you want to monitor?
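
As a rough illustration of the idea, here is a minimal sketch of a batch job reporting to a Pushgateway using only the Python standard library. It is not the code from the post itself. The metric names, the `my_batch_job` job name, the `instance` grouping label and the helper functions are all hypothetical, and the sketch assumes label values contain no `/` characters. The URL layout (`/metrics/job/<job>/<label>/<value>`) and the text exposition format are the Pushgateway's documented interface.

```python
import urllib.request
from urllib.parse import quote


def grouping_url(base_url, job, grouping=None):
    """Build a Pushgateway URL of the form <base>/metrics/job/<job>/<label>/<value>.

    Hypothetical helper; assumes label values contain no '/' characters.
    """
    parts = [base_url.rstrip("/"), "metrics", "job", quote(job, safe="")]
    for label, value in (grouping or {}).items():
        parts += [quote(label, safe=""), quote(value, safe="")]
    return "/".join(parts)


def render_metrics(duration_seconds, finished_at):
    """Render two gauges in the Prometheus text exposition format.

    The metric names are made up for this example.
    """
    lines = [
        "# TYPE my_batch_job_duration_seconds gauge",
        f"my_batch_job_duration_seconds {duration_seconds}",
        "# TYPE my_batch_job_last_success_unixtime gauge",
        f"my_batch_job_last_success_unixtime {finished_at}",
    ]
    return "\n".join(lines) + "\n"


def push(url, body):
    """Send the metrics with PUT, which replaces everything previously
    pushed under the same grouping key."""
    request = urllib.request.Request(url, data=body.encode("utf-8"), method="PUT")
    request.add_header("Content-Type", "text/plain; version=0.0.4")
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

At the end of a successful run the job would call something like `push(grouping_url("http://localhost:9091", "my_batch_job", {"instance": "db1"}), render_metrics(elapsed, time.time()))`, assuming a Pushgateway is listening on its default port 9091. In practice the official Python client library offers the same functionality with less hand-rolled code.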
