Reliable Insights

A blog on monitoring, scale and operational Sanity

August 13, 2018

Why predeclare metrics?

The standard way to use metrics in Prometheus is to declare them at file level, before using them. Why?

August 6, 2018

Aggregating across batch job runs with push_time_seconds

To count how many times something has happened you can use a counter and rate(), but that doesn't work across batch job runs.


July 30, 2018

Prometheus: Up and Running is out

After many months of work, Prometheus: Up & Running is now available for purchase!

July 23, 2018

Absent Alerting for Scraped Metrics

In the previous post we looked at what to do when all the targets for a job have disappeared. What if you want to alert on specific metrics disappearing from one target?

July 16, 2018

Absent Alerting for Jobs

Alerting on numbers being too big or small is easy with Prometheus. But what if the numbers go missing?

July 9, 2018

ICMP Pings with the Blackbox exporter

The Blackbox exporter can perform ICMP probes. Let's see how.

July 2, 2018

External URLs and path prefixes

In a previous post I looked at setting the external URL. What if the reverse proxy sends a different path from the one the user is using?

June 25, 2018

Using external URLs and proxies with Prometheus

Sometimes users will not access Prometheus's UI directly, instead using another URL. How do you make this work?

June 18, 2018

Shutting down Prometheus

Shutting down Prometheus properly helps reduce the risk of delays at startup. So how do you do that?

June 11, 2018

New Features in Prometheus 2.3.0

Prometheus 2.3.0 is now out, following on from 2.2.0 back in March with several fixes and improvements.

