Reliable Insights

A blog on monitoring, scale and operational Sanity

November 26, 2018

Unit testing alerts with Prometheus

In the previous post we looked at testing rules. You can also test alerts.

Read more

November 19, 2018

Unit testing rules with Prometheus

As of Prometheus 2.5.0, promtool has a feature that lets you unit test your recording rules.

Read more

October 15, 2018

Graph top N time series in Grafana

As of Grafana 5.3.0 there's a feature that allows correct graphing of the top N series over a duration.

Read more

September 24, 2018

Alerting on approaching open file limits

In a previous post we looked at dealing with reaching the open file limit. How about alerting before it happens?

Read more

August 6, 2018

Aggregating across batch job runs with push_time_seconds

To count how many times something has happened you can use a counter and rate(), but that doesn't work across batch job runs.

Read more

July 23, 2018

Absent Alerting for Scraped Metrics

In the previous post we looked at handling the case where all the targets for a job have disappeared. What if you want to alert on specific metrics disappearing from one target?

Read more

July 16, 2018

Absent Alerting for Jobs

Alerting on numbers being too big or small is easy with Prometheus. But what if the numbers go missing?

Read more

May 28, 2018

Extracting raw samples from Prometheus

Sometimes you want the raw samples inside Prometheus for analysis or debugging. How do you get that?

Read more

April 30, 2018

Identifying expensive alerting rules

Since Prometheus 2.1 there is a feature to view alerting rule evaluation times in the rules UI. In this blog post we'll see an example of how this can be used to identify an expensive rule expression.

Read more

April 23, 2018

Why can count(x > 5) not return 0?

When using the count aggregation operator you may have noticed that it sometimes returns nothing rather than 0. Why is this?

Read more
