Reliable Insights

A blog on monitoring, scale and operational Sanity

April 16, 2018

When to Alert with Prometheus

Alerting is an art. You must alert just enough to be aware of every problem arising in the monitored system, without drowning out the signal with excess noise. In this blog post we'll explain some best practices for alerting with Prometheus.

March 19, 2018

Alerting on crash loops with Prometheus

If your applications are restarting regularly, whether due to segfaults or OOMs, it'd be nice to know.

February 12, 2018

Alerting on gauges in Prometheus 2.0

One of the major changes introduced in Prometheus 2.0 was staleness handling. Previously, for instant vectors, Prometheus would return a point up to 5 minutes in the past, which caused a number of different issues.

December 18, 2017

What’s the difference between group_interval, group_wait, and repeat_interval?

In this blog post we try to clear up some confusion by outlining the key differences between three commonly confused alerting configuration options: group_interval, group_wait, and repeat_interval.

December 4, 2017

Using time series as alert thresholds

Usually, alert thresholds are hardcoded in the alert. In more sophisticated setups, it is useful to parameterise the threshold based on another time series.

October 30, 2017

Running into burning buildings because the fire alarm stopped

At what point should you consider an alert resolved?

August 28, 2017

Avoid irate() in alerts

While the irate() function is useful for granular graphs, it is not suitable for alerting.

March 27, 2017

Combining alert conditions

Prometheus alerts use the same powerful PromQL expressions as queries and graphs. This can be used to produce sophisticated alerts.

March 6, 2017

Booleans, logic and math

Prometheus doesn't have an explicit boolean type or functionality. However, there is a convention, and enough power in PromQL, to work with booleans.

February 13, 2017

Using labels to direct email notifications

A handy feature of the Alertmanager is that almost all notification fields are templatable. This can be used to route emails based on labels.
