Reliable Insights

A blog on monitoring, scale and operational Sanity

October 30, 2017

Running into burning buildings because the fire alarm stopped

At what point should you consider an alert resolved?

September 25, 2017

Percentages go from 0 to 100, Ratios go from 0 to 1

While doing research for implementing exporters, I've noticed some confusion around ratios and percentages that I'd like to clear up.

September 11, 2017

It’s easy to convert Pull to Push

If you have to choose one of push or pull in your core, which should it be?

September 4, 2017

Functions to Avoid

As PromQL has evolved, there are some functions that should no longer be used.

August 28, 2017

Avoid irate() in alerts

While the irate() function is useful for granular graphs, it is not suitable for alerting.

July 24, 2017

Existential issues with metrics

The Prometheus instrumentation best practices say to "Avoid missing metrics". Let's look at why, and how to deal with it.

July 3, 2017

When should you unit test instrumentation?

Should you unit test every bit of instrumentation you add? Not always.

May 29, 2017

What’s in a __name__?

You may have noticed that most PromQL functions and operators remove the metric name in their result. Let's look at why.

May 22, 2017

Push needs Service Discovery

It's often claimed that an advantage of push-based monitoring systems is that, compared to pull-based systems like Prometheus, they don't need service discovery. This isn't true, and I'm going to explain why.

February 27, 2017

Label Lookups and the Child

The Prometheus client library guidelines recommend having a Child be returned via labels(). Why?
