Reliable Insights

A blog on monitoring, scale and operational Sanity

September 16, 2019

Looking beyond retention

How can you view older data, while keeping your monitoring reliable?

August 5, 2019

Putting queues in front of Prometheus for reliability

Potential Prometheus users regularly say they need a different architecture to make things reliable or scalable. Let's look at that.

July 1, 2019

Why can’t I use the nodename of a machine as the instance label?

The machine knows its own name, so couldn't Prometheus use it?

April 1, 2019

Staleness and PromQL

How should a monitoring system deal with metrics no longer being there?

August 13, 2018

Why predeclare metrics?

The standard way to use metrics in Prometheus is to declare them at file level, before using them. Why?

March 26, 2018

Why not send graphs with alerts?

You may have noticed that notifications from the Alertmanager are text. Wouldn't it be nice if Prometheus sent graphs along?

December 11, 2017

Why are Prometheus histograms cumulative?

Have you ever wondered why the buckets in histograms are not just counters of events that fall into each bucket?

September 11, 2017

It’s easy to convert Pull to Push

If you have to choose one of push or pull in your core, which should it be?

August 14, 2017

Which kind of push? Events or metrics?

Continuing our exploration of the ongoing epic saga of push vs. pull, where the very future of humanity is at stake, let's look at two general classes of push that are often conflated.

May 22, 2017

Push needs Service Discovery

It's often claimed that an advantage of push-based monitoring systems is that, compared to pull-based systems like Prometheus, they don't need service discovery. This isn't true, and I'm going to explain why.
