Reliable Insights

A blog on monitoring, scale and operational sanity

March 23, 2020

Why info-style metrics have a value of 1

You've seen metrics like prometheus_build_info, but why do they have a value of 1?

January 13, 2020

Optimising index memory usage for blocks

One of the big changes in Prometheus 2.15.0 was reduced memory usage for indexes.

December 9, 2019

PromQL Subqueries and Alignment

Subqueries were added to PromQL a while back, but there's more to this feature than it'd seem.

September 16, 2019

Looking beyond retention

How can you view older data while keeping your monitoring reliable?

August 5, 2019

Putting queues in front of Prometheus for reliability

Potential Prometheus users regularly say they need a different architecture to make things reliable or scalable. Let's look at that.

July 1, 2019

Why can’t I use the nodename of a machine as the instance label?

The machine knows its own name, so couldn't Prometheus use it?

April 1, 2019

Staleness and PromQL

How should a monitoring system deal with metrics no longer being there?

August 13, 2018

Why predeclare metrics?

The standard way to use metrics in Prometheus is to declare them at file level, before using them. Why?

March 26, 2018

Why not send graphs with alerts?

You may have noticed that notifications from the Alertmanager are text. Wouldn't it be nice if Prometheus sent graphs along?

December 11, 2017

Why are Prometheus histograms cumulative?

Have you ever wondered why the buckets in histograms are not just counters of events that fall into each bucket?
