Reliable Insights

A blog on monitoring, scale and operational Sanity

September 23, 2019

Laying out Alertmanager routes

How should you design your Alertmanager routes for flexibility and growth?

September 16, 2019

Looking beyond retention

How can you view older data, while keeping your monitoring reliable?

September 9, 2019

What queries were running when Prometheus died?

As of Prometheus 2.12.0 there's a new feature to help find problematic queries.

September 2, 2019

Cardinality is key

Prometheus performance almost always comes down to one thing: label cardinality.

August 26, 2019

Adding TLS to Prometheus with Caddy

Prometheus doesn't come with server-side security features out of the box, but there are many options available to provide them.

August 19, 2019

New Features in Prometheus 2.12.0

Prometheus 2.12.0 is now out, following on from 2.11.0 with many fixes and improvements.

August 12, 2019

What range should I use with rate()?

Choosing what range to use with the rate function can be a bit subtle.

August 8, 2019

Analysing data from Grafana graphs in R

PromQL is superb for metrics alerting and graphing needs, but for heavier statistical work there are better options.

August 5, 2019

Putting queues in front of Prometheus for reliability

Potential Prometheus users regularly say they need a different architecture to make things reliable or scalable. Let's look at that.

July 29, 2019

Step and query_range

Graphs from Prometheus use the query_range endpoint, and there's a fair amount of confusion about it, with many people assuming it's more magic than it actually is.
