Reliable Insights

A blog on monitoring, scale and operational Sanity

November 18, 2019

Evaluating Performance and Correctness

There's more to selecting a piece of software than headline numbers.

Read more

September 30, 2019

How does a Prometheus Histogram work?

We looked previously at the counter, gauge, and summary. How does the Prometheus histogram work?

Read more

September 9, 2019

What queries were running when Prometheus died?

As of Prometheus 2.12.0 there's a new feature to help find problematic queries.

Read more

August 12, 2019

What range should I use with rate()?

Choosing what range to use with the rate function can be a bit subtle.

Read more

July 29, 2019

Step and query_range

Graphs from Prometheus use the query_range endpoint, and there's a non-trivial amount of confusion about it, with many believing it's more magic than it actually is.

Read more

July 22, 2019

How should pipelines be monitored?

For online serving systems it's fairly well known that you should look for request rate, errors and duration. What about offline processing pipelines though?

Read more

May 20, 2019

Analyse a metric by kernel version

Using PromQL you can combine metrics for analysis.

Read more

April 1, 2019

Staleness and PromQL

How should a monitoring system deal with metrics no longer being there?

Read more

March 25, 2019

How does a Prometheus Summary work?

We looked previously at the counter and gauge. How does the Prometheus summary work?

Read more

March 11, 2019

Mapping iostat to the node exporter’s node_disk_* metrics

The node exporter and tools like iostat and sar use the same core data, but how do they relate to each other?

Read more
