Reliable Insights

A blog on monitoring, scale and operational sanity

August 10, 2020

Using the group() aggregator in PromQL

Prometheus 2.20 added a group aggregator. What is it for?

August 3, 2020

Linux software RAID metrics from the node exporter

/proc/mdstat is another of the files that the node exporter exposes as metrics.

July 27, 2020

New Features in Prometheus 2.20.0

Prometheus 2.20.0 is now out, following on from 2.19.0 with many fixes and improvements.

July 20, 2020

Delete All Your Alerts

Trying to improve alerting piecemeal can be difficult.

July 13, 2020

Time metric from the node exporter

The node exporter exposes the current machine time.

July 6, 2020

Creating Alertmanager Silences from Python

We recently looked at creating silences from the command line. What about from programs?

June 22, 2020

Remote read and partial failures

What happens when your clustered storage fails?

June 15, 2020

New Features in Prometheus 2.19.0

Prometheus 2.19.0 is now out, following on from 2.18.0 with many fixes and improvements.

June 8, 2020

Pre-creating Alertmanager Silences

You don't have to wait for alerts to fire to create a silence.

June 1, 2020

Debugging out of order samples

How do you debug and resolve the "Error on ingesting out-of-order samples" warning from Prometheus?
