Reliable Insights

A blog on monitoring, scale and operational Sanity

December 28, 2020

Monitoring is a means, not an end

Does it really have to be perfect?

December 21, 2020

Policy is for configuration, not metric names

Metric names are part of a time series's identity, so they shouldn't include information unrelated to that identity.

December 14, 2020

Prefer without and ignoring

Which of by/without and on/ignoring should you use?

September 21, 2020

Don’t Try to Swim Upstream

Have you ever felt that a piece of software just isn't doing what you need?

July 20, 2020

Delete All Your Alerts

Trying to improve alerting piecemeal can be difficult.

June 22, 2020

Remote read and partial failures

What happens when your clustered storage fails?

May 18, 2020

Atomic Writes and the Textfile Collector

To avoid weirdness, write your files atomically.

April 20, 2020

Don’t federate instance labels

Federation can be quite useful, but it's not replication.

March 2, 2020

Setting Thresholds on Alerts

Alert thresholds can be surprisingly tricky to get right.

February 24, 2020

Regex Selectors are a Smell

Have you ever found yourself having to keep on updating and tweaking certain regexes in PromQL?
