Reliable Insights

A blog on monitoring, scale and operational Sanity

February 18, 2019

How much of the time is my network usage over a certain amount?

The new subquery feature in Prometheus 2.7 makes this possible in one query.

February 11, 2019

“INVALID” is not a valid start token and other scrape errors

What can you do to pin down metrics-parsing related scrape errors in Prometheus?

February 4, 2019

Using tsdb analyze to investigate churn and cardinality

The Prometheus TSDB's code base includes a tool to help you find "interesting" metrics in terms of storage performance.

January 28, 2019

New Features in Prometheus 2.7.0

Prometheus 2.7.0 is now out, following on from 2.6.0 last month with many fixes and improvements.

January 21, 2019

Optimising Prometheus 2.6.0 Memory Usage with pprof

The 2.6.0 release of Prometheus includes optimisations to reduce the memory taken by indexes and compaction.

January 14, 2019

Why do resolved notifications contain old values?

Users are often confused as to why resolved notifications don't contain updated annotation values. Let's dig into why.

January 7, 2019

Optimising startup time of Prometheus 2.6.0 with pprof

The 2.6.0 release of Prometheus includes WAL loading optimisations to make startup faster.

December 31, 2018

Don’t put the value in alert labels

The labels of an alert are its identity, so you have to be a little careful what you put in there.

December 24, 2018

New Features in Prometheus 2.6.0

Prometheus 2.6.0 is now out, following on from 2.5.0 last month with many fixes and improvements.

December 17, 2018

Limiting PromQL resource usage

Prometheus has gained a number of features to limit the impact of expensive PromQL queries.