Reliable Insights

A blog on monitoring, scale and operational Sanity

August 19, 2019

New Features in Prometheus 2.12.0

Prometheus 2.12.0 is now out, following on from 2.11.0 with many fixes and improvements.

Read more

August 12, 2019

What range should I use with rate()?

Choosing what range to use with the rate function can be a bit subtle.

Read more

August 8, 2019

Analysing data from Grafana graphs in R

PromQL is superb for metrics alerting and graphing needs, but for heavier statistical work there are better options.

Read more

August 5, 2019

Putting queues in front of Prometheus for reliability

On a regular basis a potential Prometheus user says they need a different architecture to make things reliable or scalable. Let's look at that.

Read more

July 29, 2019

Step and query_range

Graphs from Prometheus use the query_range endpoint, and there's a fair amount of confusion about it, with many users thinking it's more magic than it actually is.

Read more

July 22, 2019

How should pipelines be monitored?

For online serving systems it's fairly well known that you should look for request rate, errors and duration. What about offline processing pipelines though?

Read more

July 15, 2019

New Features in Prometheus 2.11.0

Prometheus 2.11.0 is now out, following on from 2.10.0 with many fixes and improvements.

Read more

July 8, 2019

Switching between Prometheus servers in Grafana using data source variables

Having to maintain dashboards for every Prometheus server you have would be a bit annoying. Thankfully Grafana has a feature for this.

Read more

July 1, 2019

Why can’t I use the nodename of a machine as the instance label?

The machine knows its own name, so couldn't Prometheus use it?

Read more

June 24, 2019

How much disk space do Prometheus blocks use?

Memory for ingestion is just one part of the resources Prometheus uses; let's look at disk blocks.

Read more
