Reliable Insights

A blog on monitoring, scale and operational Sanity

February 27, 2017

Label Lookups and the Child

The Prometheus client library guidelines recommend that labels() return a Child. Why?

August 2, 2016

One agent to rule them all

Another common question we get about Prometheus is why we don't have a single per-machine agent that handles all collection, and instead have one exporter per application. Doesn't that make it harder to manage?

July 14, 2016

Monitoring without Consensus

When designing a monitoring system and the datastore that goes with it, it can be tempting to go straight for a clustered, highly consistent design. But is that the best approach?

June 2, 2016

Prometheus Security: Authentication, Authorization and Encryption

A frequently asked question is how to handle various security-related features with Prometheus. Let's take a deeper look at why we chose the approach we did.

May 17, 2016

Prometheus and Alertmanager Architecture

A common question about Prometheus is why the Alertmanager is a separate binary. Let's look at that.

August 23, 2015

There are 100,000 Seconds in a Day

Just after you've launched is not the best time to find out that you can't handle the load you predicted, or that running costs are much higher than you'd like. By estimating the operational parameters of your system as you design it, you can gain confidence that it will work as you expect.
