Reliable Insights

A blog on monitoring, scale and operational Sanity

January 16, 2017

Federation, what is it good for?

There are various ways Prometheus federation can be used. To ensure your monitoring remains scalable and reliable, let's look at how best to use it.

November 14, 2016

What’s “up” Doc?

One of the advantages of pull-based monitoring such as Prometheus is that you can tell whether the target is healthy as part of the scrape. How do you do that, though?

September 12, 2016

Who wants seconds?

The Prometheus instrumentation guidelines say to use seconds, and the timing functions in client libraries follow this. Why?

August 29, 2016

Undoing the benefits of labels

It can seem like a good idea to use recording rules to make the content of a time series more explicit, particularly for those not used to labels. However, this usually leads to confusing names and loses the benefits of labels.

August 8, 2016

On the naming of things

How you choose to name metrics is important. If everyone chose a different scheme, it would lead to confusion and irritation, and prevent us from sharing and reusing each other's work. I'd like to share some guidelines to help keep things sane for everyone.

August 2, 2016

One agent to rule them all

Another not uncommon question we get about Prometheus is why we don't have a single per-machine agent that handles all the collection, and instead have one exporter per application. Doesn't that make things harder to manage?

July 25, 2016

Target labels are for life, not just for Christmas

How should you choose the labels to put on your Prometheus monitoring targets? Let's take a look.

July 14, 2016

Monitoring without Consensus

When designing a monitoring system and the datastore that goes with it, it can be tempting to go straight for a clustered, highly consistent approach. But is that the best choice?

July 1, 2016

How to Give a Tech Talk

Since starting Robust Perception a year ago, I've given 20+ tech talks at various meetups and conferences across the world. I'd like to share some tips on the practicalities of speaking that I've learned along the way.

May 9, 2016

Rate then sum, never sum then rate

There's a common misunderstanding when dealing with Prometheus counters: how to apply aggregation and other operations when using rate and the other counter-only functions.