Reliable Insights

A blog on monitoring, scale and operational Sanity

November 2, 2020

Checking for specific HTTP status codes with the blackbox exporter

How would you check that an HTTP endpoint is returning a 204?

Published by Brian Brazil in Posts

Tags: alerting, blackbox_exporter, prometheus

Delete All Your Alerts

Trying to improve alerting piecemeal can be difficult.

Published by Brian Brazil in Posts

Tags: alerting, best practices, prometheus

Setting Thresholds on Alerts

Alert thresholds can be surprisingly tricky to get right.

Published by Brian Brazil in Posts

Tags: alerting, best practices, prometheus

How should pipelines be monitored?

For online serving systems it's fairly well known that you should look for request rate, errors and duration. What about offline processing pipelines though?

Published by Brian Brazil in Posts

Tags: alerting, best practices, prometheus, promql

Idempotent Cron Jobs are Operable Cron Jobs

Having to reconstruct how far a failed cron job had gotten and what exact parameters it was run with can be error-prone and time-consuming. There is a better way.

Published by Brian Brazil in Posts

Tags: alerting, best practices, prometheus, reliability

January 14, 2019

Why do resolved notifications contain old values?

Users are often confused about why resolved notifications don't contain updated annotation values. Let's dig into why.

Published by Brian Brazil in Posts

Tags: alerting, alertmanager, prometheus

December 31, 2018

Don’t put the value in alert labels

The labels of an alert are its identity, so you have to be a little careful about what you put in there.

Published by Brian Brazil in Posts

Tags: alerting, alertmanager, prometheus

November 26, 2018

Unit testing alerts with Prometheus

In the previous post we looked at testing rules. You can also test alerts.

Published by Brian Brazil in Posts

Tags: alerting, promql, promtool, testing

October 8, 2018

Checking for HTTP 200s with the Blackbox Exporter

It's easy to check if HTTP and HTTPS endpoints are working with the Blackbox Exporter.

Published by Brian Brazil in Posts

Tags: alerting, blackbox_exporter, prometheus

September 24, 2018

Alerting on approaching open file limits

In a previous post we looked at dealing with reaching the open file limit. How about alerting before it happens?

Published by Brian Brazil in Posts

Tags: alerting, prometheus, promql
