Reliable Insights

A blog on monitoring, scale and operational Sanity

October 5, 2020

How long can Prometheus retention be?

Data is smaller and disks are bigger than you'd think.
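
As a rough illustration of why retention can be long, here is a back-of-the-envelope sizing sketch. It assumes the rule of thumb from the Prometheus storage documentation (disk needed = retention in seconds × ingested samples per second × bytes per sample, with roughly 1-2 bytes per sample after compression). The function name and the 100,000 samples/s workload are hypothetical.

```python
# Rough Prometheus disk sizing, assuming the documented rule of thumb:
#   disk = retention_seconds * samples_per_second * bytes_per_sample
# bytes_per_sample is typically around 1-2 after compression (an assumption here).

def prometheus_disk_bytes(retention_days: float,
                          samples_per_second: float,
                          bytes_per_sample: float = 2.0) -> float:
    """Estimate on-disk size in bytes (hypothetical helper, not a Prometheus API)."""
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample

# Hypothetical workload: 100k samples/s kept for one year at 2 bytes/sample.
size = prometheus_disk_bytes(retention_days=365, samples_per_second=100_000)
print(f"{size / 1e12:.1f} TB")  # prints "6.3 TB"
```

Even at the pessimistic end of the bytes-per-sample range, a year of data for this workload fits on a few commodity disks.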

November 2, 2015

Avoid outages: Beware the Knee

Your service's traffic is steadily growing, and latency has increased a bit, but it's still within reason. Then one day you launch a new customer, latency jumps through the roof and you have an outage. What happened? You hit the knee.
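
To see why latency can stay flat for a long time and then explode, consider a toy single-server queue. This sketch assumes an M/M/1 model, where mean time in the system is service_time / (1 - utilisation); that model is an illustrative assumption, not necessarily the one the post uses, and the 10 ms service time is hypothetical.

```python
# Toy M/M/1 queue (illustrative assumption): mean latency grows as
#   service_time / (1 - utilisation)
# which is gentle at low load and explodes as utilisation approaches 100%.

def mm1_latency(service_time_ms: float, utilisation: float) -> float:
    """Mean time in system for an M/M/1 queue (hypothetical helper)."""
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in [0, 1)")
    return service_time_ms / (1 - utilisation)

# Hypothetical 10 ms service time at increasing load.
for u in (0.5, 0.7, 0.8, 0.9, 0.95, 0.99):
    print(f"utilisation {u:.0%}: {mm1_latency(10.0, u):7.1f} ms")
```

Under this model, going from 50% to 90% utilisation multiplies latency by 5, but going from 90% to 99% multiplies it by a further 10. That steep region is the knee: a modest traffic increase, such as one new customer, can push you from "within reason" to an outage.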

September 14, 2015

Do you know your peak-to-mean ratio?

Traffic from users to your servers isn't a steady stream; it waxes and wanes over the day and the week. The peak-to-mean ratio is your primary tool for avoiding both outages and unnecessary costs caused by this variation.
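
The ratio itself is simple to compute from a traffic time series: divide the highest rate observed by the average rate. A minimal sketch, assuming hourly request-rate samples; the helper name and the traffic numbers are hypothetical.

```python
from statistics import mean

def peak_to_mean(rates: list[float]) -> float:
    """Peak-to-mean ratio of a traffic series (hypothetical helper)."""
    if not rates:
        raise ValueError("need at least one sample")
    avg = mean(rates)
    if avg == 0:
        raise ValueError("mean traffic is zero")
    return max(rates) / avg

# Hypothetical day of hourly request rates (req/s):
# 8 quiet night hours, 8 busy working hours, 8 moderate evening hours.
hourly = [50.0] * 8 + [150.0] * 8 + [100.0] * 8
ratio = peak_to_mean(hourly)
print(f"mean {mean(hourly):.0f} req/s, peak-to-mean {ratio:.2f}")  # mean 100 req/s, peak-to-mean 1.50
```

Provisioning for the mean of 100 req/s would leave you a third short at peak, while provisioning for the peak all day means paying for capacity that sits idle much of the time.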