A blog on monitoring, scale and operational Sanity
October 5, 2020
Data is smaller and disks are bigger than you'd think.
September 7, 2020
Users are sometimes surprised that Prometheus uses RAM; let's look at that.
November 2, 2015
Your service's traffic is steadily growing; latency has increased a bit, but it's within reason. One day you launch a new customer and the latency jumps through the roof, causing an outage. What happened? You hit the knee.
September 14, 2015
Traffic from users to your servers isn't a steady stream; it waxes and wanes over the day and week. The peak-to-mean ratio is your primary tool for avoiding the outages or unnecessary costs this variation can cause.
August 12, 2015
Caches are a common feature of distributed systems, often added to improve performance. There are three main types of cache, and knowing about them will help you design robust systems.