Your service’s traffic is steadily growing; latency has increased a bit, but it’s within reason. One day you launch a new customer and latency jumps through the roof, causing an outage. What happened? You hit the knee.
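As a rough illustration of why latency can look fine and then suddenly explode, here is a minimal sketch that assumes the service behaves like a textbook M/M/1 queue, where mean response time is the service time divided by (1 - utilisation). The queueing model, the function name `mean_latency`, and the 10 ms service time are assumptions made for illustration; real services will have a differently shaped curve.

```python
def mean_latency(service_time_s: float, utilisation: float) -> float:
    """Mean response time of an M/M/1 queue: S / (1 - rho).

    Assumes a single server with random arrivals and service times,
    which is a simplification of any real service.
    """
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in [0, 1)")
    return service_time_s / (1.0 - utilisation)


service_time_s = 0.010  # hypothetical: 10 ms of work per request

# Latency grows slowly at first, then steeply as utilisation nears 100%.
for utilisation in (0.50, 0.80, 0.90, 0.95, 0.99):
    latency_ms = mean_latency(service_time_s, utilisation) * 1000
    print(f"{utilisation:.0%} utilised -> {latency_ms:.0f} ms mean latency")
```

Under this model, going from 50% to 80% utilisation adds roughly 30 ms of mean latency, while going from 95% to 99% adds roughly 800 ms. One extra customer near the top of the curve can be enough to tip the service over.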
I enjoy cooking and regularly make scrumptious meals for myself. Does this mean that I’m capable of running a busy kitchen? Of course not! So why assume that all software engineers can automatically run production services?
It’s common to think of monitoring as something just to alert you when things are going wrong. At Robust Perception we believe in Inclusive Monitoring, where all aspects of systems are monitored and available to provide insight and drive decisions.
Systems such as Consul perform healthchecking of local services and expose this information to other machines within the cluster. Does this mean that the service will work when you try to talk to it?
Traffic from users to your servers isn’t a steady stream; it waxes and wanes over the day and week. The peak-to-mean ratio is your primary tool to avoid outages or unnecessary costs due to this.
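To make the peak-to-mean ratio concrete, here is a small sketch that computes it from a series of traffic samples and uses it to estimate a future peak. The sample values, the names `peak_to_mean` and `hourly_qps`, and the forecast mean of 500 queries per second are all hypothetical, and the forecast assumes the ratio stays the same as traffic grows.

```python
from statistics import mean


def peak_to_mean(samples: list[float]) -> float:
    """Return the highest sample divided by the average of all samples."""
    if not samples:
        raise ValueError("need at least one sample")
    average = mean(samples)
    if average <= 0:
        raise ValueError("mean traffic must be positive")
    return max(samples) / average


# Hypothetical queries-per-second readings over one daily cycle.
hourly_qps = [100, 100, 200, 400, 300, 100]

ratio = peak_to_mean(hourly_qps)
print(f"peak-to-mean ratio: {ratio:.1f}")  # peak 400 / mean 200 -> 2.0

# If mean traffic is forecast to reach 500 qps and the ratio holds,
# this is roughly the peak the servers would need to handle.
forecast_mean_qps = 500
print(f"estimated peak: {forecast_mean_qps * ratio:.0f} qps")
```

Provisioning for the mean alone would leave this hypothetical service short by half at its daily peak, while provisioning far above the estimated peak would pay for capacity that sits idle.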
Just after you’ve launched is not the best time to find out that you can’t handle the load you predicted, or that running costs are much higher than you’d like. By estimating the operational parameters of your system as you design you can gain confidence that the system will work as you expect.
Caches are a common feature of distributed systems, often added to improve performance. There are three main types of cache, and knowing about them will help you design robust systems.