scaling – Robust Perception | Prometheus Monitoring Experts

A fairly common question we get about Prometheus is why there isn't a single per-machine agent that handles all the collection, rather than one exporter per application. Doesn't that make it harder to manage?

Published by Brian Brazil in Posts

Tags: best practices, design, prometheus, scaling

November 18, 2015

Do you have basic infrastructure?

When starting out it's easy to think that you need Docker, Kubernetes, Microservices, Continuous Deployment and all the other trending topics on Hacker News/Reddit/Lobsters. What do you really need?

Published by Brian Brazil in Posts

Tags: best practices, change management, configuration management, reliability, scaling

November 4, 2015

Unlimited costs, Limited revenue

This week Microsoft removed unlimited storage from their OneDrive offering because, surprise surprise, people were using it as unlimited storage. Does your product have features that cost you time and money, without your users paying accordingly?

Published by Brian Brazil in Posts

Tags: abuse, best practices, business, estimation, scaling

November 2, 2015

Avoid outages: Beware the Knee

Your service's traffic is steadily growing; latency has increased a bit, but it's within reason. One day you launch a new customer and the latency jumps through the roof, causing an outage. What happened? You hit the knee.

Published by Brian Brazil in Posts

Tags: best practices, capacity, provisioning, reliability, scaling

October 29, 2015

Cooking a meal isn’t the same as running a restaurant

I enjoy cooking and regularly make scrumptious meals for myself. Does this mean that I'm capable of running a busy kitchen? Of course not! So why assume that all software engineers can automatically run production services?

Published by Brian Brazil in Posts

Tags: availability, best practices, reliability, scaling

October 8, 2015

Monitoring: Not Just For Outages

It's common to think of monitoring as something just to alert you when things are going wrong. At Robust Perception we believe in Inclusive Monitoring, where all aspects of systems are monitored and available to provide insight and drive decisions.

Published by Brian Brazil in Posts

Tags: best practices, estimation, inclusive monitoring, scaling

August 23, 2015

There are 100,000 Seconds in a Day

Just after you've launched is not the best time to find out that you can't handle the load you predicted, or that running costs are much higher than you'd like. By estimating the operational parameters of your system as you design it, you can gain confidence that the system will work as you expect.

Published by Brian Brazil in Posts

Tags: best practices, design, estimation, scaling
