Reliable Insights

A blog on monitoring, scale and operational Sanity

September 7, 2020

Why does Prometheus use so much RAM?

Users are sometimes surprised that Prometheus uses RAM, let's look at that.

Read more

August 5, 2019

Putting queues in front of Prometheus for reliability

On a regular basis a potential Prometheus user says they need a different architecture to make things reliable or scalable. Let's look at that.

Read more

January 16, 2017

Federation, what is it good for?

There's various ways Prometheus federation can be used. To ensure your monitoring is scalable and reliable, let's look at how to best use it.

Read more

August 2, 2016

One agent to rule them all

Another not uncommon question we get about Prometheus is as to why we don't have a single per-machine agent that handles all the collection, and instead have one exporter per application. Doesn't that make it harder to manage?

Read more

November 18, 2015

Do you have basic infrastructure?

When starting out it's easy to think that you need Docker, Kubernetes, Microservices, Continuous Deployment and all the other trending topics on Hacker News/Reddit/Lobsters. What do you really need?

Read more

November 4, 2015

Unlimited costs, Limited revenue

This week Microsoft removed unlimited storage from their OneDrive offering, because surprise surprise people were using it as unlimited storage. Does your product have features that cost you time and money, without your users paying accordingly?

Read more

November 2, 2015

Avoid outages: Beware the Knee

Your service's traffic is steadily growing, latency has increased a bit but it's within reason. One day you launch a new customer and the latency jumps through the roof causing an outage. What happened? You hit the knee.

Read more

October 29, 2015

Cooking a meal isn’t the same as running a restaurant

I enjoy cooking and regularly make scrumptious meals for myself. Does this mean that I'm capable of running a busy kitchen? Of course not! So why assume that all software engineers can automatically run production services?

Read more

October 8, 2015

Monitoring: Not Just For Outages

It's common to think of monitoring as something just to alert you when things are going wrong.  At Robust Perception we believe in Inclusive Monitoring, where all aspects of systems are monitored and available to provide insight and drive decisions.

Read more

August 23, 2015

There are 100,000 Seconds in a Day

Just after you've launched is not the best time to find out that you can't handle the load you predicted, or that running costs are much higher than you'd like. By estimating the operational parameters of your system as you design you can gain confidence that the system will work as you expect.

Read more


Blog   |   Training   |   Book   |   Privacy