A blog on monitoring, scale and operational Sanity
April 23, 2018
When using the count aggregation operator you may have noticed that it sometimes returns nothing rather than 0. Why is this?
March 19, 2018
If your applications are restarting regularly, whether due to segfaults or OOMs, it'd be nice to know.
February 12, 2018
One of the major changes introduced in Prometheus 2.0 was staleness handling. Previously, for instant vectors, Prometheus would return a point up to 5 minutes in the past, which caused a number of different issues.
February 5, 2018
Have you ever wondered what percentage of time a given service or application spends up or down?
January 1, 2018
Prometheus 2.0 brought with it rule groups, making hierarchical aggregation easier than ever.
December 11, 2017
Have you ever wondered why the buckets in histograms are not just counters of events that fall into each bucket?
December 4, 2017
Usually alert thresholds are hardcoded in the alert. In more sophisticated setups, it would be useful to parameterise them based on another time series.
October 23, 2017
With the upcoming release of Prometheus 2.0 comes a new format for writing recording and alerting rules.
September 4, 2017
As PromQL has evolved, there are some functions that should no longer be used.
August 28, 2017
While the irate() function is useful for granular graphs, it is not suitable for alerting.