Since Prometheus 2.1, the rules UI has been able to show alerting rule evaluation times. In this blogpost we’ll see an example of how this can be used to identify an expensive rule expression.
Alerting is an art. One must be sure to alert just enough to be aware of all problems arising in the monitored system while at the same time not drowning out the signal with excess noise. In this blogpost we’ll explain some of the best practices to use when alerting with Prometheus.
So you have just discovered Prometheus and want to try it out, or use it to replace your old monitoring system, but you have run into a part of your stack that you cannot instrument with a client library and for which there are no officially supported exporters. What do you do?
Jobs of an ephemeral nature are often not around long enough to have their metrics scraped by Prometheus. To remedy this, the Pushgateway was developed so that these jobs can push their metrics to a metrics cache, to be scraped by Prometheus long after the original jobs have
One of the major changes introduced in Prometheus 2.0 was staleness handling. Previously, for instant vectors, Prometheus would return a point up to 5 minutes in the past, which caused a number of different issues.
Have you ever wondered what percentage of time a given service or application spends up or down?
Having previously discussed why the Prometheus project does not support SSL and user authentication out of the box, and detailed how to add basic authentication with Nginx, we will now demonstrate how to do the same with Apache.
In this blogpost we’ll run you through a quick ‘hello world’ example instrumenting a Rails application with the Prometheus Ruby client.
The world of infosec is alarmed right now over the security vulnerabilities disclosed by Google on Wednesday that affect Intel, AMD, and ARM chips. The now infamous Meltdown and Spectre bugs allow for the reading of sensitive information from a system’s memory, including passwords and private keys. Thankfully, fixes are being swiftly
In this blogpost we try to clear up some confusion by outlining the key differences between three commonly confused alerting configuration options: group_interval, group_wait, and repeat_interval.