Reliable Insights

A blog on monitoring, scale and operational sanity

What percentage of time is my service down for?

Have you ever wondered what percentage of time a given service or application spends up or down?

In this blog post we’ll demonstrate how to use the Blackbox exporter with Prometheus to answer that question.

For a simple, contrived example, we’ll run both the Blackbox exporter and the Node exporter, and configure Prometheus to have the Blackbox exporter send an HTTP probe to the Node exporter and then scrape the result.

global:
  scrape_interval:    5s
  evaluation_interval: 5s

scrape_configs:
  - job_name: 'node'
    metrics_path: /probe
    params:
      module: [http_2xx]  # Look for an HTTP 2xx response.
    static_configs:
      - targets:
        - http://localhost:9100
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115  # The Blackbox exporter's real hostname:port (9115 is its default port).
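To make the three relabeling rules above less opaque, here is a small Python sketch that mimics what they do to a single target. It is only an illustration (Prometheus does this internally), the helper names relabel() and scrape_url() are hypothetical, and it assumes the Blackbox exporter listens on localhost:9115, its default port. It relies on two Prometheus conventions: each entry under params becomes a __param_<name> label, and metrics_path is carried in the __metrics_path__ label.

```python
from urllib.parse import urlencode

# Assumption: the Blackbox exporter runs locally on its default port.
BLACKBOX_ADDRESS = "localhost:9115"


def relabel(target: str, module: str = "http_2xx") -> dict:
    """Hypothetical helper applying the three relabeling rules to one target."""
    labels = {
        "__address__": target,           # from static_configs
        "__metrics_path__": "/probe",    # from metrics_path
        "__param_module": module,        # from params
    }
    # Rule 1: copy the original address into the ?target= URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Rule 2: keep the probed URL as the instance label.
    labels["instance"] = labels["__param_target"]
    # Rule 3: point the actual scrape at the Blackbox exporter.
    labels["__address__"] = BLACKBOX_ADDRESS
    return labels


def scrape_url(labels: dict) -> str:
    """Hypothetical helper building the URL scraped from the relabeled labels."""
    params = {
        key[len("__param_"):]: value
        for key, value in labels.items()
        if key.startswith("__param_")
    }
    return f"http://{labels['__address__']}{labels['__metrics_path__']}?{urlencode(params)}"
```

Under these assumptions, scrape_url(relabel("http://localhost:9100")) returns http://localhost:9115/probe?module=http_2xx&target=http%3A%2F%2Flocalhost%3A9100: Prometheus scrapes the Blackbox exporter, which in turn probes the Node exporter, while the instance label still names the Node exporter.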

The Blackbox exporter’s probe_success metric simply reports 1 or 0, depending on whether the probed target responded successfully (here, with an HTTP 2xx status). Using the avg_over_time() query function, we can take the average value of probe_success over a given time period, which is the fraction of probes in that period that succeeded.

Below, we apply this function to probe_success over a 15-minute window and multiply the result by 100 to get a percentage.

To bring the percentage down to 80%, I killed the Node exporter for a few minutes. Over a 15-minute window, 80% corresponds to roughly 3 minutes of failed probes.
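The arithmetic behind that 80% can be checked with a short Python sketch. It assumes perfectly regular 5s scrapes and an outage of exactly 3 minutes. The avg_over_time() function below is a hypothetical stand-in that, like the PromQL function, averages the samples in the range rather than weighting them by time, so the result only equals the fraction of time the target was up when scrapes are evenly spaced. While the Node exporter is down, the Blackbox exporter itself is still scraped and reports probe_success 0, so the failures appear as samples. In practice a [15m] range may hold one sample more or less depending on alignment.

```python
def avg_over_time(samples: list[float]) -> float:
    """Mean of the samples in a range, like PromQL's avg_over_time()."""
    return sum(samples) / len(samples)


SCRAPE_INTERVAL_S = 5                 # scrape_interval from the config
WINDOW_S = 15 * 60                    # the [15m] range
n = WINDOW_S // SCRAPE_INTERVAL_S     # number of probe_success samples in the window

DOWN_S = 3 * 60                       # assumption: the target was down for 3 minutes
failed = DOWN_S // SCRAPE_INTERVAL_S  # samples where probe_success was 0

# Successful probes report 1, failed probes report 0.
samples = [1.0] * (n - failed) + [0.0] * failed

uptime_pct = avg_over_time(samples) * 100
downtime_pct = 100 - uptime_pct
print(f"{n} samples, {failed} failed: up {uptime_pct:.1f}%, down {downtime_pct:.1f}%")
```

With these assumptions the window holds 180 samples, 36 of which failed, giving 80% uptime and 20% downtime.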

The full query used is:

  avg_over_time(probe_success{job="node"}[15m]) * 100
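To use this number outside the expression browser, the same query can be sent to Prometheus’s HTTP API (GET /api/v1/query). The following Python sketch assumes Prometheus is reachable at http://localhost:9090, its default port; the helper names are hypothetical. It keeps the HTTP call separate from the parsing so the parsing can be tried against a canned response. The response shape it expects is the API’s instant-vector result, where each entry carries its labels under "metric" and a [timestamp, "value"] pair under "value".

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PROMETHEUS = "http://localhost:9090"  # assumption: Prometheus on its default port
QUERY = 'avg_over_time(probe_success{job="node"}[15m]) * 100'


def query_url(base: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


def parse_percentages(body: str) -> dict[str, float]:
    """Map each instance label to its value from an instant-vector response."""
    payload = json.loads(body)
    if payload["status"] != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    # Sample values are returned as strings, e.g. [1700000000.0, "80"].
    return {
        series["metric"].get("instance", ""): float(series["value"][1])
        for series in payload["data"]["result"]
    }


def fetch_uptime(base: str = PROMETHEUS) -> dict[str, float]:
    """Run QUERY against a live Prometheus server and parse the result."""
    with urlopen(query_url(base, QUERY)) as resp:
        return parse_percentages(resp.read().decode("utf-8"))
```

Against a running server, fetch_uptime() would return a mapping such as {"http://localhost:9100": 80.0}. Since the question is how long the service is down, the downtime percentage is simply 100 minus each value.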



Interested in gaining more operational insights with Prometheus? Contact us.

Conor Broderick