In the previous post we looked at what to do when all the targets for a job have disappeared. What if you want to alert on specific metrics disappearing from a single target?

It's best to avoid metrics that appear and disappear. However, certain subsystems of a target may not always return all the metrics that they should. You can detect this situation by noticing that the up metric exists but the metric in question does not. In addition, you will want to check that up is 1, so that the alert doesn't spuriously fire when the target is down. If you already have down alerts for the job, there's no need to spam yourself with additional ones about missing metrics too.

The alert would look something like:

groups:
- name: example
  rules:
  - alert: MyJobMissingMyMetric
    expr: up{job="myjob"} == 1 unless my_metric
    for: 10m

This uses the unless operator, which returns the series on the left-hand side unless there is a series with matching labels on the right-hand side.
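
One way to convince yourself of this behaviour is a promtool unit test. The following is a minimal sketch, assuming the rule above is saved as alerts.yml next to the test file; the file name and the instance values a:9090 and b:9090 are made up for illustration. Both targets are up, but only the first exposes my_metric, so only the second should have a firing alert once the 10m for duration has passed.

```yaml
# test.yml: run with `promtool test rules test.yml`
rule_files:
  - alerts.yml            # assumed file name for the rule group above

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Both hypothetical targets are scraped successfully.
      - series: 'up{job="myjob", instance="a:9090"}'
        values: '1+0x20'
      - series: 'up{job="myjob", instance="b:9090"}'
        values: '1+0x20'
      # Only the first target returns my_metric.
      - series: 'my_metric{job="myjob", instance="a:9090"}'
        values: '5+0x20'
    alert_rule_test:
      # 15m is past the rule's 10m "for" duration, so the alert is firing.
      - eval_time: 15m
        alertname: MyJobMissingMyMetric
        exp_alerts:
          - exp_labels:
              job: myjob
              instance: b:9090
```

No alert is expected for a:9090, because my_metric has a series with the same job and instance labels as its up series.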

As with other binary operators, you can use ignoring if your metric has instrumentation labels that you wish to ignore. For example, unless ignoring(method) would be appropriate if my_metric had a method label.
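
As a sketch of what that might look like, here is the same rule assuming my_metric carries a method label in addition to the target labels. Without ignoring(method), no my_metric series would match the label set of up, and the alert would fire for every healthy target.

```yaml
groups:
- name: example
  rules:
  - alert: MyJobMissingMyMetric
    # method exists only on my_metric (assumed here), so leave it out of the matching.
    expr: up{job="myjob"} == 1 unless ignoring(method) my_metric
    for: 10m
```

If the metric has several instrumentation labels, listing the target labels to match on instead, for example unless on(job, instance), may be easier to maintain.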

Want advice on how to avoid missing metrics in the first place? Contact us.