Usually alert thresholds are hardcoded in the alert. In more sophisticated setups, it would be useful for it to be parameterised based on another time series.
When your team is the only one creating alerts for your Prometheus, having the thresholds directly in the alert is the easiest way to do things. If however other groups want the same alerts in your Prometheus, but with different thresholds, this can be a bit much boilerplate. The good news is that you can use PromQL to make this easier.
To do so create a time series with the threshold using recording rules, or you could have it come from an exporter. This threshold time series can then be compared against the time series of interest, using group_left to handle any potential many-to-one matching. Here we presume that the team label is what you wish to match on, but it could be any label such as instance, job or env:
groups:
- name: example
rules:
- record: something_too_high_threshold
expr: 200
labels:
team: foo
- record: something_too_high_threshold
expr: 400
labels:
team: bar
- alert: SomethingTooHigh
expr: |
# Alert based on per-team thresholds.
something
> on (team) group_left
something_too_high_threshold
You could also provide a default, so only those teams wishing to override it need to configure a threshold. Here the default is 42:
- alert: SomethingTooHigh
expr: |
# Alert based on per-team thresholds, with a default.
something
> on (team) group_left()
(
something_too_high_threshold
or on(team)
count by (team)(something) * 0 + 42
)
The count by (team)(something) * 0 + 42 will produce a time series with the value 42 for every unique team label value in something.
This approach can be combined with the technique to use labels to direct email notifications:, where the threshold time series also includes a label that can subsequently be used by the alertmanager:
groups:
- name: example
rules:
- record: something_too_high_threshold
expr: 200
labels:
team: foo
email_to: foo@example.org
- alert: SomethingTooHigh
expr: |
# Alert based on per-team thresholds, copying over email_to.
something
> on (team) group_left(email_to)
something_too_high_threshold
With any of these alerting rules your users only need to worry about adding new threshold time series, making things easier and less error prone!
Want to know how to get the most out of alerting? Contact us.




No comments.