When you've broken a metric out into labels a common need is to tell what proportion each label represents of the total. The group_left modifier of Prometheus is the key.

We previously looked at group_left in the context of machine roles. Here the use case is different as there's only one metric with one set of labels involved. We'll take the example of machine CPU, and unlike the last time we won't take advantage of the fact that it happens to sum to 1.0.

As a reminder machine cpu usage comes from a metric called node_cpu and has the basic structure:

# HELP node_cpu Seconds the cpus spent in each mode.
# TYPE node_cpu counter
node_cpu{cpu="cpu0",mode="idle"} 2.03442237e+06
node_cpu{cpu="cpu0",mode="system"} 6605.46
node_cpu{cpu="cpu0",mode="user"} 23343.01
node_cpu{cpu="cpu1",mode="idle"} 2.03471439e+06
node_cpu{cpu="cpu1",mode="system"} 6581.92
node_cpu{cpu="cpu1",mode="user"} 23171.06

What we'd like to know is the proportion of the time each mode takes on each machine.

What we need is the total for each mode for each machine:

sum without (cpu)(rate(node_cpu[1m]))

And the total for each machine:

sum without (mode, cpu)(rate(node_cpu[1m]))

But how do we combine these together? A standard division binary operation won't help, as there's an extra mode label in the first expression compared to the second.

This is where group_left comes in. It says that for each set of matching labels, many time series on the left hand side can match with just one time series on the right hand side. It'll then perform the binary operation, and keep all the labels of the left hand side.

This gives us the expression.

sum without (cpu)(rate(node_cpu[1m]))
/ ignoring(mode) group_left
sum without (mode, cpu)(rate(node_cpu[1m]))

Under the covers the matching looks like:

LHS RHS
{mode="idle",instance="machine1",job="node"} {instance="machine1",job="node"}
{mode="user",instance="machine1",job="node"}
{mode="system",instance="machine1",job="node"}
{mode="idle",instance="machine2",job="node"} {instance="machine2",job="node"}
{mode="user",instance="machine2",job="node"}
{mode="system",instance="machine2",job="node"}

Graphing it produces the desired result: