Manually updating a list of machines in a configuration file gets tedious after a while. Prometheus supports service discovery, which lets it automatically discover and monitor your EC2 instances!

To start with, I spun up two EC2 instances running the node exporter. I ensured that ports 9100 and 9090 were open in the Security Group.

Two instances running the node exporter

Next, I set up an IAM user with the AmazonEC2ReadOnlyAccess policy and noted down its access key and secret key. This policy gives the user permission to list EC2 instances (the ec2:DescribeInstances action), which is what Prometheus needs for discovery.

IAM user with AmazonEC2ReadOnlyAccess policy

Finally, I installed Prometheus on one of the machines, with the following configuration in /etc/prometheus/prometheus.yml:

global:
  scrape_interval: 1s
  evaluation_interval: 1s

scrape_configs:
  - job_name: 'node'
    ec2_sd_configs:
      - region: eu-west-1
        access_key: PUT_THE_ACCESS_KEY_HERE
        secret_key: PUT_THE_SECRET_KEY_HERE
        port: 9100
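
If you would rather not keep credentials in the configuration file, a variation like the following might work. This is only a sketch: it assumes the keys are supplied through the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables of the Prometheus process (or through an instance role, if one is attached), which is not how the setup in this post was done. The refresh_interval line is optional and just spells out the documented default.

```yaml
scrape_configs:
  - job_name: 'node'
    ec2_sd_configs:
      - region: eu-west-1
        # access_key and secret_key are deliberately omitted (assumption:
        # credentials come from the environment or an instance role)
        port: 9100
        # How often to re-query the EC2 API for instances (default: 60s)
        refresh_interval: 60s
```

The global section stays the same as in the configuration above.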

Visiting /consoles/index.html on port 9090 of the Prometheus server, I see that my two nodes have been automatically discovered.

Relabelling

This is what we wanted, but it'd be nice if we could have the instance ID displayed rather than the private IP address. Maybe we also want to only scrape some nodes based on a tag.

Relabelling gives you the power to customise which labels Prometheus applies to your targets. As an example, I put the following into /etc/prometheus/prometheus.yml and reloaded Prometheus:

global:
  scrape_interval: 1s
  evaluation_interval: 1s

scrape_configs:
  - job_name: 'node'
    ec2_sd_configs:
      - region: eu-west-1
        access_key: PUT_THE_ACCESS_KEY_HERE
        secret_key: PUT_THE_SECRET_KEY_HERE
        port: 9100
    relabel_configs:
      # Only monitor instances with a Name starting with "SD Demo"
      - source_labels: [__meta_ec2_tag_Name]
        regex: SD Demo.*
        action: keep
      # Use the instance ID as the instance label
      - source_labels: [__meta_ec2_instance_id]
        target_label: instance
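
The second rule works because relabelling fills in defaults for anything you leave out. As a rough illustration, here is the same rule with the documented defaults written out explicitly. It should behave identically; the extra fields are only there to show what is happening.

```yaml
    relabel_configs:
      - source_labels: [__meta_ec2_instance_id]
        regex: (.*)        # default: match the whole value
        action: replace    # default action
        replacement: $1    # default: the first capture group
        target_label: instance
```

Regexes in relabelling rules are anchored at both ends, which is why the first rule needs the trailing .* to match any Name that starts with "SD Demo".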

Visiting the console again, I see that it has worked perfectly! The same technique can be used to monitor instances in an auto-scaling group.
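
For the auto-scaling case, one possible approach is to filter on the aws:autoscaling:groupName tag that AWS adds to instances launched by a group. The sketch below assumes a hypothetical group called sd-demo-asg. Characters in a tag key that are not valid in a label name are replaced with underscores, hence the label name used here.

```yaml
    relabel_configs:
      # Only monitor instances belonging to the (hypothetical) group "sd-demo-asg"
      - source_labels: [__meta_ec2_tag_aws_autoscaling_groupName]
        regex: sd-demo-asg
        action: keep
      # Use the instance ID as the instance label
      - source_labels: [__meta_ec2_instance_id]
        target_label: instance
```

This fragment would replace the relabel_configs section of the 'node' job. Instances that the group launches or terminates should then appear or disappear on the next discovery refresh.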