Prometheus Alertmanager cluster in Docker Swarm

Prometheus monitoring and Docker combine together really well, but configuring an Alertmanager cluster can be a bit of a challenge if you don’t find the trick. This article shows a method that both works, and isn’t overly complicated to set up.

The trick is, that while it isn’t possible to pass the cluster.peer parameters correctly to a single service entry, you can use 2 or more numbered service entries instead, and define a network alias to combine them into a DNS-searchable whole for your further configuration.

Docker compose configuration

This is a sample docker compose that can be instantiated in Docker swarm using docker stack deploy –compose-file …

# docker-compose.yml
version: '3.7'
services:
  alertmanager_1:
    image: prom/alertmanager:latest
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--storage.path=/alertmanager'
      - '--cluster.peer=tasks.alertmanager_2:9094'
    deploy:
      mode: global
      placement:
        constraints:
          - node.hostname == swarm-manager000000
    networks:
      prometheus_stack:
        aliases:
          - alertmanager
    ports:
        - '19093:9093'
    volumes:
      - alertmanager-data:/alertmanager
      - alertmanager-config:/etc/alertmanager
  alertmanager_2:
    image: prom/alertmanager:latest
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--storage.path=/alertmanager'
      - '--cluster.peer=tasks.alertmanager_1:9094'
    deploy:
      mode: global
      placement:
        constraints:
          - node.hostname == swarm-manager000001
    networks:
      prometheus_stack:
        aliases:
          - alertmanager
    ports:
        - '29093:9093'
    volumes:
      - alertmanager-data:/alertmanager
      - alertmanager-config:/etc/alertmanager
networks:
  prometheus_stack:
    driver: overlay
    attachable: true 
volumes:
  alertmanager-data: {}  
  alertmanager-config: {} 

Note the following aspects:

  • Each alertmanager gets a named service, locked to a single node via placement constraints (fill in your own node names here).
  • The cluster.peer setting refers to the service name of the other alertmanager service(s)
  • This configuration is ready to use my method for updating configuration using git push as described in my article on Dynamic Docker configuration management
  • Because we’ve defined a network alias on each alertmanager service, we can use DNS service discovery in our Prometheus config file to find the alertmanagers, using docker swarm’s task.<servicename> DNS entries.

prometheus.yml

...
# Alertmanager configuration
alerting:
  alertmanagers:
  - dns_sd_configs:
    - names:
      - 'tasks.alertmanager'
      type: 'A'
      port: 9093

Things to be aware of…

  • When running on a cluster, if you are using a volume for storing the alertmanager configuration, you should be using a shared storage volume driver. My own swarm is running on Docker for Azure, and uses the cloudstor:azure driver.
    • If you can’t do this, you’ll have to attach your config files using configs: blocks. For static configurations this is fine, but in an active environment, versioning your config names becomes a nuisance very fast.