Create Prometheus metrics from a dynamic source in Python

While the process for adding Prometheus metrics to a Python application is well documented in the prometheus_client documentation, things get trickier when you only know the metric names or labels at runtime. The normal metric classes expect to be declared at module level so that the default registry can pick them up. The documentation does hint at a solution, however: use a custom collector.
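For contrast, this is what the usual module-level declaration looks like (a minimal sketch; the metric name and label here are made up). It works fine when the name is known up front, but there is nowhere sensible to put this line when the component name only exists as a row in the database:

from prometheus_client import Gauge

# Fine when the name is known at import time, but impossible to write
# if the component name is only discovered at runtime.
IN_WINDOW = Gauge('my_component_service_window_in_window',
                  'Is the service window active (1 for yes, 0 for no)',
                  ['environment'])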

The maintainer of the Python client library has already written an excellent article on how to use custom collectors to take data from an existing system and turn it into an exporter. The article (on extracting Jenkins job information) is here: https://www.robustperception.io/writing-a-jenkins-exporter-in-python

This article describes how I took a Django application I wrote to store service level agreement information and exposed component service window data as metrics on the application’s own metrics endpoint (implemented with the excellent django-prometheus package).

Implementation

To add a custom collector to a Django application, you will need to do three things:

  1. Have a model or models that supply data you want to turn into metrics.
  2. Write the collector class.
  3. Register the class with the prometheus_client global registry ONCE ONLY, and make sure this happens AFTER the database has initialised and only when the Django app is actually running. This last part is the one that caused me the most grief.

Assuming you’ve already carried out step one, this is how you go about steps 2 and 3:
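The article does not show the model itself, so here is a minimal sketch of the interface the collector below relies on. The method names are taken from the collector code; the fields and method bodies are placeholders and purely illustrative.

from django.db import models


class Component(models.Model):
    # Illustrative fields only; the real model holds the SLA / service window data.
    name = models.CharField(max_length=100, unique=True)
    environment = models.CharField(max_length=50)

    def get_labels(self):
        # Returns a dict of label name -> label value for this component.
        return {'environment': self.environment}

    def get_last_start_time(self, moment):
        """Return the start datetime of the most recent service window before `moment`."""
        raise NotImplementedError  # placeholder

    def get_last_end_time(self, moment):
        """Return the end datetime of the most recent service window before `moment`."""
        raise NotImplementedError  # placeholder

    def get_next_start_time(self, moment):
        """Return the start datetime of the next service window after `moment`."""
        raise NotImplementedError  # placeholder

    def get_next_end_time(self, moment):
        """Return the end datetime of the next service window after `moment`."""
        raise NotImplementedError  # placeholder

    def in_window(self, moment):
        """Return True if `moment` falls inside a service window."""
        raise NotImplementedError  # placeholder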

Step 2: Write the collector

A collector class is a class that implements a ‘collect’ method. The ‘collect’ method is a generator that yields <type>MetricFamily objects, where <type> is one of the supported metric types: Counter, Gauge, GaugeHistogram, Histogram, Info, StateSet, Summary, Unknown or Untyped.
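Independent of Django, the general shape looks something like this (a minimal sketch with a made-up metric name; the value would normally come from whatever system you are exporting):

from prometheus_client import REGISTRY
from prometheus_client.core import CounterMetricFamily


class MyCollector(object):
    def collect(self):
        # Name, documentation and label names can all be computed at runtime.
        family = CounterMetricFamily('my_jobs_processed_total',
                                     'Jobs processed by the worker',
                                     labels=['queue'])
        family.add_metric(['default'], 42)  # label values, then the sample value
        yield family


REGISTRY.register(MyCollector())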

Example (monitoring.py)

from prometheus_client.core import GaugeMetricFamily
from django.utils import timezone
from .models import Component


SERVICE_WINDOW_LAST_START_METRIC = 'service_window_last_start'
SERVICE_WINDOW_LAST_START_DOC = 'Last start time of the service window'
SERVICE_WINDOW_LAST_END_METRIC = 'service_window_last_end'
SERVICE_WINDOW_LAST_END_DOC = 'Last end time of the service window'
SERVICE_WINDOW_NEXT_START_METRIC = 'service_window_next_start'
SERVICE_WINDOW_NEXT_START_DOC = 'Next start time of the service window'
SERVICE_WINDOW_NEXT_END_METRIC = 'service_window_next_end'
SERVICE_WINDOW_NEXT_END_DOC = 'Next end time of the service window'
SERVICE_WINDOW_IN_WINDOW_METRIC = 'service_window_in_window'
SERVICE_WINDOW_IN_WINDOW_DOC = 'Is the service window active (1 for yes, 0 for no)'


class ComponentCollector(object):
    def collect(self):
        moment = timezone.now()
        components = Component.objects.all()
        metrics = {}

        for component in components:
            labels = component.get_labels()
            label_names = list(labels.keys())
            label_values = list(labels.values())
            # Each component gets its own set of gauge families, prefixed with
            # the component name.
            prefix = component.name.replace('-', '_') + '_'
            metrics[component.name] = {
                'last_start': GaugeMetricFamily(prefix + SERVICE_WINDOW_LAST_START_METRIC,
                                                SERVICE_WINDOW_LAST_START_DOC, labels=label_names),

                'last_end': GaugeMetricFamily(prefix + SERVICE_WINDOW_LAST_END_METRIC,
                                              SERVICE_WINDOW_LAST_END_DOC, labels=label_names),

                'next_start': GaugeMetricFamily(prefix + SERVICE_WINDOW_NEXT_START_METRIC,
                                                SERVICE_WINDOW_NEXT_START_DOC, labels=label_names),

                'next_end': GaugeMetricFamily(prefix + SERVICE_WINDOW_NEXT_END_METRIC,
                                              SERVICE_WINDOW_NEXT_END_DOC, labels=label_names),

                'in_window': GaugeMetricFamily(prefix + SERVICE_WINDOW_IN_WINDOW_METRIC,
                                               SERVICE_WINDOW_IN_WINDOW_DOC, labels=label_names),
            }

            # Times are exposed as Unix timestamps, the usual way to put a
            # point in time into a gauge.
            metrics[component.name]['last_start'].add_metric(labels=label_values,
                                                             value=component.get_last_start_time(moment).timestamp())
            metrics[component.name]['last_end'].add_metric(labels=label_values,
                                                           value=component.get_last_end_time(moment).timestamp())
            metrics[component.name]['next_start'].add_metric(labels=label_values,
                                                             value=component.get_next_start_time(moment).timestamp())
            metrics[component.name]['next_end'].add_metric(labels=label_values,
                                                           value=component.get_next_end_time(moment).timestamp())
            metrics[component.name]['in_window'].add_metric(labels=label_values,
                                                            value=int(component.in_window(moment)))

        for component_metrics in metrics.values():
            for metric in component_metrics.values():
                yield metric

In this example, I’ve taken a Component model that exposes the last and next service window start and end times, and indicates whether the current time falls inside a service window for the component. The metrics:

  • <component_name>_service_window_last_start
  • <component_name>_service_window_last_end
  • <component_name>_service_window_next_start
  • <component_name>_service_window_next_end
  • <component_name>_service_window_in_window

are created, and the labels attached to the component become the labels on each metric.

The <type>MetricFamily classes do the rest of the work. The default prometheus_client registry runs collect once when the collector is registered, and then calls it again on every scrape, so the metric values are recalculated from the database each time the endpoint is hit.
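You can sanity-check the output before wiring anything into Django’s lifecycle, for example from python manage.py shell using a throwaway registry (the import path below assumes the app is called component_sla_monitor, as in the apps.py example further down):

from prometheus_client import CollectorRegistry, generate_latest

from component_sla_monitor.monitoring import ComponentCollector

registry = CollectorRegistry()
registry.register(ComponentCollector())

# Prints the text exposition format, e.g. lines like
# my_component_service_window_in_window{environment="prod"} 1.0
print(generate_latest(registry).decode('utf-8'))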

Step 3: Registering the collector

This involves some Django trickery in the apps.py module of your Django app.

You will need to do the following:

  1. Write a migration hook that records whether you are running a migration instead of the actual application.
  2. Write another hook that runs when a connection to the database has been created.
  3. Register both hooks in the AppConfig ready method.
  4. Register your collector class with the prometheus_client registry the first time the database connection hook fires, and only that first time.

Example (apps.py)

import logging

from django.apps import AppConfig

from django.db.models.signals import post_migrate
from django.db.backends.signals import connection_created
from prometheus_client import REGISTRY

logger = logging.getLogger(__name__)

migration_executed = False
monitoring_initialised = False


def post_migration_callback(sender, **kwargs):
    global migration_executed
    logger.info('Migration executed')
    migration_executed = True


def connection_callback(sender, connection, **kwargs):
    global monitoring_initialised
    # Check to see if we are not running a unittest temp db
    if not connection.settings_dict['NAME'] == 'file:memorydb_default?mode=memory&cache=shared':
        if not monitoring_initialised:
            from .monitoring import ComponentCollector
            REGISTRY.register(ComponentCollector())
            monitoring_initialised = True


class ComponentSlaMonitorConfig(AppConfig):
    name = 'component_sla_monitor'

    def ready(self):
        global migration_executed
        post_migrate.connect(post_migration_callback, sender=self)

        if not migration_executed:
            connection_created.connect(connection_callback)

Note that we only import the collector inside the connection_callback hook. Importing it (and therefore the models it uses) at the top of the file causes Django errors, because the app registry and database are not ready at that point.

Also note the check to see whether the DB connection is to an in-memory database. This disables monitoring registration during unit tests.
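That string comparison is tied to the default SQLite in-memory test database name. If your test setup differs, a more explicit check is an option; here is a sketch of detecting a test run from the management command (an alternative shown for illustration, not what the code above does):

import sys


def running_tests():
    # True when the process was started with `manage.py test`.
    # Adjust (or use a dedicated settings flag) to match your own test runner.
    return len(sys.argv) > 1 and sys.argv[1] == 'test'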

This code is based on Django 2.2. The ready method and some of the hooks used here have only been available since Django 1.7.