
Prometheus Metrics

Real-time monitoring and metrics collection for the Control Plane

The Control Plane uses Prometheus for monitoring and metrics collection.

Comprehensive observability

Prometheus scrapes metrics from Control Plane services in real time, giving insight into their performance, resource usage, and operational health.

Overview

Prometheus is an open-source systems monitoring and alerting toolkit. The Control Plane integrates Prometheus for:

📊 Performance Monitoring

Track response times, throughput, and error rates across all services.

💻 Resource Utilization

Monitor CPU, memory, and network usage for capacity planning.

🎯 SLO/SLI Measurements

Measure service level objectives and indicators for reliability.

🔍 Operational Insights

Gain visibility into system behavior for troubleshooting.

⚠️ Alerting

Trigger notifications when metrics exceed defined thresholds.

📈 Trend Analysis

Track changes in system behavior over time for planning.


Why Prometheus?

🎯 Precise Targeting

Pull-based model allows for fine-grained control over what metrics are collected and when.

🔍 Multi-dimensional Data

Powerful data model with metric names and key/value pairs enables detailed filtering and grouping.

📊 Powerful Query Language

PromQL provides flexible querying capabilities for real-time analysis and visualization.

⚡ Stand-alone Operation

No reliance on distributed storage; self-contained and easy to deploy and operate.

📈 Time Series Focus

Optimized for time series data, making it well suited to tracking system metrics over time.

🧩 Ecosystem Integration

Rich ecosystem of exporters, Alertmanager, and dashboarding tools such as Grafana.

Integration in the Control Plane

Standardized metrics collection

The Control Plane implements a consistent metrics strategy across all components, enabling unified monitoring and alerting.

The Control Plane integrates Prometheus throughout its architecture for comprehensive observability:

Service Instrumentation

All Control Plane services expose standardized metrics:

  • HTTP request duration, status codes, and rates
  • Database connection pools and query times
  • Resource utilization and system metrics
  • Business-level metrics for domain-specific operations

Operator Metrics

Kubernetes operators expose specialized metrics:

  • Reconciliation duration and frequency
  • Resource creation and deletion rates
  • Error counts by resource type
  • Controller queue depths and processing times
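As a rough illustration of the reconciliation and error metrics above, this stdlib-only Go sketch wraps a reconcile function, times it, and records a duration sum and count plus an error count per resource type. The `reconcileMetrics` type, the `instrument` method, `errTransient`, and the "FileStore" resource type are hypothetical; a real operator would more likely register collectors from the Prometheus Go client than keep plain maps.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errTransient is a hypothetical error used to simulate a failed reconcile.
var errTransient = errors.New("transient failure")

// reconcileMetrics is a hypothetical, stdlib-only stand-in for operator
// metrics; a real operator would register Prometheus collectors instead.
type reconcileMetrics struct {
	mu            sync.Mutex
	durationSum   map[string]float64 // total seconds spent reconciling, per resource type
	durationCount map[string]int     // reconciles performed, per resource type
	errorsTotal   map[string]int     // failed reconciles, per resource type
}

func newReconcileMetrics() *reconcileMetrics {
	return &reconcileMetrics{
		durationSum:   map[string]float64{},
		durationCount: map[string]int{},
		errorsTotal:   map[string]int{},
	}
}

// instrument runs reconcile, records its duration and outcome under the
// given resource type, and returns the reconcile error unchanged.
func (m *reconcileMetrics) instrument(resourceType string, reconcile func() error) error {
	start := time.Now()
	err := reconcile()
	elapsed := time.Since(start).Seconds()

	m.mu.Lock()
	defer m.mu.Unlock()
	m.durationSum[resourceType] += elapsed
	m.durationCount[resourceType]++
	if err != nil {
		m.errorsTotal[resourceType]++
	}
	return err
}

func main() {
	m := newReconcileMetrics()
	_ = m.instrument("FileStore", func() error { return nil })
	_ = m.instrument("FileStore", func() error { return errTransient })

	// Average reconcile duration is sum / count.
	avg := m.durationSum["FileStore"] / float64(m.durationCount["FileStore"])
	fmt.Println(m.durationCount["FileStore"], m.errorsTotal["FileStore"], avg >= 0) // 2 1 true
}
```

Keeping a sum and a count is what lets a query derive average reconcile time as sum divided by count, the same pattern Prometheus histograms and summaries follow with their `_sum` and `_count` series.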

Metric Types

Choosing the right metric type

Select the metric type based on what you're measuring: counters for cumulative event counts, gauges for values that rise and fall, histograms for distributions (bucketed, so quantiles can be computed and aggregated at query time), and summaries for quantiles precomputed on the client.
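To make the distinction concrete, here is a minimal, stdlib-only Go model of three of the types. The names `Counter`, `Gauge`, `Histogram`, and `NewHistogram` are illustrative and are not the Prometheus client API. The sketch shows the properties that matter when choosing: counters never decrease, gauges move both ways, and histogram buckets are cumulative, so each bucket counts every observation at or below its upper bound. Summaries are left out because client-side quantile estimation does not fit in a short sketch.

```go
package main

import "fmt"

// Counter only ever increases (e.g. requests served).
type Counter struct{ value float64 }

// Add increments the counter. Negative deltas are rejected because a
// counter must be monotonic; the Prometheus Go client also panics here.
func (c *Counter) Add(delta float64) {
	if delta < 0 {
		panic("counter cannot decrease")
	}
	c.value += delta
}

// Gauge can rise and fall (e.g. current queue depth).
type Gauge struct{ value float64 }

// Set replaces the gauge value; Add moves it up or down.
func (g *Gauge) Set(v float64)     { g.value = v }
func (g *Gauge) Add(delta float64) { g.value += delta }

// Histogram counts observations into cumulative buckets, mirroring the
// Prometheus "le" (less than or equal) bucket semantics, plus a sum and count.
type Histogram struct {
	upperBounds []float64 // sorted ascending; the +Inf bucket is implicit
	counts      []uint64  // counts[i] = observations <= upperBounds[i]
	sum         float64
	count       uint64 // also the value of the implicit +Inf bucket
}

// NewHistogram creates a histogram with the given ascending bucket bounds.
func NewHistogram(upperBounds []float64) *Histogram {
	return &Histogram{upperBounds: upperBounds, counts: make([]uint64, len(upperBounds))}
}

// Observe records v in every bucket whose upper bound is >= v.
func (h *Histogram) Observe(v float64) {
	for i, ub := range h.upperBounds {
		if v <= ub {
			h.counts[i]++
		}
	}
	h.sum += v
	h.count++
}

func main() {
	var requests Counter
	requests.Add(1)

	var inFlight Gauge
	inFlight.Add(3)
	inFlight.Add(-1)

	latency := NewHistogram([]float64{0.1, 0.5, 1})
	for _, v := range []float64{0.05, 0.3, 0.7, 2} {
		latency.Observe(v)
	}
	fmt.Println(requests.value, inFlight.value, latency.counts, latency.count) // 1 2 [1 2 3] 4
}
```

In the example, the 2-second observation exceeds every bucket bound, so it shows up only in the total count (the implicit +Inf bucket).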

Metric Naming Convention

Control Plane Metric Naming

The Control Plane follows these naming conventions for metrics:

  • Component: Component name (e.g., file_manager, api_operator)
  • Area: Functional area (e.g., http, database)
  • Measurement: Specific measurement (e.g., requests_total, latency_seconds)

Full format: component_area_measurement

Examples:

  • file_manager_http_requests_total
  • api_operator_reconcile_duration_seconds
  • gateway_router_cache_hits_total
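A small helper can enforce the component_area_measurement pattern when metrics are defined. In the Go sketch below, `metricName` is a hypothetical helper, not part of the Control Plane code. It joins the three parts with underscores and checks the result against the classic Prometheus metric-name character set (`[a-zA-Z_:][a-zA-Z0-9_:]*`), which is why a hyphenated component name such as file-manager is rejected.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// validName matches the classic character set Prometheus allows in metric names.
var validName = regexp.MustCompile(`^[a-zA-Z_:][a-zA-Z0-9_:]*$`)

// metricName builds component_area_measurement and rejects empty parts
// or results that are not legal Prometheus metric names.
func metricName(component, area, measurement string) (string, error) {
	for _, part := range []string{component, area, measurement} {
		if part == "" {
			return "", fmt.Errorf("metric name parts must be non-empty")
		}
	}
	name := strings.Join([]string{component, area, measurement}, "_")
	if !validName.MatchString(name) {
		return "", fmt.Errorf("invalid metric name %q", name)
	}
	return name, nil
}

func main() {
	name, err := metricName("file_manager", "http", "requests_total")
	fmt.Println(name, err) // file_manager_http_requests_total <nil>

	_, err = metricName("file-manager", "http", "requests_total")
	fmt.Println(err) // invalid metric name "file-manager_http_requests_total"
}
```

The pattern still admits colons, which Prometheus reserves by convention for recording rules; a stricter helper could reject them as well.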

Middleware Integration

Automated instrumentation

Middleware integration automatically collects metrics for all HTTP requests without modifying business logic code.

The Control Plane integrates Prometheus metrics with Gofiber middleware.

Service Monitors

Prometheus Operator

The Control Plane uses the Prometheus Operator's ServiceMonitor resources to declaratively define which services Prometheus scrapes and how.

ServiceMonitor Configuration

Endpoint Configuration

Key settings for ServiceMonitor endpoints:

  • port: Named port to scrape (usually "metrics")
  • path: HTTP path for metrics (default "/metrics")
  • interval: Scrape frequency (e.g., "15s")
  • scheme: Protocol used for scraping (http or https)
  • tlsConfig: Optional TLS settings for HTTPS endpoints
  • bearerTokenSecret: Reference to a Secret key holding a bearer token for authenticated endpoints

Target Selection

How ServiceMonitors select targets:

  • selector: Label selector that matches Services
  • namespaceSelector: Controls which namespaces are searched
  • namespaceSelector.matchNames: Explicit list of namespaces to search
  • namespaceSelector.any: Search all namespaces when true
  • sampleLimit: Per-scrape limit on accepted samples (a safeguard rather than a selector)
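The selection rules can be modelled in a few lines. The Go sketch below is a simplified, hypothetical model, not Prometheus Operator code: it supports only equality-based `matchLabels` (no `matchExpressions`) plus the `namespaceSelector` fields `matchNames` and `any`, and it assumes that when neither is set only the ServiceMonitor's own namespace is searched. The service names, namespaces, and the `metrics: enabled` label are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// service is a minimal stand-in for a Kubernetes Service.
type service struct {
	Name      string
	Namespace string
	Labels    map[string]string
}

// serviceMonitorSpec models only the selection fields discussed above.
type serviceMonitorSpec struct {
	Namespace    string            // namespace the ServiceMonitor itself lives in
	MatchLabels  map[string]string // selector.matchLabels
	AnyNamespace bool              // namespaceSelector.any
	MatchNames   []string          // namespaceSelector.matchNames
}

// namespaceAllowed applies the namespace selector; with neither field set,
// only the ServiceMonitor's own namespace is searched (assumed default).
func (s serviceMonitorSpec) namespaceAllowed(ns string) bool {
	if s.AnyNamespace {
		return true
	}
	if len(s.MatchNames) == 0 {
		return ns == s.Namespace
	}
	for _, n := range s.MatchNames {
		if n == ns {
			return true
		}
	}
	return false
}

// labelsMatch requires every selector key to be present with an equal value.
func labelsMatch(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// selectTargets returns "namespace/name" for every matching Service, sorted.
func selectTargets(spec serviceMonitorSpec, services []service) []string {
	var targets []string
	for _, svc := range services {
		if spec.namespaceAllowed(svc.Namespace) && labelsMatch(spec.MatchLabels, svc.Labels) {
			targets = append(targets, svc.Namespace+"/"+svc.Name)
		}
	}
	sort.Strings(targets)
	return targets
}

func main() {
	services := []service{
		{"file-manager", "control-plane", map[string]string{"app": "file-manager", "metrics": "enabled"}},
		{"api-operator", "operators", map[string]string{"app": "api-operator", "metrics": "enabled"}},
		{"debug-tool", "control-plane", map[string]string{"app": "debug-tool"}},
	}
	spec := serviceMonitorSpec{
		Namespace:   "monitoring",
		MatchLabels: map[string]string{"metrics": "enabled"},
		MatchNames:  []string{"control-plane", "operators"},
	}
	fmt.Println(selectTargets(spec, services)) // [control-plane/file-manager operators/api-operator]
}
```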

Metrics Endpoint

Standard exposure

Each Control Plane service exposes a standard /metrics endpoint in the Prometheus text exposition format, so every service can be scraped the same way.

Grafana Dashboards

Visualization

The Control Plane includes pre-built Grafana dashboards that visualize key metrics for operational monitoring and troubleshooting.

Best Practices

Metric design principles

Following consistent metric design principles helps ensure that metrics are useful, maintainable, and don't cause performance issues.

📊 Consistent Naming

Follow the component_area_measurement naming pattern with clear, descriptive names that indicate what's being measured.

🏷️ Meaningful Labels

Use labels for dimensions that are useful for filtering and grouping, but avoid high-cardinality labels that can cause performance issues.
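Every distinct combination of label values becomes its own time series, which is why unbounded labels are costly. The Go sketch below (all names, including the `user_id` label, are hypothetical) counts the series a batch of observations would create: a `method` label with two values yields two series, while adding a per-user `user_id` label yields one series per user.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesKey builds a canonical identity for a metric name plus label set;
// Prometheus stores one time series per distinct identity.
func seriesKey(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // label order must not create distinct series
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+labels[k])
	}
	return name + "{" + strings.Join(parts, ",") + "}"
}

// countSeries returns how many distinct series a batch of observations creates.
func countSeries(name string, observations []map[string]string) int {
	seen := map[string]bool{}
	for _, l := range observations {
		seen[seriesKey(name, l)] = true
	}
	return len(seen)
}

func main() {
	var bounded, unbounded []map[string]string
	for i := 0; i < 1000; i++ {
		method := []string{"GET", "POST"}[i%2]
		bounded = append(bounded, map[string]string{"method": method})
		// A per-user label creates a new series for every user ID.
		unbounded = append(unbounded, map[string]string{"method": method, "user_id": fmt.Sprint(i)})
	}
	fmt.Println(countSeries("http_requests_total", bounded), countSeries("http_requests_total", unbounded)) // 2 1000
}
```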

⚖️ Metric Selection

Instrument what matters; focus on key performance indicators rather than collecting everything possible.

📋 Documentation

Document all metrics with clear descriptions, units, and example queries to aid in dashboard creation and alerting.

⚠️ Alert Design

Design alerts that trigger on symptoms (user impact) rather than causes, with clear runbooks for remediation.

🔄 Rate Calculations

Query counters as rates or averages over an appropriate time window (for example, rate(file_manager_http_requests_total[5m])) so that normal short-term fluctuations don't add noise.
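Rates matter because a counter's raw value only grows, so the useful signal is how fast it grows within a window. The Go sketch below computes a per-second rate from a window of scraped samples and, as Prometheus does, treats any decrease as a counter reset. It is a simplified stand-in for PromQL's `rate()`: `simpleRate` and `sample` are hypothetical names, and unlike `rate()` it does not extrapolate to the edges of the window, so results can differ slightly.

```go
package main

import (
	"errors"
	"fmt"
)

// sample is one scraped counter value at a Unix timestamp in seconds.
type sample struct {
	T float64
	V float64
}

// simpleRate returns the per-second increase of a counter across the samples,
// treating any drop in value as a counter reset (e.g. a process restart).
// Unlike PromQL's rate(), it does not extrapolate to the window edges.
func simpleRate(samples []sample) (float64, error) {
	if len(samples) < 2 {
		return 0, errors.New("need at least two samples")
	}
	increase := 0.0
	for i := 1; i < len(samples); i++ {
		delta := samples[i].V - samples[i-1].V
		if delta < 0 {
			// Counter reset: it restarted from zero, so the new
			// value is the increase since the reset.
			delta = samples[i].V
		}
		increase += delta
	}
	elapsed := samples[len(samples)-1].T - samples[0].T
	if elapsed <= 0 {
		return 0, errors.New("samples must span a positive time range")
	}
	return increase / elapsed, nil
}

func main() {
	// Four scrapes 15s apart; the counter resets between the 3rd and 4th.
	window := []sample{{0, 100}, {15, 130}, {30, 160}, {45, 30}}
	r, err := simpleRate(window)
	fmt.Println(r, err) // 2 <nil>
}
```

Without reset handling, the drop from 160 to 30 would produce a negative rate and make the series look like traffic collapsed.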

Zap Logging

Learn about the structured logging used alongside Prometheus metrics.

Architecture

Understand how monitoring fits into the overall Control Plane architecture.

Kubernetes

Discover how the Control Plane builds on Kubernetes observability patterns.