Prometheus Metrics

Real-time monitoring and metrics collection for the Control Plane

The Control Plane uses Prometheus for monitoring and metrics collection.

Comprehensive observability

Prometheus enables real-time monitoring of Control Plane services, providing insights into performance, resource usage, and operational health through a robust metrics collection system.

Overview

Prometheus is an open-source systems monitoring and alerting toolkit. The Control Plane integrates Prometheus for:

📊 Performance Monitoring: Track response times, throughput, and error rates across all services.

💻 Resource Utilization: Monitor CPU, memory, and network usage for capacity planning.

🎯 SLO/SLI Measurements: Measure service level indicators (SLIs) against service level objectives (SLOs) to track reliability.

🔍 Operational Insights: Gain visibility into system behavior for troubleshooting.

⚠️ Alerting: Trigger notifications when metrics exceed defined thresholds.

📈 Trend Analysis: Track changes in system behavior over time for planning.

Why Prometheus?

🎯 Precise Targeting: The pull-based model gives fine-grained control over which targets are scraped and how often.

🔍 Multi-dimensional Data: A data model of metric names plus key/value labels enables detailed filtering and grouping.

📊 Powerful Query Language: PromQL provides flexible querying for real-time analysis and visualization.

⚡ Stand-alone Operation: Each Prometheus server is self-contained and does not rely on distributed storage, which makes it easy to deploy and operate.

📈 Time Series Focus: Storage is optimized for time series data, which makes Prometheus well suited to tracking system metrics over time.

🧩 Ecosystem Integration: A rich ecosystem of exporters, the Alertmanager, and dashboarding tools such as Grafana.

Integration in the Control Plane

Standardized metrics collection

The Control Plane implements a consistent metrics strategy across all components, enabling unified monitoring and alerting.

The Control Plane integrates Prometheus throughout its architecture for comprehensive observability:

Service Instrumentation

All Control Plane services expose standardized metrics:

HTTP request duration, status codes, and rates
Database connection pools and query times
Resource utilization and system metrics
Business-level metrics for domain-specific operations

Operator Metrics

Kubernetes operators expose specialized metrics:

Reconciliation duration and frequency
Resource creation and deletion rates
Error counts by resource type
Controller queue depths and processing times

Metric Types

Choosing the right metric type

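Select the metric type based on what you are measuring: counters for cumulative values that only increase (such as request or error counts), gauges for values that rise and fall (such as queue depth or memory usage), histograms for distributions that are counted into buckets in the client and turned into quantiles at query time, and summaries for quantiles calculated in the client. Histograms can be aggregated across instances; summary quantiles cannot.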
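To make the counter, gauge, and histogram semantics concrete, here is a minimal, dependency-free Go sketch. It is illustrative only: the type and method names are hypothetical, the types are not safe for concurrent use, and a real Control Plane service would presumably use a client library such as prometheus/client_golang instead. Summaries are omitted because client-side quantile estimation is considerably more involved.

```go
package main

import "fmt"

// Counter only increases; it returns to zero only when the process restarts.
type Counter struct{ value float64 }

// Add rejects negative deltas because counters must be monotonic.
func (c *Counter) Add(delta float64) error {
	if delta < 0 {
		return fmt.Errorf("counter cannot decrease (delta %v)", delta)
	}
	c.value += delta
	return nil
}

func (c *Counter) Value() float64 { return c.value }

// Gauge holds a value that can move in either direction.
type Gauge struct{ value float64 }

func (g *Gauge) Set(v float64)  { g.value = v }
func (g *Gauge) Add(d float64)  { g.value += d }
func (g *Gauge) Value() float64 { return g.value }

// Histogram counts observations in cumulative buckets: counts[i] is the number of
// observations <= Bounds[i] (the "le" label). The implicit +Inf bucket equals Count.
type Histogram struct {
	Bounds []float64 // ascending upper bounds
	counts []uint64
	Count  uint64
	Sum    float64
}

func NewHistogram(bounds []float64) *Histogram {
	return &Histogram{Bounds: bounds, counts: make([]uint64, len(bounds))}
}

// Observe increments every bucket whose upper bound is >= v, plus the count and sum.
func (h *Histogram) Observe(v float64) {
	for i, ub := range h.Bounds {
		if v <= ub {
			h.counts[i]++
		}
	}
	h.Count++
	h.Sum += v
}

// Buckets returns a copy of the cumulative bucket counts.
func (h *Histogram) Buckets() []uint64 { return append([]uint64(nil), h.counts...) }

func main() {
	var requests Counter // e.g. a *_requests_total metric
	_ = requests.Add(1)
	_ = requests.Add(1)

	var inFlight Gauge // e.g. requests currently being served
	inFlight.Add(3)
	inFlight.Add(-1)

	latency := NewHistogram([]float64{0.1, 0.5, 1}) // bucket bounds in seconds
	for _, s := range []float64{0.05, 0.2, 0.7, 3} {
		latency.Observe(s)
	}
	fmt.Println(requests.Value(), inFlight.Value()) // 2 2
	fmt.Println(latency.Buckets(), latency.Count)   // [1 2 3] 4
}
```

The buckets are cumulative: the 0.05 s observation is counted in every bucket, while the 3 s observation lands only in the implicit +Inf bucket, which is why Count (4) exceeds the largest finite bucket (3).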

Metric Naming Convention

Control Plane Metric Naming

The Control Plane follows these naming conventions for metrics:

Component: Component name (e.g., file_manager, api_operator)
Area: Functional area (e.g., http, database)
Measurement: Specific measurement (e.g., requests_total, latency_seconds)

Full format: component_area_measurement

Examples:

file_manager_http_requests_total
api_operator_reconcile_duration_seconds
gateway_router_cache_hits_total
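The convention can be enforced with a small helper. This sketch uses a hypothetical metricName function that joins the three parts and checks the result against the metric name pattern from the Prometheus data model documentation ([a-zA-Z_:][a-zA-Z0-9_:]*), excluding colons because Prometheus reserves them for recording rules.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// validName is the documented Prometheus metric name pattern with colons removed,
// since colons are reserved for recording rules rather than direct instrumentation.
var validName = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)

// metricName builds a component_area_measurement name and validates it.
// Hypothetical helper for illustration only.
func metricName(component, area, measurement string) (string, error) {
	parts := []string{component, area, measurement}
	for _, p := range parts {
		if p == "" {
			return "", fmt.Errorf("empty name segment in %q", parts)
		}
	}
	name := strings.Join(parts, "_")
	if !validName.MatchString(name) {
		return "", fmt.Errorf("invalid metric name %q", name)
	}
	return name, nil
}

func main() {
	for _, p := range [][3]string{
		{"file_manager", "http", "requests_total"},
		{"api_operator", "reconcile", "duration_seconds"},
		{"gateway-router", "cache", "hits_total"}, // hyphens are not allowed
	} {
		name, err := metricName(p[0], p[1], p[2])
		fmt.Println(name, err)
	}
}
```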

Middleware Integration

Automated instrumentation

Middleware integration automatically collects metrics for all HTTP requests without modifying business logic code.

The Control Plane integrates Prometheus metrics with Gofiber middleware.
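The Control Plane's middleware is written for Gofiber; the sketch below substitutes Go's standard net/http so that it runs without third-party dependencies. It illustrates the pattern only: wrap each handler, time the request, capture the status code, and record a sample labeled by method, route, and status. The type and function names are hypothetical, and a real service would record into client-library metrics rather than plain maps.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"sync"
	"time"
)

// requestMetrics stores request counts and total latency per label combination.
type requestMetrics struct {
	mu       sync.Mutex
	counts   map[string]int
	duration map[string]time.Duration
}

func newRequestMetrics() *requestMetrics {
	return &requestMetrics{counts: map[string]int{}, duration: map[string]time.Duration{}}
}

// statusRecorder captures the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// instrument wraps next and records method, route, and status for each request.
// The route label is a fixed template, not the raw path, to keep cardinality bounded.
func (m *requestMetrics) instrument(route string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		start := time.Now()
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, req)
		key := req.Method + " " + route + " " + strconv.Itoa(rec.status)
		m.mu.Lock()
		m.counts[key]++
		m.duration[key] += time.Since(start)
		m.mu.Unlock()
	})
}

func (m *requestMetrics) count(key string) int {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.counts[key]
}

// simulate sends three in-process requests through an instrumented handler.
func simulate() *requestMetrics {
	m := newRequestMetrics()
	h := m.instrument("/files/:id", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Query().Get("missing") == "1" {
			http.NotFound(w, r)
			return
		}
		fmt.Fprint(w, "ok")
	}))
	for _, target := range []string{"/files/1", "/files/2", "/files/3?missing=1"} {
		h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, target, nil))
	}
	return m
}

func main() {
	m := simulate()
	fmt.Println(m.count("GET /files/:id 200"), m.count("GET /files/:id 404"))
}
```

Because the business handler never touches the metrics, the same wrapper can be applied to every route, which is the property the middleware approach relies on.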

Service Monitors

Prometheus Operator

The Control Plane uses the Prometheus Operator pattern with ServiceMonitor resources to declaratively define monitoring targets.

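The Control Plane uses Prometheus Operator's ServiceMonitor resources to configure scraping: each ServiceMonitor selects Kubernetes Services by label, and the operator translates it into Prometheus scrape configuration.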

ServiceMonitor Configuration

Endpoint Configuration

Key settings for ServiceMonitor endpoints:

port: Name of the Service port to scrape (usually "metrics")
path: HTTP path for metrics (default "/metrics")
interval: Scrape frequency (e.g., "15s"); if unset, the Prometheus global scrape interval applies
scheme: Protocol used for scraping, http or https
tlsConfig: Optional TLS settings for secure endpoints
bearerTokenSecret: Secret key containing a bearer token for authenticated endpoints
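The following sketch shows how the port, path, and scheme settings combine into a scrape URL, and how an unset interval falls back to a global default. It is a simplification under stated assumptions: the Endpoint struct and helper functions are hypothetical (not the Prometheus Operator's Go types), and real targets are discovered from the Service's endpoints rather than from a single host.

```go
package main

import "fmt"

// Endpoint mirrors a subset of the ServiceMonitor endpoint settings.
type Endpoint struct {
	Port     string // name of the Service port, e.g. "metrics"
	Path     string // defaults to "/metrics" when empty
	Scheme   string // "http" (default) or "https"
	Interval string // empty means the global scrape interval applies
}

// scrapeURL resolves the named port and applies the path and scheme defaults.
func scrapeURL(ep Endpoint, host string, namedPorts map[string]int) (string, error) {
	port, ok := namedPorts[ep.Port]
	if !ok {
		return "", fmt.Errorf("service has no port named %q", ep.Port)
	}
	scheme := ep.Scheme
	if scheme == "" {
		scheme = "http"
	}
	if scheme != "http" && scheme != "https" {
		return "", fmt.Errorf("unsupported scheme %q", scheme)
	}
	path := ep.Path
	if path == "" {
		path = "/metrics"
	}
	return fmt.Sprintf("%s://%s:%d%s", scheme, host, port, path), nil
}

// effectiveInterval returns the endpoint interval, or the global default when unset.
func effectiveInterval(ep Endpoint, global string) string {
	if ep.Interval == "" {
		return global
	}
	return ep.Interval
}

func main() {
	ports := map[string]int{"metrics": 9090, "http": 8080} // hypothetical Service ports
	target, err := scrapeURL(Endpoint{Port: "metrics", Interval: "15s"}, "10.0.0.5", ports)
	fmt.Println(target, err)
	fmt.Println(effectiveInterval(Endpoint{Port: "metrics"}, "30s"))
}
```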

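Target Selection

How ServiceMonitors select targets:

selector: Label selector that finds the Services to scrape
namespaceSelector: Controls which namespaces are searched (by default, only the ServiceMonitor's own namespace)
namespaceSelector.matchNames: Explicit list of namespaces to search
namespaceSelector.any: Search all namespaces when true
sampleLimit: Maximum number of samples accepted per scrape (a safety limit rather than a selector)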
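As a sketch of the selection rules, the Go program below filters Services by label and by namespace. The Service and Selector types are hypothetical simplifications, and label matchExpressions are not modeled. It assumes the Prometheus Operator behaves as follows: namespaceSelector.any set to true selects every namespace, otherwise a non-empty matchNames list is used, and with neither set only the ServiceMonitor's own namespace is searched.

```go
package main

import "fmt"

// Service is a minimal stand-in for a Kubernetes Service.
type Service struct {
	Name      string
	Namespace string
	Labels    map[string]string
}

// Selector models selector.matchLabels plus namespaceSelector.
type Selector struct {
	MatchLabels      map[string]string
	AnyNamespace     bool     // namespaceSelector.any
	MatchNames       []string // namespaceSelector.matchNames
	MonitorNamespace string   // namespace the ServiceMonitor lives in
}

// labelsMatch reports whether every selector key/value pair is present on the Service.
// An empty selector matches every Service, as with Kubernetes label selectors.
func labelsMatch(want, have map[string]string) bool {
	for k, v := range want {
		if got, ok := have[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// namespaceAllowed applies the namespace rules: any, then matchNames, then own namespace.
func namespaceAllowed(sel Selector, ns string) bool {
	if sel.AnyNamespace {
		return true
	}
	if len(sel.MatchNames) == 0 {
		return ns == sel.MonitorNamespace
	}
	for _, n := range sel.MatchNames {
		if n == ns {
			return true
		}
	}
	return false
}

// selectTargets returns the names of the Services that would be scraped.
func selectTargets(sel Selector, services []Service) []string {
	var names []string
	for _, s := range services {
		if namespaceAllowed(sel, s.Namespace) && labelsMatch(sel.MatchLabels, s.Labels) {
			names = append(names, s.Name)
		}
	}
	return names
}

// exampleServices returns hypothetical Services used by main and the checks.
func exampleServices() []Service {
	return []Service{
		{Name: "file-manager", Namespace: "control-plane", Labels: map[string]string{"app": "file-manager", "monitoring": "enabled"}},
		{Name: "gateway", Namespace: "edge", Labels: map[string]string{"app": "gateway", "monitoring": "enabled"}},
		{Name: "scratch", Namespace: "control-plane", Labels: map[string]string{"app": "scratch"}},
	}
}

func main() {
	sel := Selector{MatchLabels: map[string]string{"monitoring": "enabled"}, MonitorNamespace: "control-plane"}
	fmt.Println(selectTargets(sel, exampleServices())) // [file-manager]
	sel.AnyNamespace = true
	fmt.Println(selectTargets(sel, exampleServices())) // [file-manager gateway]
}
```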

Metrics Endpoint

Standard exposure

Each Control Plane service exposes a standard /metrics endpoint that serves metrics in the Prometheus text exposition format, so every service can be scraped the same way.
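For illustration, a standard-library Go handler that serves one counter family in the text exposition format might look like the sketch below. The metric values are hard-coded placeholders and the helper names are hypothetical; in practice a client library generates this output. It shows the pieces each family needs (a # HELP line, a # TYPE line, and one line per sample), sorted label rendering, label-value escaping, and the text-format content type.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sort"
	"strings"
)

// labelEscaper escapes backslashes, double quotes, and newlines in label values.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

type sample struct {
	labels map[string]string
	value  float64
}

// renderLabels formats labels as {k1="v1",k2="v2"} with keys sorted for stable output.
func renderLabels(labels map[string]string) string {
	if len(labels) == 0 {
		return ""
	}
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, len(keys))
	for i, k := range keys {
		parts[i] = fmt.Sprintf(`%s="%s"`, k, labelEscaper.Replace(labels[k]))
	}
	return "{" + strings.Join(parts, ",") + "}"
}

// writeCounter writes the HELP and TYPE lines, then one line per sample.
func writeCounter(w io.Writer, name, help string, samples []sample) {
	fmt.Fprintf(w, "# HELP %s %s\n", name, help)
	fmt.Fprintf(w, "# TYPE %s counter\n", name)
	for _, s := range samples {
		fmt.Fprintf(w, "%s%s %g\n", name, renderLabels(s.labels), s.value)
	}
}

// metricsHandler serves /metrics with placeholder values.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
	writeCounter(w, "file_manager_http_requests_total", "Total HTTP requests handled.", []sample{
		{labels: map[string]string{"method": "GET", "code": "200"}, value: 1027},
		{labels: map[string]string{"method": "POST", "code": "500"}, value: 3},
	})
}

// scrape performs an in-process GET /metrics and returns the content type and body.
func scrape() (string, string) {
	rec := httptest.NewRecorder()
	metricsHandler(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	return rec.Header().Get("Content-Type"), rec.Body.String()
}

func main() {
	contentType, body := scrape()
	fmt.Println(contentType)
	fmt.Print(body)
}
```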

Grafana Dashboards

Visualization

The Control Plane includes pre-built Grafana dashboards that visualize key metrics for operational monitoring and troubleshooting.

Best Practices

Metric design principles

Following consistent metric design principles helps ensure that metrics are useful, maintainable, and don't cause performance issues.

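📊 Consistent Naming: Follow the component_area_measurement naming pattern, with clear, descriptive names that indicate what is being measured.

🏷️ Meaningful Labels: Use labels for dimensions that are useful for filtering and grouping, but avoid high-cardinality labels such as user IDs or raw request paths: every unique combination of label values creates a separate time series, which increases memory use and slows queries.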
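A rough way to reason about cardinality: a metric produces at most one time series per combination of label values, so the worst case is the product of the number of distinct values per label. The sketch below uses hypothetical label counts to show how a single unbounded label multiplies the total. Histograms multiply it again, because each label combination exposes one series per bucket bound, plus the +Inf bucket, _sum, and _count.

```go
package main

import "fmt"

// seriesCount returns the worst-case number of series for one metric name:
// every combination of label values can become its own series.
func seriesCount(distinctValues map[string]int) int {
	total := 1
	for _, n := range distinctValues {
		total *= n
	}
	return total
}

// histogramSeries returns the series count for a classic histogram: per label
// combination, one _bucket series per bound, one for le="+Inf", plus _sum and _count.
func histogramSeries(labelCombinations, bucketBounds int) int {
	return labelCombinations * (bucketBounds + 3)
}

func main() {
	bounded := map[string]int{"method": 5, "route": 20, "code": 6}
	withUserID := map[string]int{"method": 5, "route": 20, "code": 6, "user_id": 10000}
	fmt.Println(seriesCount(bounded))                      // 600
	fmt.Println(seriesCount(withUserID))                   // 6000000
	fmt.Println(histogramSeries(seriesCount(bounded), 10)) // 7800
}
```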

⚖️ Metric Selection: Instrument what matters; focus on key performance indicators rather than collecting everything possible.

📋 Documentation: Document every metric with a clear description, its unit, and example queries to aid dashboard creation and alerting.

⚠️ Alert Design: Design alerts that trigger on symptoms (user impact) rather than causes, and pair each alert with a clear runbook for remediation.

🔄 Rate Calculations: Query counters with rate() or increase() over an appropriate time window rather than reading raw values, and choose windows long enough to smooth out normal fluctuations.
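The following sketch approximates what PromQL's rate() does with a counter: it sums the increases between consecutive samples, treats any drop as a counter reset, and divides by the elapsed time. It is deliberately simplified (PromQL's rate() also extrapolates toward the edges of the range window), and the sample type and function name are hypothetical.

```go
package main

import "fmt"

// sample is one scraped counter value at a timestamp in seconds.
type sample struct {
	t float64
	v float64
}

// simpleRate returns the per-second increase across the samples.
// A decrease is treated as a reset: the counter restarted from zero,
// so the whole new value counts as increase.
func simpleRate(samples []sample) (float64, bool) {
	if len(samples) < 2 {
		return 0, false
	}
	var increase float64
	for i := 1; i < len(samples); i++ {
		delta := samples[i].v - samples[i-1].v
		if delta < 0 {
			delta = samples[i].v
		}
		increase += delta
	}
	elapsed := samples[len(samples)-1].t - samples[0].t
	if elapsed <= 0 {
		return 0, false
	}
	return increase / elapsed, true
}

func main() {
	// Hypothetical scrapes every 15s; the process restarted between t=30 and t=45.
	window := []sample{{0, 0}, {15, 30}, {30, 60}, {45, 30}, {60, 60}}
	r, ok := simpleRate(window)
	fmt.Println(r, ok) // 2 true
}
```

Reading the raw counter here would show a confusing drop from 60 to 30, while the reset-aware rate reports a steady 2 increments per second.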

Related Resources

Zap Logging: Learn about the structured logging used alongside Prometheus metrics.

Architecture: Understand how monitoring fits into the overall Control Plane architecture.

Kubernetes: Discover how the Control Plane builds on Kubernetes observability patterns.
