Prometheus Metrics
Real-time monitoring and metrics collection for the Control Plane
The Control Plane uses Prometheus for monitoring and metrics collection.
Comprehensive observability
Prometheus enables real-time monitoring of Control Plane services, providing insight into performance, resource usage, and operational health.
Overview
Prometheus is an open-source systems monitoring and alerting toolkit. The Control Plane integrates Prometheus for:
📊 Performance Monitoring
Track response times, throughput, and error rates across all services.
💻 Resource Utilization
Monitor CPU, memory, and network usage for capacity planning.
🎯 SLO/SLI Measurements
Measure service level objectives and indicators for reliability.
🔍 Operational Insights
Gain visibility into system behavior for troubleshooting.
⚠️ Alerting
Trigger notifications when metrics exceed defined thresholds.
📈 Trend Analysis
Track changes in system behavior over time for planning.
Why Prometheus?
🎯 Precise Targeting
Pull-based model allows for fine-grained control over what metrics are collected and when.
🔍 Multi-dimensional Data
Each time series is identified by a metric name and a set of key/value labels, enabling detailed filtering and grouping.
📊 Powerful Query Language
PromQL provides flexible querying capabilities for real-time analysis and visualization.
⚡ Stand-alone Operation
No reliance on distributed storage; self-contained and easy to deploy and operate.
📈 Time Series Focus
Optimized for time series data, making it ideal for monitoring system metrics over time.
🧩 Ecosystem Integration
Rich ecosystem of exporters, the Alertmanager, and dashboarding tools such as Grafana.
Integration in the Control Plane
Standardized metrics collection
The Control Plane implements a consistent metrics strategy across all components, enabling unified monitoring and alerting.
The Control Plane integrates Prometheus throughout its architecture for comprehensive observability:
Service Instrumentation
All Control Plane services expose standardized metrics:
- HTTP request duration, status codes, and rates
- Database connection pools and query times
- Resource utilization and system metrics
- Business-level metrics for domain-specific operations
Operator Metrics
Kubernetes operators expose specialized metrics:
- Reconciliation duration and frequency
- Resource creation and deletion rates
- Error counts by resource type
- Controller queue depths and processing times
Metric Types
Choosing the right metric type
Select the metric type based on what you're measuring: counters for totals that only increase, gauges for values that rise and fall, histograms for distributions counted into buckets, and summaries for quantiles (percentiles) calculated on the client.
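To make the difference between the types concrete, here is a simplified, dependency-free model of their semantics; it is an illustration, not a client library. `counter` rejects negative increments, `gauge` moves freely, and `histogram` keeps cumulative bucket counts plus an implicit `+Inf` bucket, mirroring how histograms appear in the exposition format. Summaries are omitted here.

```go
package main

import (
	"fmt"
	"sort"
)

// counter only goes up; client libraries reject negative increments too.
type counter struct{ v float64 }

func (c *counter) Add(d float64) {
	if d < 0 {
		panic("counter cannot decrease")
	}
	c.v += d
}

// gauge can be set to any value or moved in either direction.
type gauge struct{ v float64 }

func (g *gauge) Set(x float64) { g.v = x }
func (g *gauge) Add(d float64) { g.v += d }

// histogram keeps cumulative bucket counts: each bucket counts observations
// less than or equal to its upper bound, and the final +Inf bucket counts all.
type histogram struct {
	bounds []float64 // sorted upper bounds
	counts []uint64  // len(bounds)+1 cumulative counts, the last one for +Inf
	sum    float64   // sum of all observed values
}

func newHistogram(bounds []float64) *histogram {
	b := append([]float64(nil), bounds...)
	sort.Float64s(b)
	return &histogram{bounds: b, counts: make([]uint64, len(b)+1)}
}

func (h *histogram) Observe(x float64) {
	h.sum += x
	for i, upper := range h.bounds {
		if x <= upper {
			h.counts[i]++
		}
	}
	h.counts[len(h.bounds)]++
}

func main() {
	var requests counter // e.g. total requests served
	requests.Add(1)

	var inFlight gauge // e.g. requests currently being handled
	inFlight.Add(1)
	inFlight.Add(-1)

	latency := newHistogram([]float64{0.1, 0.5, 1}) // seconds
	for _, s := range []float64{0.05, 0.3, 0.7, 2} {
		latency.Observe(s)
	}
	fmt.Println(requests.v, inFlight.v, latency.counts) // 1 0 [1 2 3 4]
}
```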
Metric Naming Convention
Control Plane Metric Naming
The Control Plane follows these naming conventions for metrics:
- Component: Component name (e.g., file_manager, api_operator)
- Area: Functional area (e.g., http, database)
- Measurement: Specific measurement (e.g., requests_total, latency_seconds)
Full format: component_area_measurement
Examples:
file_manager_http_requests_total
api_operator_reconcile_duration_seconds
gateway_router_cache_hits_total
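A small helper can enforce the convention when metrics are defined. This is a hypothetical sketch: `metricName` joins the three parts and checks the result against lowercase snake_case, an assumed house rule that is stricter than what Prometheus itself accepts (Prometheus also allows uppercase letters and colons, though colons are reserved for recording rules).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// snakeCase accepts lowercase snake_case names: a stricter, hypothetical
// house rule within Prometheus's [a-zA-Z_:][a-zA-Z0-9_:]* metric name syntax.
var snakeCase = regexp.MustCompile(`^[a-z][a-z0-9_]*$`)

// metricName builds a component_area_measurement name and validates it.
func metricName(component, area, measurement string) (string, error) {
	parts := []string{component, area, measurement}
	for _, p := range parts {
		if p == "" {
			return "", fmt.Errorf("metric name parts must be non-empty: %q", parts)
		}
	}
	name := strings.Join(parts, "_")
	if !snakeCase.MatchString(name) {
		return "", fmt.Errorf("invalid metric name %q", name)
	}
	return name, nil
}

func main() {
	name, err := metricName("api_operator", "reconcile", "duration_seconds")
	fmt.Println(name, err) // api_operator_reconcile_duration_seconds <nil>

	_, err = metricName("file-manager", "http", "requests_total")
	fmt.Println(err) // invalid metric name "file-manager_http_requests_total"
}
```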
Middleware Integration
Automated instrumentation
Middleware integration automatically collects metrics for all HTTP requests without modifying business logic code.
The Control Plane integrates Prometheus metrics with Gofiber middleware.
Service Monitors
Prometheus Operator
The Control Plane uses the Prometheus Operator pattern with ServiceMonitor resources to declaratively define monitoring targets.
The Control Plane uses Prometheus Operator's ServiceMonitor resources to configure scraping.
ServiceMonitor Configuration
Endpoint Configuration
Key settings for ServiceMonitor endpoints:
- port: Name of the Service port to scrape (usually "metrics")
- path: HTTP path for metrics (default "/metrics")
- interval: Scrape frequency (e.g., "15s")
- scheme: http or https
- tlsConfig: Optional TLS settings for secure endpoints
- bearerTokenSecret: Reference to a Secret key holding a bearer token, for authenticated endpoints
Target Selection
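How ServiceMonitors select targets:
- selector: Label selector that matches Services
- namespaceSelector: Controls which namespaces are searched; if neither field below is set, only the ServiceMonitor's own namespace is searched
  - matchNames: Explicit list of namespaces
  - any: Search all namespaces when true (takes precedence over matchNames)
- sampleLimit: Maximum number of samples accepted per scrape; a scrape that exceeds it is treated as failed (a limit rather than a selection rule)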
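The selection rules can be modeled in a few lines. This is a simplified sketch, not the operator's implementation, and it assumes only matchLabels is used (matchExpressions is omitted): `any` takes precedence over `matchNames`, an empty namespaceSelector limits discovery to the ServiceMonitor's own namespace, and a Service matches when it carries every selector label with the same value.

```go
package main

import "fmt"

// namespaceSelector mirrors the two ServiceMonitor fields of the same names.
type namespaceSelector struct {
	Any        bool
	MatchNames []string
}

// namespaceSelected reports whether Services in ns are in scope for a
// ServiceMonitor that lives in monitorNS.
func namespaceSelected(sel namespaceSelector, monitorNS, ns string) bool {
	if sel.Any { // any takes precedence over matchNames
		return true
	}
	if len(sel.MatchNames) == 0 { // neither set: own namespace only
		return ns == monitorNS
	}
	for _, name := range sel.MatchNames {
		if name == ns {
			return true
		}
	}
	return false
}

// labelsMatch applies matchLabels semantics: the Service must carry every
// selector label with an equal value; extra Service labels are ignored.
func labelsMatch(matchLabels, serviceLabels map[string]string) bool {
	for k, v := range matchLabels {
		if got, ok := serviceLabels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	sel := namespaceSelector{MatchNames: []string{"control-plane", "gateways"}}
	fmt.Println(namespaceSelected(sel, "monitoring", "gateways"))                 // true
	fmt.Println(namespaceSelected(namespaceSelector{}, "monitoring", "gateways")) // false
	fmt.Println(labelsMatch(
		map[string]string{"app": "file-manager"},
		map[string]string{"app": "file-manager", "tier": "backend"},
	)) // true
}
```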
Metrics Endpoint
Standard exposure
Each Control Plane service exposes a standard /metrics endpoint that follows the Prometheus exposition format for consistent scraping.
Grafana Dashboards
Visualization
The Control Plane includes pre-built Grafana dashboards that visualize key metrics for operational monitoring and troubleshooting.
Best Practices
Metric design principles
Following consistent metric design principles helps ensure that metrics are useful, maintainable, and don't cause performance issues.
📊 Consistent Naming
Follow the component_area_measurement naming pattern with clear, descriptive names that indicate what's being measured.
🏷️ Meaningful Labels
Use labels for dimensions that are useful for filtering and grouping, but avoid high-cardinality labels that can cause performance issues.
⚖️ Metric Selection
Instrument what matters; focus on key performance indicators rather than collecting everything possible.
📋 Documentation
Document all metrics with clear descriptions, units, and example queries to aid in dashboard creation and alerting.
⚠️ Alert Design
Design alerts that trigger on symptoms (user impact) rather than causes, with clear runbooks for remediation.
🔄 Rate Calculations
Apply rate() to counters and average over time windows long enough to avoid noise from normal fluctuations in metrics.
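Two of these practices reduce to simple arithmetic, sketched below with hypothetical helpers and example label counts. The worst-case series count for a metric is the product of the distinct values of each label, which is why a label such as a user ID multiplies cardinality so quickly. A rate sums counter increases between samples, treating any decrease as a counter reset; unlike PromQL's rate(), this simplified version does not extrapolate to the edges of the window.

```go
package main

import "fmt"

// seriesUpperBound is the worst-case number of time series one metric can
// produce: the product of the number of distinct values of each label.
func seriesUpperBound(distinctPerLabel map[string]int) int {
	n := 1
	for _, count := range distinctPerLabel {
		n *= count
	}
	return n
}

// point is one scraped counter sample: a timestamp in seconds and a value.
type point struct{ t, v float64 }

// simpleRate returns the per-second increase of a counter over samples,
// treating any decrease as a reset (e.g. a restarted process). Unlike
// PromQL's rate(), it does not extrapolate to the edges of the window.
func simpleRate(samples []point) float64 {
	if len(samples) < 2 {
		return 0
	}
	var increase float64
	for i := 1; i < len(samples); i++ {
		delta := samples[i].v - samples[i-1].v
		if delta < 0 {
			// The counter restarted from zero, so its new value is the increase.
			delta = samples[i].v
		}
		increase += delta
	}
	elapsed := samples[len(samples)-1].t - samples[0].t
	if elapsed <= 0 {
		return 0
	}
	return increase / elapsed
}

func main() {
	// 5 methods x 10 status codes x 40 routes.
	fmt.Println(seriesUpperBound(map[string]int{"method": 5, "status": 10, "route": 40})) // 2000

	// Scrapes every 15s; the counter resets between t=30 and t=45.
	scrapes := []point{{0, 0}, {15, 30}, {30, 60}, {45, 15}, {60, 45}}
	fmt.Println(simpleRate(scrapes)) // 1.75
}
```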
Related Resources
Zap Logging
Learn about the structured logging used alongside Prometheus metrics.
Architecture
Understand how monitoring fits into the overall Control Plane architecture.
Kubernetes
Discover how the Control Plane builds on Kubernetes observability patterns.