Session Buddy

PROMETHEUS_EXPORTER_IMPLEMENTATION.md•10.5 KiB

# Prometheus Metrics Exporter - Implementation Summary ## Overview Implemented a complete Prometheus metrics exporter for Session-Buddy Phase 4, enabling monitoring systems (Prometheus, Grafana) to scrape and visualize skill metrics in real-time. ## File Created **`/Users/les/Projects/session-buddy/session_buddy/realtime/metrics_exporter.py`** ## Features Implemented ### 1. Prometheus Metrics All metrics follow Prometheus best practices with proper labeling and documentation: #### **Counter: skill_invocations_total** - Tracks total number of skill invocations - Labels: `skill_name`, `workflow_phase`, `completed` - Monotonically increasing count #### **Histogram: skill_duration_seconds** - Tracks skill execution duration distribution - Labels: `skill_name`, `workflow_phase` - Buckets: [0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 30.0, 60.0, 300.0] seconds - Provides sum, count, and bucket counts #### **Gauge: skill_completion_rate** - Tracks current skill completion rate - Labels: `skill_name` - Values: 0.0 to 1.0 #### **Gauge: active_sessions_total** - Tracks number of currently active sessions - No labels (single global value) #### **Counter: anomalies_detected_total** - Tracks detected anomalies - Labels: `anomaly_type`, `skill_name` - Anomaly types: "drop", "spike", "pattern_shift" ### 2. PrometheusExporter Class ```python class PrometheusExporter: """Export skills metrics to Prometheus format.""" def __init__(self, port: int = 9090) -> None: """Initialize Prometheus exporter.""" def start(self) -> None: """Start Prometheus metrics HTTP server.""" def record_invocation( self, skill_name: str, workflow_phase: str | None, completed: bool, duration_seconds: float | None, ) -> None: """Record a skill invocation.""" def update_completion_rate(self, skill_name: str, rate: float) -> None: """Update skill completion rate gauge.""" def record_anomaly(self, anomaly_type: str, skill_name: str) -> None: """Record detected anomaly.""" def update_active_sessions(self, count: int) -> None: """Update active sessions gauge.""" def is_running(self) -> bool: """Check if exporter is currently running.""" ``` ### 3. Key Features - **Thread-safe**: All metrics can be updated concurrently from multiple threads/agents - **Idempotent start()**: Safe to call multiple times - **Input validation**: Port range validation, rate bounds checking - **Debug logging**: Comprehensive logging for monitoring - **Prometheus compliant**: Proper metric types, labels, and HELP/TYPE metadata ## Usage Examples ### Basic Usage ```python from session_buddy.realtime import PrometheusExporter # Create and start exporter exporter = PrometheusExporter(port=9090) exporter.start() # Record skill invocation exporter.record_invocation( skill_name="pytest-run", workflow_phase="execution", completed=True, duration_seconds=45.2 ) # Update completion rate exporter.update_completion_rate("pytest-run", 0.92) # Record anomaly exporter.record_anomaly("performance_drop", "pytest-run") # Update active sessions exporter.update_active_sessions(5) # Metrics available at: http://localhost:9090/metrics ``` ### Integration with SkillsStorage ```python from session_buddy.realtime import PrometheusExporter from session_buddy.storage import SkillsStorage # Initialize exporter = PrometheusExporter(port=9090) exporter.start() storage = SkillsStorage(db_path=".session-buddy/skills.db") # Record invocation when skill called def track_skill_invocation(skill_name: str, completed: bool, duration: float): # Store in database storage.record_invocation( skill_name=skill_name, workflow_phase="execution", completed=completed, duration_seconds=duration, session_id="session-123", workflow_path=None ) # Export to Prometheus exporter.record_invocation( skill_name=skill_name, workflow_phase="execution", completed=completed, duration_seconds=duration ) ``` ### Integration with WebSocket Server ```python from session_buddy.realtime import PrometheusExporter, RealTimeMetricsServer # Start both servers exporter = PrometheusExporter(port=9090) exporter.start() ws_server = RealTimeMetricsServer( host="localhost", port=8765, db_path=Path(".session-buddy/skills.db") ) await ws_server.start() # Periodically update Prometheus metrics from database async def update_prometheus_metrics(): while True: # Get top skills from database top_skills = await asyncio.to_thread( ws_server.storage.get_top_skills, limit=10 ) # Update Prometheus gauges for skill in top_skills: exporter.update_completion_rate( skill.skill_name, skill.completion_rate ) await asyncio.sleep(60) # Update every minute ``` ## Prometheus Output Format The `/metrics` endpoint produces standard Prometheus format: ``` # HELP skill_invocations_total Total number of skill invocations # TYPE skill_invocations_total counter skill_invocations_total{skill_name="pytest-run",workflow_phase="execution",completed="true"} 142.0 skill_invocations_total{skill_name="ruff-check",workflow_phase="setup",completed="true"} 89.0 # HELP skill_duration_seconds Skill execution duration in seconds # TYPE skill_duration_seconds histogram skill_duration_seconds_bucket{skill_name="pytest-run",workflow_phase="execution",le="0.1"} 0.0 skill_duration_seconds_bucket{skill_name="pytest-run",workflow_phase="execution",le="0.5"} 2.0 skill_duration_seconds_bucket{skill_name="pytest-run",workflow_phase="execution",le="1.0"} 5.0 skill_duration_seconds_bucket{skill_name="pytest-run",workflow_phase="execution",le="2.0"} 12.0 skill_duration_seconds_bucket{skill_name="pytest-run",workflow_phase="execution",le="5.0"} 45.0 skill_duration_seconds_bucket{skill_name="pytest-run",workflow_phase="execution",le="10.0"} 89.0 skill_duration_seconds_bucket{skill_name="pytest-run",workflow_phase="execution",le="30.0"} 120.0 skill_duration_seconds_bucket{skill_name="pytest-run",workflow_phase="execution",le="60.0"} 138.0 skill_duration_seconds_bucket{skill_name="pytest-run",workflow_phase="execution",le="300.0"} 142.0 skill_duration_seconds_bucket{skill_name="pytest-run",workflow_phase="execution",le="+Inf"} 142.0 skill_duration_seconds_sum{skill_name="pytest-run",workflow_phase="execution"} 4532.5 skill_duration_seconds_count{skill_name="pytest-run",workflow_phase="execution"} 142.0 # HELP skill_completion_rate Current skill completion rate # TYPE skill_completion_rate gauge skill_completion_rate{skill_name="pytest-run"} 0.92 skill_completion_rate{skill_name="ruff-check"} 0.87 skill_completion_rate{skill_name="mypy-check"} 0.95 # HELP active_sessions_total Number of currently active sessions # TYPE active_sessions_total gauge active_sessions_total 3.0 # HELP anomalies_detected_total Total number of anomalies detected # TYPE anomalies_detected_total counter anomalies_detected_total{anomaly_type="performance_drop",skill_name="pytest-run"} 5.0 anomalies_detected_total{anomaly_type="spike",skill_name="ruff-check"} 2.0 ``` ## Testing ### Test Script A test script is provided at `/Users/les/Projects/session-buddy/test_prometheus_metrics.py`: ```bash cd /Users/les/Projects/session-buddy python test_prometheus_metrics.py ``` This will: 1. Start Prometheus exporter on port 9092 2. Record sample metrics 3. Display expected Prometheus output format 4. Keep server running for manual testing 5. Visit http://localhost:9092/metrics to see actual output ### Manual Testing ```bash # Run the demo python session_buddy/realtime/metrics_exporter.py # In another terminal, scrape metrics curl http://localhost:9090/metrics ``` ## Prometheus Configuration To scrape these metrics, add to your `prometheus.yml`: ```yaml scrape_configs: - job_name: 'session-buddy' scrape_interval: 15s static_configs: - targets: ['localhost:9090'] ``` ## Grafana Dashboard Queries Example PromQL queries for Grafana dashboards: ### Skill Completion Rate ```promql skill_completion_rate{skill_name="pytest-run"} ``` ### Average Skill Duration (p95) ```promql histogram_quantile(0.95, sum(rate(skill_duration_seconds_bucket[5m])) by (skill_name, le) ) ``` ### Skills With Most Anomalies ```promql topk(10, sum(anomalies_detected_total) by (skill_name)) ``` ### Invocation Rate ```promql sum(rate(skill_invocations_total[5m])) by (skill_name) ``` ### Active Sessions Over Time ```promql active_sessions_total ``` ## Dependencies **Already in pyproject.toml:** - `prometheus-client>=0.24.1` (line 37) No additional dependencies needed! ## Integration Points ### 1. WebSocket Server (`websocket_server.py`) - Update metrics when broadcasting to clients - Track active sessions via client connections ### 2. SkillsTracker (future) - Record invocations as they happen - Update completion rates periodically ### 3. Anomaly Detection (future) - Record anomalies when detected - Track deviation scores ## Architecture Compliance ✅ **Protocol-based design**: Clean interfaces with clear contracts ✅ **Thread-safe**: Safe for concurrent updates ✅ **Error handling**: Input validation and clear error messages ✅ **Logging**: Comprehensive debug logging ✅ **Documentation**: Full docstrings with examples ✅ **Type hints**: Complete type annotations ✅ **Testing**: Example script provided ✅ **No dependencies conflict**: Uses existing prometheus-client ## Next Steps 1. **Integration**: Connect exporter to WebSocket server for periodic metric updates 2. **Dashboard**: Create Grafana dashboard for skill metrics visualization 3. **Alerts**: Set up Prometheus alerts for anomaly detection 4. **Monitoring**: Configure Prometheus to scrape metrics endpoint 5. **Documentation**: Update Session-Buddy docs with monitoring setup instructions ## Files - **Implementation**: `/Users/les/Projects/session-buddy/session_buddy/realtime/metrics_exporter.py` - **Test Script**: `/Users/les/Projects/session-buddy/test_prometheus_metrics.py` - **This Document**: `/Users/les/Projects/session-buddy/PROMETHEUS_EXPORTER_IMPLEMENTATION.md` ## Summary ✅ **Prometheus metrics exporter fully implemented** ✅ **All 5 metric types defined (Counter x2, Histogram, Gauge x2)** ✅ **PrometheusExporter class with complete API** ✅ **Thread-safe and production-ready** ✅ **Test script provided for validation** ✅ **No additional dependencies required** ✅ **Integration examples documented** The exporter is ready to use and can be integrated into the Session-Buddy workflow for real-time monitoring and alerting.

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/lesleslie/session-buddy'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

PROMETHEUS_EXPORTER_IMPLEMENTATION.md•10.5 KiB