Enables visualization of MCP ecosystem health and performance through dashboard integration, consuming metrics from the Prometheus data source.
Provides distributed tracing and metrics collection for MCP server ecosystems, enabling monitoring of requests across multiple services with context propagation and structured performance data collection.
Exports comprehensive metrics including health checks, performance data, system resources, traces, and alerts in Prometheus format for collection, alerting, and time-series analysis.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@Observability MCP Server show me the health status of all my MCP servers".
That's it! The server will respond to your query, and you can continue using it as needed.
Observability MCP Server
FastMCP 2.14.1-powered observability server for monitoring MCP ecosystems
An observability server built on FastMCP 2.14.1. It combines OpenTelemetry tracing and metrics, persistent storage, and alerting to monitor MCP server ecosystems in production.
Features
FastMCP 2.14.1 Integration
OpenTelemetry Integration - Distributed tracing and metrics collection
Enhanced Storage Backend - Persistent metrics and historical data
Production-Ready - Built for high-performance monitoring
Comprehensive Monitoring
Real-time Health Checks - Monitor MCP server availability and response times
Performance Metrics - CPU, memory, disk, and network monitoring
Distributed Tracing - Track interactions across MCP server ecosystems
Intelligent Alerting - Anomaly detection and automated alerts
Performance Reports - Automated analysis and optimization recommendations
Advanced Analytics
Usage Pattern Analysis - Understand how MCP servers are being used
Trend Detection - Identify performance trends and bottlenecks
Optimization Insights - Data-driven recommendations for improvement
Multi-Format Export - Prometheus, OpenTelemetry, and JSON export
Installation
Prerequisites
Python 3.11+
FastMCP 2.14.1+ (automatically installed)
Install from Source
git clone https://github.com/sandraschi/observability-mcp
cd observability-mcp
pip install -e .
Docker Installation
docker build -t observability-mcp .
docker run -p 9090:9090 observability-mcp
Quick Start
1. Start the Server
# Using the CLI
observability-mcp run
# Or directly with Python
python -m observability_mcp.server
2. Verify Installation
# Check server health
observability-mcp health
# View available metrics
observability-mcp metrics
3. Configure Claude Desktop
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "observability": {
      "command": "observability-mcp",
      "args": ["run"]
    }
  }
}
Available Tools
Health Monitoring
monitor_server_health - Real-time health checks with OpenTelemetry metrics
monitor_system_resources - Comprehensive system resource monitoring
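As an illustration of what a health check result might look like, here is a minimal sketch that maps an HTTP status code and response time to the three states used in the exported metrics (healthy, degraded, unhealthy). The function name `classify_health` and the 1000 ms threshold are assumptions for this example; the README does not document how monitor_server_health decides between states.

```python
from typing import Optional


def classify_health(
    status_code: Optional[int],
    latency_ms: float,
    degraded_after_ms: float = 1000.0,  # assumed threshold, not from the project
) -> str:
    """Map one probe result to healthy, degraded, or unhealthy."""
    # No response at all (timeout, refused connection) or a server error.
    if status_code is None or status_code >= 500:
        return "unhealthy"
    # The service answered, but with a client error or too slowly.
    if status_code >= 400 or latency_ms > degraded_after_ms:
        return "degraded"
    return "healthy"


print(classify_health(200, 120.0))    # healthy
print(classify_health(200, 2500.0))   # degraded
print(classify_health(None, 5000.0))  # unhealthy
```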
Performance Analysis
collect_performance_metrics - CPU, memory, disk, and network metrics
generate_performance_reports - Automated performance analysis and recommendations
analyze_mcp_interactions - Usage pattern analysis and optimization insights
Alerting & Anomaly Detection
alert_on_anomalies - Intelligent anomaly detection and alerting
trace_mcp_calls - Distributed tracing for MCP server interactions
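The README does not say which method alert_on_anomalies uses. One common baseline is a z-score test, sketched below under that assumption: a value is flagged when it lies more than a chosen number of standard deviations from the mean of the series. The name `find_anomalies` and the default threshold of 3.0 are illustrative only.

```python
from statistics import mean, stdev


def find_anomalies(values: list[float], threshold: float = 3.0) -> list[int]:
    """Return the indexes of values whose z-score exceeds `threshold`."""
    if len(values) < 2:
        return []  # a standard deviation needs at least two samples
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Nineteen response times of 100 ms followed by one 500 ms spike.
latencies_ms = [100.0] * 19 + [500.0]
print(find_anomalies(latencies_ms))  # [19]
```

A z-score test is sensitive to sample size: with very short series a single outlier cannot reach a threshold of 3.0, so a lower threshold or a longer window may be needed.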
Data Export
export_metrics- Export metrics in Prometheus, OpenTelemetry, or JSON formats
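To show how the same sample could look in two of the listed formats, here is a small sketch that renders one metric as a Prometheus text-format line and as JSON. The helper names `to_prometheus` and `to_json` and the JSON field names are assumptions for this example; the actual output schema of export_metrics is not documented here.

```python
import json


def _escape(value: str) -> str:
    """Escape backslash, double quote, and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_prometheus(name: str, labels: dict[str, str], value: float) -> str:
    """Render one sample as a Prometheus text-format line."""
    label_text = ",".join(f'{k}="{_escape(v)}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_text}}} {value}"


def to_json(name: str, labels: dict[str, str], value: float) -> str:
    """Render the same sample as a JSON object (field names are hypothetical)."""
    return json.dumps({"name": name, "labels": labels, "value": value}, sort_keys=True)


print(to_prometheus("mcp_cpu_usage_percent", {}, 45.2))
# mcp_cpu_usage_percent{} 45.2
print(to_json("mcp_cpu_usage_percent", {}, 45.2))
# {"labels": {}, "name": "mcp_cpu_usage_percent", "value": 45.2}
```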
Configuration
Environment Variables
# Prometheus metrics server port
PROMETHEUS_PORT=9090
# OpenTelemetry service name
OTEL_SERVICE_NAME=observability-mcp
# OTLP exporter endpoint (optional)
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
# Metrics retention period (days)
METRICS_RETENTION_DAYS=30
Alert Configuration
The server comes with pre-configured alerts for common issues:
CPU Usage > 90% (Warning)
Memory Usage > 1GB (Error)
Error Rate > 5% (Error)
Alerts are stored persistently and can be customized through the MCP tools.
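The three default rules can be read as simple threshold checks. The sketch below expresses them that way; the `AlertRule` class, the metric key names, and the reading of 1GB as 1024 MB are assumptions for illustration, not the server's actual alert schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    metric: str       # key expected in the sample dict (hypothetical names)
    threshold: float  # alert fires when the value is strictly greater
    severity: str


# The three pre-configured alerts from the list above.
DEFAULT_RULES = (
    AlertRule("cpu_percent", 90.0, "warning"),
    AlertRule("memory_mb", 1024.0, "error"),  # assumes 1GB means 1024 MB
    AlertRule("error_rate_percent", 5.0, "error"),
)


def evaluate(sample: dict[str, float], rules=DEFAULT_RULES) -> list[str]:
    """Return one message per rule whose threshold the sample exceeds."""
    fired = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is not None and value > rule.threshold:
            fired.append(f"{rule.severity}: {rule.metric}={value} exceeds {rule.threshold}")
    return fired


print(evaluate({"cpu_percent": 95.0, "memory_mb": 512.0, "error_rate_percent": 1.0}))
# ['warning: cpu_percent=95.0 exceeds 90.0']
```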
Monitoring Dashboard
Prometheus Metrics
Access metrics at: http://localhost:9090/metrics
Available metrics:
# Health checks
mcp_health_checks_total{status="healthy|degraded|unhealthy", service="..."} 1
# Performance metrics
mcp_performance_metrics_collected{service="..."} 1
# System resources
mcp_cpu_usage_percent{} 45.2
mcp_memory_usage_mb{} 1024.5
# Traces and alerts
mcp_traces_created{service="...", operation="..."} 1
mcp_alerts_triggered{type="active|anomaly"} 1
Integration with Grafana
Add Prometheus as a data source in Grafana
Import the provided dashboard JSON
Visualize your MCP ecosystem's health and performance
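Before building dashboards, it can help to confirm that the metrics endpoint returns the samples listed under Prometheus Metrics. The sketch below parses text in the Prometheus exposition format into (name, labels, value) tuples. `parse_samples` is a hypothetical helper and a deliberately simplified parser: it ignores timestamps and does not handle a `}` character inside a label value.

```python
import re

# metric name, optional {label block}, then the sample value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse exposition-format text, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this simple parser understands
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


body = '''# System resources
mcp_cpu_usage_percent{} 45.2
mcp_traces_created{service="ocr-mcp", operation="process_document"} 1
'''
for name, labels, value in parse_samples(body):
    print(name, labels, value)
```

In practice the text would come from an HTTP GET against the metrics URL given above.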
Architecture
FastMCP 2.14.1 Features Leveraged
OpenTelemetry Integration
Distributed Tracing: Track requests across multiple MCP servers
Metrics Collection: Structured performance data collection
Context Propagation: Maintain context across service boundaries
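Context propagation usually means passing a trace identifier along with each outgoing request so that downstream spans join the same trace. The sketch below builds and continues a W3C Trace Context `traceparent` header value using only the standard library. It is an illustration of the idea, not this server's code: a real deployment would normally rely on the OpenTelemetry SDK's propagators, and the function names here are hypothetical.

```python
import re
import secrets

# version "00", 32-hex trace id, 16-hex parent span id, 2-hex flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent() -> str:
    """Start a new trace with a random trace id and span id (sampled flag set)."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def child_traceparent(parent: str) -> str:
    """Continue the parent's trace: keep the trace id, mint a new span id."""
    match = TRACEPARENT_RE.match(parent)
    if match is None:
        return new_traceparent()  # malformed header, so start a fresh trace
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
outgoing = child_traceparent(incoming)
# Same trace id on both sides of the service boundary.
print(incoming.split("-")[1] == outgoing.split("-")[1])  # True
```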
Enhanced Persistent Storage
Historical Data: Store metrics and traces for trend analysis
Cross-Session Persistence: Data survives server restarts
Efficient Storage: Optimized for time-series data
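The README does not describe the storage backend, so the following is only a sketch of the general pattern: timestamped samples written to a SQLite file survive restarts, and a retention cutoff (such as METRICS_RETENTION_DAYS) bounds how much history is kept. The table layout and function names are assumptions for this example.

```python
import sqlite3
import time
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the sample store; pass a file path for persistence."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples ("
        "ts REAL NOT NULL, service TEXT NOT NULL, name TEXT NOT NULL, value REAL NOT NULL)"
    )
    # An index on the timestamp keeps range queries and pruning cheap.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_samples_ts ON samples (ts)")
    return conn


def record(conn: sqlite3.Connection, service: str, name: str, value: float,
           ts: Optional[float] = None) -> None:
    """Insert one sample, stamped with the current time unless `ts` is given."""
    stamp = time.time() if ts is None else ts
    conn.execute("INSERT INTO samples VALUES (?, ?, ?, ?)", (stamp, service, name, value))
    conn.commit()


def prune(conn: sqlite3.Connection, retention_days: int,
          now: Optional[float] = None) -> int:
    """Delete samples older than the retention window; return rows removed."""
    cutoff = (time.time() if now is None else now) - retention_days * 86400
    cursor = conn.execute("DELETE FROM samples WHERE ts < ?", (cutoff,))
    conn.commit()
    return cursor.rowcount


conn = open_store()
record(conn, "web-mcp", "cpu_percent", 45.2)
print(prune(conn, retention_days=30))  # 0, the sample is still within retention
```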
Production Architecture
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   MCP Servers   │───▶│  Observability   │───▶│   Prometheus    │
│   (Monitored)   │    │   MCP Server     │    │    Metrics      │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │                       │
                                ▼                       ▼
                       ┌──────────────────┐    ┌─────────────────┐
                       │   Persistent     │    │    Grafana      │
                       │    Storage       │    │   Dashboard     │
                       └──────────────────┘    └─────────────────┘
Usage Examples
Health Monitoring
# Check MCP server health
result = await monitor_server_health(
    service_url="http://localhost:8000/health",
    timeout_seconds=5.0
)
print(f"Status: {result['health_check']['status']}")
Performance Analysis
# Collect system metrics
metrics = await collect_performance_metrics(service_name="my-mcp-server")
print(f"CPU: {metrics['metrics']['cpu_percent']}%")
print(f"Memory: {metrics['metrics']['memory_mb']} MB")
Distributed Tracing
# Record a trace
trace = await trace_mcp_calls(
    operation_name="process_document",
    service_name="ocr-mcp",
    duration_ms=150.5,
    attributes={"file_size": "2.3MB", "format": "PDF"}
)
Generate Reports
# Create performance report
report = await generate_performance_reports(
    service_name="web-mcp",
    days=7
)
print("Performance Summary:", report['summary'])
print("Recommendations:", report['recommendations'])
Development
Running Tests
# Install development dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run with coverage
pytest --cov=observability_mcp --cov-report=html
Code Quality
# Format code
black src/
# Lint code
ruff check src/
# Type checking
mypy src/
Docker Development
# Build development image
docker build -t observability-mcp:dev -f Dockerfile.dev .
# Run with hot reload
docker run -p 9090:9090 -v "$(pwd)":/app observability-mcp:dev
Performance Benchmarks
FastMCP 2.14.1 Benefits
OpenTelemetry Overhead: <1ms per trace
Storage Performance: 1000+ metrics/second
Memory Usage: 50MB baseline + 10MB per monitored service
Concurrent Monitoring: 100+ services simultaneously
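The memory figure above is a linear estimate, which makes capacity planning a one-line calculation. A short sketch using the stated 50MB baseline and 10MB per monitored service (the function name is hypothetical, and actual usage will depend on workload):

```python
def estimated_memory_mb(
    services: int,
    baseline_mb: float = 50.0,     # baseline figure stated above
    per_service_mb: float = 10.0,  # per-service figure stated above
) -> float:
    """Estimate memory use from the README's baseline and per-service figures."""
    if services < 0:
        raise ValueError("services must be non-negative")
    return baseline_mb + per_service_mb * services


print(estimated_memory_mb(10))   # 150.0
print(estimated_memory_mb(100))  # 1050.0
```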
Recommended Hardware
CPU: 2+ cores for metrics processing
RAM: 2GB minimum, 4GB recommended
Storage: 10GB for metrics history (30-day retention)
Troubleshooting
Common Issues
Server Won't Start
# Check Python version
python --version # Should be 3.11+
# Check FastMCP installation
pip show fastmcp # Should be 2.14.1+
# Check dependencies
pip check
Metrics Not Appearing
# Check Prometheus endpoint
curl http://localhost:9090/metrics
# Verify OpenTelemetry configuration
observability-mcp metrics
High Memory Usage
Reduce METRICS_RETENTION_DAYS
Implement metric aggregation
Monitor with monitor_system_resources
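Metric aggregation here can be as simple as downsampling: replacing many raw samples with one average per time window, so that fewer points are held in memory and on disk. The sketch below shows that approach; `downsample` and the 60-second default window are illustrative choices, not part of the server's API.

```python
from collections import defaultdict
from statistics import mean


def downsample(
    samples: list[tuple[float, float]],
    window_seconds: float = 60.0,
) -> list[tuple[float, float]]:
    """Average (timestamp, value) samples into fixed windows.

    Each output tuple is (window start time, mean value in that window).
    """
    buckets: dict[float, list[float]] = defaultdict(list)
    for ts, value in samples:
        window_start = (ts // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


raw = [(0.0, 10.0), (30.0, 20.0), (60.0, 40.0)]
print(downsample(raw))  # [(0.0, 15.0), (60.0, 40.0)]
```

Averaging hides short spikes, so keeping a per-window maximum alongside the mean may be worthwhile for latency or memory metrics.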
Storage Issues
Check available disk space
Clean old metrics (this permanently deletes stored history):
rm -rf ~/.observability-mcp/metrics/*
Restart server to recreate storage
Contributing
Development Setup
Fork the repository
Create a feature branch
Make your changes
Add tests for new functionality
Submit a pull request
Code Standards
FastMCP 2.14.1+: Use latest features and patterns
OpenTelemetry: Follow OTEL best practices
Async First: All operations should be async
Type Hints: Full type coverage required
Documentation: Comprehensive docstrings
Testing Strategy
Unit Tests: Core functionality
Integration Tests: MCP server interactions
Performance Tests: Benchmarking and load testing
Chaos Tests: Failure scenario testing
License
MIT License - see LICENSE file for details.
Acknowledgments
FastMCP Team - For the amazing 2.14.1 framework with OpenTelemetry integration
OpenTelemetry Community - For the observability standards and tools
Prometheus Team - For the metrics collection and alerting system
Related Projects
FastMCP - The framework this server is built on
OpenTelemetry Python - Observability instrumentation
Prometheus - Metrics collection and alerting
Grafana - Visualization and dashboards
Built with ❤️ using FastMCP 2.14.1 and OpenTelemetry