Observability MCP Server
Enables visualization of MCP ecosystem health and performance through dashboard integration, consuming metrics from the Prometheus data source.
Provides distributed tracing and metrics collection for MCP server ecosystems, enabling monitoring of requests across multiple services with context propagation and structured performance data collection.
Exports comprehensive metrics including health checks, performance data, system resources, traces, and alerts in Prometheus format for collection, alerting, and time-series analysis.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type
@followed by the MCP server name and your instructions, e.g., "@Observability MCP Servershow me the health status of all my MCP servers"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
Observability MCP Server
FastMCP 3.1.0-powered observability server for monitoring MCP ecosystems
A comprehensive observability server built on FastMCP 3.1.0 that leverages OpenTelemetry integration, persistent storage, and advanced monitoring capabilities to provide production-grade observability for MCP server ecosystems. Features state-of-the-art Grafana dashboards for visualization, Loki for centralized log aggregation, and Prometheus for metrics collection.
Features
FastMCP 3.1.0 Integration
OpenTelemetry Integration - Distributed tracing and metrics collection
Enhanced Storage Backend - Persistent metrics and historical data
Production-Ready - Built for high-performance monitoring
Comprehensive Monitoring
Real-time Health Checks - Monitor MCP server availability and response times
Performance Metrics - CPU, memory, disk, and network monitoring with Prometheus
Distributed Tracing - Track interactions across MCP server ecosystems
Centralized Logging - Loki-powered log aggregation and querying
Intelligent Alerting - Anomaly detection and automated alerts
Performance Reports - Automated analysis and optimization recommendations
Advanced Analytics
Usage Pattern Analysis - Understand how MCP servers are being used
Trend Detection - Identify performance trends and bottlenecks
Log Correlation - Correlate metrics with Loki logs for root cause analysis
Optimization Insights - Data-driven recommendations for improvement
Multi-Format Export - Prometheus, Loki, OpenTelemetry, and JSON export
Installation
Prerequisites
uv installed (RECOMMENDED)
Python 3.12+
Quick Start
Run immediately via uvx:
uvx observability-mcpClaude Desktop Integration
Add to your claude_desktop_config.json:
"mcpServers": {
"observability-mcp": {
"command": "uv",
"args": ["--directory", "D:/Dev/repos/observability-mcp", "run", "observability-mcp"]
}
}Prerequisites
Python 3.11+
FastMCP 3.1.0+ (automatically installed)
Install from Source
git clone https://github.com/sandraschi/observability-mcp
cd observability-mcp
uv pip install -e .Installation
Prerequisites
uv installed (RECOMMENDED)
Python 3.12+
Quick Start
Run immediately via uvx:
uvx observability-mcpClaude Desktop Integration
Add to your claude_desktop_config.json:
"mcpServers": {
"observability-mcp": {
"command": "uv",
"args": ["--directory", "D:/Dev/repos/observability-mcp", "run", "observability-mcp"]
}
}Quick Start
1. Start the Server
# Using the CLI
observability-mcp run
# Or directly with Python
python -m observability_mcp.serverInstallation
Prerequisites
uv installed (RECOMMENDED)
Python 3.12+
Quick Start
Run immediately via uvx:
uvx observability-mcpClaude Desktop Integration
Add to your claude_desktop_config.json:
"mcpServers": {
"observability-mcp": {
"command": "uv",
"args": ["--directory", "D:/Dev/repos/observability-mcp", "run", "observability-mcp"]
}
}3. Configure Claude Desktop
Add to your claude_desktop_config.json:
{
"mcpServers": {
"observability": {
"command": "observability-mcp",
"args": ["run"]
}
}
}Available Tools
Health Monitoring
monitor_server_health- Real-time health checks with OpenTelemetry metricsmonitor_system_resources- Comprehensive system resource monitoring
Performance Analysis
collect_performance_metrics- CPU, memory, disk, and network metricsgenerate_performance_reports- Automated performance analysis and recommendationsanalyze_mcp_interactions- Usage pattern analysis and optimization insights
Log Management & Loki Integration
send_logs_to_loki- Send custom log entries to Loki for centralized aggregationquery_loki_logs- Query logs from Loki with advanced LogQL filteringanalyze_log_patterns- Analyze log patterns, anomalies, and trendscorrelate_logs_and_metrics- Correlate Loki logs with Prometheus metrics
Alerting & Anomaly Detection
alert_on_anomalies- Intelligent anomaly detection and alertingtrace_mcp_calls- Distributed tracing for MCP server interactions
Data Export
export_metrics- Export metrics in Prometheus, OpenTelemetry, or JSON formats
Configuration
Environment Variables
# Prometheus metrics server port
PROMETHEUS_PORT=9090
# Loki configuration
LOKI_URL=http://localhost:3100
LOG_FILE=/tmp/observability-mcp.log
# OpenTelemetry service name
OTEL_SERVICE_NAME=observability-mcp
# OTLP exporter endpoint (optional)
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
# Metrics retention period (days)
METRICS_RETENTION_DAYS=30Alert Configuration
The server comes with pre-configured alerts for common issues:
CPU Usage > 90% (Warning)
Memory Usage > 1GB (Error)
Error Rate > 5% (Error)
Alerts are stored persistently and can be customized through the MCP tools.
Monitoring Dashboard
Prometheus Metrics
Access metrics at: http://localhost:9090/metrics
Available metrics:
# Health checks
mcp_health_checks_total{status="healthy|degraded|unhealthy", service="..."} 1
# Performance metrics
mcp_performance_metrics_collected{service="..."} 1
# System resources
mcp_cpu_usage_percent{} 45.2
mcp_memory_usage_mb{} 1024.5
# Traces and alerts
mcp_traces_created{service="...", operation="..."} 1
mcp_alerts_triggered{type="active|anomaly"} 1Integration with Grafana & Loki
Grafana Dashboards are State-of-the-Art for Observability
Add Data Sources in Grafana:
Add Prometheus as a data source (http://localhost:9090)
Add Loki as a data source (http://localhost:3100)
Import Dashboards:
Import the provided
mcp-observability.jsondashboardCustomize panels for your specific MCP ecosystem
Log Integration:
Query logs with Loki:
{job="observability-mcp"} |= "ERROR"Correlate metrics with logs for comprehensive troubleshooting
Why Grafana + Loki = SOTA Observability:
Unified View: Single pane of glass for metrics, logs, and traces
Powerful Queries: PromQL + LogQL for complex analysis
Rich Visualizations: State-of-the-art dashboards with real-time updates
Alert Integration: Native alerting with multiple notification channels
Architecture
FastMCP 3.1.0 Features Leveraged
OpenTelemetry Integration
Distributed Tracing: Track requests across multiple MCP servers
Metrics Collection: Structured performance data collection
Context Propagation: Maintain context across service boundaries
Enhanced Persistent Storage
Historical Data: Store metrics and traces for trend analysis
Cross-Session Persistence: Data survives server restarts
Efficient Storage: Optimized for time-series data
Production Architecture
MCP Servers Observability Prometheus
(Monitored) MCP Server Metrics
Persistent Grafana
Storage Dashboards
(State-of-Art)
Application Loki
Logs Log Aggregation
Usage Examples
Health Monitoring
# Check MCP server health
result = await monitor_server_health(
service_url="http://localhost:8000/health",
timeout_seconds=5.0
)
print(f"Status: {result['health_check']['status']}")Performance Analysis
# Collect system metrics
metrics = await collect_performance_metrics(service_name="my-mcp-server")
print(f"CPU: {metrics['metrics']['cpu_percent']}%")
print(f"Memory: {metrics['metrics']['memory_mb']} MB")Distributed Tracing
# Record a trace
trace = await trace_mcp_calls(
operation_name="process_document",
service_name="ocr-mcp",
duration_ms=150.5,
attributes={"file_size": "2.3MB", "format": "PDF"}
)Generate Reports
# Create performance report
report = await generate_performance_reports(
service_name="web-mcp",
days=7
)
print("Performance Summary:", report['summary'])
print("Recommendations:", report['recommendations'])Loki Log Management
# Send custom logs to Loki
result = await send_logs_to_loki(
log_message="User authentication failed",
level="warning",
labels={"service": "auth-service", "user_id": "12345"}
)
# Query logs from Loki
logs = await query_loki_logs(
query='{job="observability-mcp"} |= "ERROR"',
start_time="1h",
limit=100
)
# Analyze log patterns
patterns = await analyze_log_patterns(
query='{service="web-mcp"}',
time_window="24h",
min_occurrences=10
)
# Correlate logs with metrics
correlation = await correlate_logs_and_metrics(
log_query='{service="api"} |= "timeout"',
metric_query="rate(http_requests_total{status='500'}[5m])",
time_window="1h"
)Development
Running Tests
# Install development dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run with coverage
pytest --cov=observability_mcp --cov-report=htmlCode Quality
# Format code
black src/
# Lint code
ruff check src/
# Type checking
mypy src/Docker Development
# Build development image
docker build -t observability-mcp:dev -f Dockerfile.dev .
# Run with hot reload
docker run -p 9090:9090 -v $(pwd):/app observability-mcp:devPerformance Benchmarks
FastMCP 3.1.0 Benefits
OpenTelemetry Overhead: <1ms per trace
Storage Performance: 1000+ metrics/second
Memory Usage: 50MB baseline + 10MB per monitored service
Concurrent Monitoring: 100+ services simultaneously
Recommended Hardware
CPU: 2+ cores for metrics processing
RAM: 2GB minimum, 4GB recommended
Storage: 10GB for metrics history (30 days retention)
Troubleshooting
Common Issues
Server Won't Start
# Check Python version
python --version # Should be 3.11+
# Check FastMCP installation
pip show fastmcp # Should be 2.14.1+
# Check dependencies
pip checkMetrics Not Appearing
# Check Prometheus endpoint
curl http://localhost:9090/metrics
# Verify OpenTelemetry configuration
observability-mcp metricsHigh Memory Usage
Reduce
METRICS_RETENTION_DAYSImplement metric aggregation
Monitor with
monitor_system_resources
Storage Issues
Check available disk space
Clean old metrics:
rm -rf ~/.observability-mcp/metrics/*Restart server to recreate storage
Contributing
Development Setup
Fork the repository
Create a feature branch
Make your changes
Add tests for new functionality
Submit a pull request
Code Standards
FastMCP 3.1.0+: Use latest features and patterns
OpenTelemetry: Follow OTEL practices
Async First: All operations should be async
Type Hints: Full type coverage required
Documentation: Comprehensive docstrings
Testing Strategy
Unit Tests: Core functionality
Integration Tests: MCP server interactions
Performance Tests: Benchmarking and load testing
Chaos Tests: Failure scenario testing
🛡️ Industrial Quality Stack
This project adheres to SOTA 14.1 industrial standards for high-fidelity agentic orchestration:
Python (Core): Ruff for linting and formatting. Zero-tolerance for
printstatements in core handlers (T201).Webapp (UI): Biome for sub-millisecond linting. Strict
noConsoleLogenforcement.Protocol Compliance: Hardened
stdout/stderrisolation to ensure crash-resistant JSON-RPC communication.Automation: Justfile recipes for all fleet operations (
just lint,just fix,just dev).Security: Automated audits via
banditandsafety.
License
MIT License - see LICENSE file for details.
Acknowledgments
FastMCP Team - For the 2.14.1 framework with OpenTelemetry integration
OpenTelemetry Community - For the observability standards and tools
Prometheus Team - For the metrics collection and alerting system
Grafana Labs - For Loki log aggregation and Grafana's state-of-the-art dashboarding
Grafana Community - For the visualization platform that powers modern observability
Related Projects
FastMCP - The framework this server is built on
OpenTelemetry Python - Observability instrumentation
Prometheus - Metrics collection and alerting
Grafana - State-of-the-art dashboards and visualization
Loki - Log aggregation and querying
Promtail - Log shipping agent
Built with using FastMCP 3.1.0, OpenTelemetry, Prometheus, Grafana & Loki - State-of-the-Art Observability
Latest Blog Posts
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/sandraschi/observability-mcp'
If you have feedback or need assistance with the MCP directory API, please join our Discord server