OpenTelemetry MCP Server

šŸ”§ Provide AI agents with tooling to query Prometheus metrics and Loki logs for intelligent alert investigation and troubleshooting

Overview

otel-mcp is a Python-based MCP (Model Context Protocol) server that acts as a bridge between AI agents and your observability backends (Prometheus & Loki). When alerts fire or issues arise, AI agents can use this server to query metrics and logs to help identify root causes and assist on-call engineers.

Key Features

  • šŸ” Flexible Querying: Both raw PromQL/LogQL queries and high-level helper tools

  • šŸ”Ž Service Discovery: Auto-discover metrics, labels, and services

  • šŸ” Flexible Auth: Optional auth for backends (Basic, Bearer) + OIDC for MCP server

  • ⚔ Simple Config: Environment variable based configuration

  • šŸ Python-based: Fast to develop and easy to maintain

Architecture

```
ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
│    AI Agent     │  Investigating alerts/issues
ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜
         │ MCP Protocol
ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā–¼ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
│    otel-mcp     │  Query translation & tooling
ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜
         │
         ā”œā”€ā”€ā–ŗ Prometheus (metrics)
         └──► Loki (logs)
```

Use Cases

1. Alert Investigation

When Alertmanager fires an alert, an AI agent can:

  • Query recent metrics to understand the issue

  • Search logs for error patterns

  • Correlate metrics and logs

  • Suggest potential root causes

2. On-Call Support

Engineers working through issues can ask the AI to:

  • "Show me CPU metrics for api-server in last hour"

  • "Find all errors in payment-service logs"

  • "What services are currently monitored?"

3. Service Discovery

  • List all available metrics and services

  • Discover what's being monitored

  • Explore label dimensions

Quick Start

Prerequisites

  • Python 3.10+

  • Access to Prometheus and/or Loki instances

  • MCP-compatible AI client (Claude Desktop, etc.)

Installation

```bash
# Clone repository
git clone https://github.com/yourusername/otel-mcp.git
cd otel-mcp

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Setup configuration
cp .env.example .env
# Edit .env with your Prometheus/Loki endpoints
```

Configuration

Create a `.env` file:

```bash
# Prometheus
PROMETHEUS_URL=http://localhost:9090
PROMETHEUS_AUTH_TYPE=none
# PROMETHEUS_USERNAME=admin
# PROMETHEUS_PASSWORD=secret

# Loki
LOKI_URL=http://localhost:3100
LOKI_AUTH_TYPE=none
# LOKI_BEARER_TOKEN=your-token

# MCP Server (optional)
MCP_AUTH_ENABLED=false
# MCP_OIDC_ISSUER=https://your-idp.com
# MCP_OIDC_CLIENT_ID=otel-mcp

# Settings
LOG_LEVEL=INFO
DEFAULT_TIME_RANGE=1h
```
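The variables above could be loaded into a typed config object along these lines. This is a minimal sketch, not the project's actual `config.py`; the `BackendConfig` dataclass and `load_backend_config` helper are illustrative names.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class BackendConfig:
    """Connection settings for one backend (Prometheus or Loki)."""
    url: str
    auth_type: str = "none"        # "none", "basic", or "bearer"
    username: Optional[str] = None
    password: Optional[str] = None
    bearer_token: Optional[str] = None


def load_backend_config(prefix: str) -> BackendConfig:
    """Read <PREFIX>_URL, <PREFIX>_AUTH_TYPE, etc. from the environment."""
    env = os.environ
    return BackendConfig(
        url=env[f"{prefix}_URL"],                       # required
        auth_type=env.get(f"{prefix}_AUTH_TYPE", "none"),  # optional, defaults to none
        username=env.get(f"{prefix}_USERNAME"),
        password=env.get(f"{prefix}_PASSWORD"),
        bearer_token=env.get(f"{prefix}_BEARER_TOKEN"),
    )
```

With this shape, adding a new backend is a matter of calling `load_backend_config("LOKI")` with a different prefix.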

Running

```bash
# Development mode
python -m src.server

# Or with uvicorn (if using async server)
uvicorn src.server:app
```

Using with Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):

```json
{
  "mcpServers": {
    "otel": {
      "command": "python",
      "args": ["-m", "src.server"],
      "cwd": "/path/to/otel-mcp",
      "env": {
        "PROMETHEUS_URL": "http://localhost:9090",
        "LOKI_URL": "http://localhost:3100"
      }
    }
  }
}
```

Available Tools

Prometheus Tools

| Tool | Description |
| --- | --- |
| `query_prometheus` | Execute raw PromQL queries |
| `query_prometheus_range` | Query over a time range |
| `get_metric_current_value` | Get current metric value (helper) |
| `get_metric_over_time` | Get metric trend (helper) |
| `list_metrics` | Discover available metrics |
| `list_labels` | List label names |
| `list_label_values` | Get values for a label |

Loki Tools

| Tool | Description |
| --- | --- |
| `query_loki` | Execute raw LogQL queries |
| `search_logs` | Search logs with filters (helper) |
| `get_log_stats` | Get aggregated log statistics |
| `list_log_labels` | List log stream labels |
| `list_log_label_values` | Get label values |

Cross-Cutting Tools

| Tool | Description |
| --- | --- |
| `correlate_metrics_logs` | Get both metrics and logs for a service |
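The core of correlating metrics with logs is time alignment: bucketing metric samples and log lines onto a shared clock so spikes and error bursts line up. A simplified sketch of that idea; the bucketing scheme and `correlate` function here are illustrative, not the tool's actual output format:

```python
from collections import defaultdict


def correlate(metric_samples: list[tuple[float, float]],
              log_entries: list[tuple[float, str]],
              bucket_s: int = 60) -> dict[int, dict]:
    """Group (timestamp, value) samples and (timestamp, line) logs into
    shared time buckets keyed by bucket start (Unix seconds)."""
    buckets: dict[int, dict] = defaultdict(lambda: {"samples": [], "logs": []})
    for ts, value in metric_samples:
        buckets[int(ts) // bucket_s * bucket_s]["samples"].append(value)
    for ts, line in log_entries:
        buckets[int(ts) // bucket_s * bucket_s]["logs"].append(line)
    return dict(buckets)
```

A bucket that contains both a metric spike and a burst of error lines is exactly the kind of signal an agent can surface to an on-call engineer.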

See DESIGN.md for complete tool specifications.

Example Usage

Example 1: Investigate High CPU Alert

```
You: We got an alert that api-server CPU is high. Can you investigate?

AI Agent uses:
1. get_metric_current_value(metric="cpu_usage_percent", filters={"service": "api-server"})
2. get_metric_over_time(metric="cpu_usage_percent", filters={"service": "api-server"}, time_range="1h")
3. search_logs(service="api-server", level="error", time_range="30m")

AI: CPU has been at 85% for the last 20 minutes. I found several "OutOfMemory"
errors in the logs starting at 10:15 AM, which correlates with the CPU spike...
```

Example 2: Service Discovery

```
You: What services are we monitoring?

AI Agent uses:
1. list_label_values(label="service")

AI: You're currently monitoring 12 services: api-server, payment-service,
auth-service, database-proxy...
```

Example 3: Raw Query

```
You: Show me the error rate for all services in the last hour

AI Agent uses:
1. query_prometheus_range(
     query='rate(http_requests_total{status=~"5.."}[5m])',
     start="1h",
     step="1m"
   )

AI: Here are the error rates... payment-service has the highest at 15 errors/min...
```

Authentication

Backend Authentication (Prometheus/Loki)

Three modes are supported:

No Auth:

```bash
PROMETHEUS_AUTH_TYPE=none
```

Basic Auth:

```bash
PROMETHEUS_AUTH_TYPE=basic
PROMETHEUS_USERNAME=admin
PROMETHEUS_PASSWORD=secret
```

Bearer Token:

```bash
PROMETHEUS_AUTH_TYPE=bearer
PROMETHEUS_BEARER_TOKEN=your-token-here
```
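The three modes translate into HTTP headers roughly as follows. This is a sketch; `auth_headers` is an illustrative helper, not the project's actual API:

```python
import base64


def auth_headers(auth_type: str, username: str = "", password: str = "",
                 bearer_token: str = "") -> dict[str, str]:
    """Build the Authorization header for the configured backend auth mode."""
    if auth_type == "basic":
        # Basic auth: base64-encode "username:password" (RFC 7617)
        creds = base64.b64encode(f"{username}:{password}".encode()).decode()
        return {"Authorization": f"Basic {creds}"}
    if auth_type == "bearer":
        return {"Authorization": f"Bearer {bearer_token}"}
    return {}  # auth_type == "none": no header at all
```

The returned dict can be passed unchanged to whatever HTTP client the backend modules use.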

MCP Server Authentication (Optional OIDC)

```bash
MCP_AUTH_ENABLED=true
MCP_OIDC_ISSUER=https://your-idp.com
MCP_OIDC_CLIENT_ID=otel-mcp-server
MCP_OIDC_AUDIENCE=otel-mcp
```

See DESIGN.md for details.

Development

Project Structure

```
otel-mcp/
ā”œā”€ā”€ src/
│   ā”œā”€ā”€ server.py      # MCP server entry point
│   ā”œā”€ā”€ config.py      # Configuration
│   ā”œā”€ā”€ auth/          # Auth handlers
│   ā”œā”€ā”€ backends/      # Prometheus/Loki clients
│   ā”œā”€ā”€ tools/         # MCP tools implementation
│   └── utils/         # Helpers
ā”œā”€ā”€ tests/
│   ā”œā”€ā”€ unit/
│   ā”œā”€ā”€ integration/
│   └── e2e/
ā”œā”€ā”€ requirements.txt
ā”œā”€ā”€ .env.example
ā”œā”€ā”€ README.md
└── DESIGN.md
```

Running Tests

```bash
# Unit tests
pytest tests/unit/

# Integration tests (requires Docker)
docker-compose -f tests/integration/docker-compose.yml up -d
pytest tests/integration/
docker-compose -f tests/integration/docker-compose.yml down

# All tests
pytest
```

Code Quality

```bash
# Format
black src/ tests/

# Lint
ruff check src/ tests/

# Type check
mypy src/
```

Roadmap

  • Technical design

  • Phase 1 (MVP): Prometheus basic tools + discovery

  • Phase 2: Loki integration

  • Phase 3: High-level helper tools

  • Phase 4: OIDC auth + production features

  • Phase 5: Trace support (Tempo/Jaeger)

See DESIGN.md for detailed roadmap.

Contributing

Contributions welcome! This is a hobby open-source project.

  1. Fork the repo

  2. Create a feature branch (git checkout -b feature/amazing-feature)

  3. Make your changes

  4. Run tests and linting

  5. Commit (git commit -m 'Add amazing feature')

  6. Push and create a PR

License

MIT License - see LICENSE for details.

Support & Discussion


Built for the observability and AI communities šŸš€
