# Observability and Monitoring
The Nextcloud MCP Server includes comprehensive observability features for production deployments:
- **Prometheus metrics** for monitoring performance and health
- **OpenTelemetry distributed tracing** for debugging request flows
- **Structured JSON logging** with trace correlation
- **Kubernetes integration** via ServiceMonitor and PrometheusRule
## Quick Start
### Local Development with Prometheus
```bash
# Enable metrics (enabled by default)
export METRICS_ENABLED=true
export METRICS_PORT=9090
# Enable tracing (optional - tracing is enabled when OTEL_EXPORTER_OTLP_ENDPOINT is set)
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
# Start the server
docker-compose up -d mcp
```
Access metrics at: `http://localhost:9090/metrics`
### Kubernetes Deployment
With Prometheus Operator installed, enabling the ServiceMonitor lets Prometheus discover and scrape the metrics endpoint automatically:
```bash
helm install nextcloud-mcp charts/nextcloud-mcp-server \
  --set observability.metrics.enabled=true \
  --set observability.tracing.enabled=true \
  --set observability.tracing.endpoint=http://opentelemetry-collector:4317 \
  --set serviceMonitor.enabled=true
```
## Configuration
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `METRICS_ENABLED` | `true` | Enable Prometheus metrics |
| `METRICS_PORT` | `9090` | Port for metrics endpoint |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | - | OTLP gRPC endpoint (e.g., `http://otel-collector:4317`). Tracing is enabled when this is set. |
| `OTEL_SERVICE_NAME` | `nextcloud-mcp-server` | Service name in traces |
| `OTEL_TRACES_SAMPLER` | `always_on` | Trace sampling strategy |
| `OTEL_TRACES_SAMPLER_ARG` | `1.0` | Sampling rate (0.0-1.0); used by ratio-based samplers such as `traceidratio` |
| `LOG_FORMAT` | `json` | Log format (`json` or `text`) |
| `LOG_LEVEL` | `INFO` | Minimum log level |
| `LOG_INCLUDE_TRACE_CONTEXT` | `true` | Include trace IDs in logs |
### Helm Chart Configuration
```yaml
observability:
  metrics:
    enabled: true
    port: 9090
    path: /metrics
  tracing:
    enabled: true
    endpoint: "http://opentelemetry-collector:4317"
    samplingRate: 1.0
  logging:
    format: json
    level: INFO
    includeTraceContext: true

serviceMonitor:
  enabled: true
  interval: 30s
  scrapeTimeout: 10s
```
## Metrics
### HTTP Server Metrics (RED)
- `mcp_http_requests_total` - Total HTTP requests
- `mcp_http_request_duration_seconds` - Request latency histogram
- `mcp_http_requests_in_progress` - In-flight requests gauge
### MCP Tool Metrics
- `mcp_tool_calls_total` - Tool invocation count by status
- `mcp_tool_duration_seconds` - Tool execution latency
- `mcp_tool_errors_total` - Tool errors by type
### Nextcloud API Metrics
- `mcp_nextcloud_api_requests_total` - API calls by app and status
- `mcp_nextcloud_api_duration_seconds` - API latency by app
- `mcp_nextcloud_api_retries_total` - Retry count (429, timeout, etc.)
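As a rough illustration of how these series could be recorded around an API call, here is a minimal `prometheus_client` sketch. The metric names mirror the list above, but the label sets and the `observed_request` helper are assumptions for illustration, not the server's actual client code:

```python
import time

import httpx
from prometheus_client import Counter, Histogram

# Metric names match the docs above; label sets are assumed for illustration.
API_REQUESTS = Counter(
    "mcp_nextcloud_api_requests_total",
    "Nextcloud API calls by app and status",
    ["app", "method", "status_code"],
)
API_DURATION = Histogram(
    "mcp_nextcloud_api_duration_seconds",
    "Nextcloud API latency by app",
    ["app", "method"],
)

def observed_request(client: httpx.Client, app: str, method: str, url: str) -> httpx.Response:
    """Issue one API call and record its latency and outcome."""
    start = time.perf_counter()
    response = client.request(method, url)
    API_DURATION.labels(app, method).observe(time.perf_counter() - start)
    API_REQUESTS.labels(app, method, str(response.status_code)).inc()
    return response
```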
### OAuth Flow Metrics
- `mcp_oauth_token_validations_total` - Token validation count
- `mcp_oauth_token_exchange_total` - Token exchange operations
- `mcp_oauth_token_cache_hits_total` - Token cache hits
- `mcp_oauth_refresh_token_operations_total` - Refresh token storage ops
### Vector Sync Metrics (when enabled)
- `mcp_vector_sync_documents_scanned_total` - Documents discovered
- `mcp_vector_sync_documents_processed_total` - Processing results
- `mcp_vector_sync_processing_duration_seconds` - Processing latency
- `mcp_vector_sync_queue_size` - Current queue depth
- `mcp_qdrant_operations_total` - Qdrant DB operations
### Database Metrics
- `mcp_db_operations_total` - DB operations (SQLite, Qdrant)
- `mcp_db_operation_duration_seconds` - DB latency
### Dependency Health
- `mcp_dependency_health` - External dependency status (1=up, 0=down)
- `mcp_dependency_check_duration_seconds` - Health check latency
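A minimal sketch of how these two series could be driven, assuming each dependency has a zero-argument `check` callable; the `record_health_check` helper is hypothetical:

```python
import time
from typing import Callable

from prometheus_client import Gauge, Histogram

DEPENDENCY_UP = Gauge(
    "mcp_dependency_health",
    "External dependency status (1=up, 0=down)",
    ["dependency"],
)
CHECK_DURATION = Histogram(
    "mcp_dependency_check_duration_seconds",
    "Health check latency",
    ["dependency"],
)

def record_health_check(dependency: str, check: Callable[[], bool]) -> None:
    """Run one health probe and record its result and duration."""
    start = time.perf_counter()
    try:
        healthy = bool(check())
    except Exception:
        healthy = False  # a failing probe counts as down, not as an error
    CHECK_DURATION.labels(dependency).observe(time.perf_counter() - start)
    DEPENDENCY_UP.labels(dependency).set(1.0 if healthy else 0.0)
```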
## Distributed Tracing
### Span Hierarchy
```
HTTP POST /messages
├── mcp.tool.nc_notes_create_note
│ └── nextcloud.api.notes.POST
│ └── httpx request (auto-instrumented)
└── oauth.token.validate (if OAuth mode)
└── httpx request to IdP
```
### Span Attributes
- **MCP tools**: `mcp.tool.name`, `mcp.tool.args` (sanitized)
- **Nextcloud API**: `nextcloud.app`, `http.method`, `http.status_code`
- **OAuth**: `oauth.operation`, `oauth.method`
- **Vector sync**: `vector_sync.operation`, `vector_sync.document_count`
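For orientation, this is roughly what a tool span with those attributes looks like when created with the OpenTelemetry Python SDK directly; in the server itself this is handled by the `@instrument_tool` decorator, and the function body here is only a placeholder:

```python
from opentelemetry import trace

tracer = trace.get_tracer("nextcloud_mcp_server")

def nc_notes_create_note(title: str) -> None:
    # Span name and attributes follow the conventions listed above.
    with tracer.start_as_current_span("mcp.tool.nc_notes_create_note") as span:
        span.set_attribute("mcp.tool.name", "nc_notes_create_note")
        span.set_attribute("mcp.tool.args", f"title={title!r}")  # sanitized args only
        ...  # call the Notes API; auto-instrumented httpx spans nest underneath
```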
### Trace Context in Logs
When tracing is enabled, all logs include `trace_id` and `span_id`:
```json
{
"timestamp": "2025-01-09T12:34:56.789Z",
"level": "INFO",
"logger": "nextcloud_mcp_server.server.notes",
"message": "Note created successfully",
"trace_id": "a1b2c3d4e5f6...",
"span_id": "123456789abc...",
"note_id": 42
}
```
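One way to produce those two fields is a `logging` filter that copies IDs from the active span into each record. This is a hedged sketch of the mechanism behind `LOG_INCLUDE_TRACE_CONTEXT`; the filter class is illustrative, not the server's actual implementation:

```python
import logging

from opentelemetry import trace

class TraceContextFilter(logging.Filter):
    """Attach trace_id/span_id from the current span to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        ctx = trace.get_current_span().get_span_context()
        if ctx.is_valid:
            # W3C-style hex encoding: 128-bit trace ID, 64-bit span ID.
            record.trace_id = format(ctx.trace_id, "032x")
            record.span_id = format(ctx.span_id, "016x")
        else:
            record.trace_id = record.span_id = None
        return True
```

A JSON formatter can then emit `record.trace_id` and `record.span_id` exactly as in the example above.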
## Dashboards
### Prometheus Queries
**Request Rate (req/s)**:
```promql
sum(rate(mcp_http_requests_total[5m])) by (method, endpoint)
```
**Error Rate (%)**:
```promql
sum(rate(mcp_http_requests_total{status_code=~"5.."}[5m]))
/ sum(rate(mcp_http_requests_total[5m])) * 100
```
**P95 Latency**:
```promql
histogram_quantile(0.95,
  sum(rate(mcp_http_request_duration_seconds_bucket[5m])) by (le, endpoint)
)
```
**Top Tools by Volume**:
```promql
topk(10, sum(rate(mcp_tool_calls_total[5m])) by (tool_name))
```
**Nextcloud API Errors (non-2xx rate by app)**:
```promql
sum(rate(mcp_nextcloud_api_requests_total{status_code!~"2.."}[5m])) by (app)
```
## Alerts
### Recommended Alert Rules
**Critical**:
- Server down for >5min
- Error rate >5% for >5min
- P95 latency >1s for >5min
- Dependency down for >2min
**Warning**:
- Token validation errors >1% for >10min
- Vector sync queue >100 for >15min
- Qdrant slow (p95 >500ms) for >10min
See `charts/nextcloud-mcp-server/templates/prometheusrule.yaml` for complete definitions.
## Troubleshooting
### Metrics Not Appearing
1. Check metrics are enabled: `curl http://localhost:9090/metrics`
2. Verify ServiceMonitor labels match Prometheus selector
3. Check Prometheus target status: `http://prometheus:9090/targets`
### Traces Not Appearing
1. Verify the OTLP endpoint is reachable; port `4317` speaks gRPC, so expect `curl -v http://otel-collector:4317` to connect but not return a normal HTTP response
2. Check collector logs for errors
3. Verify sampling rate is not 0.0
4. Check trace backend (Jaeger/Tempo) connectivity
### High Cardinality Metrics
If you see cardinality warnings, keep in mind that label cardinality is bounded by design:
- Middleware normalizes endpoints (e.g., `/user/123` → `/user/*`); see the sketch after this list
- OAuth tokens are never included in metric labels
- User IDs are not tracked (use tracing for per-user debugging)
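The endpoint normalization in the first bullet can be as simple as collapsing numeric path segments; a hypothetical version (the server's actual rules may differ):

```python
import re

def normalize_endpoint(path: str) -> str:
    """Collapse numeric path segments so /user/123 and /user/456 share one label."""
    return re.sub(r"/\d+(?=/|$)", "/*", path)

assert normalize_endpoint("/user/123") == "/user/*"
```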
## Performance Impact
- **Metrics**: <1% overhead (counters/histograms are very fast)
- **Tracing**: ~2-5% overhead at 100% sampling
- **JSON logging**: <1% overhead vs text logging
**Recommendation**: Always enable metrics. Enable tracing in staging/production with 10-50% sampling.
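The env-var route to reduced sampling is `OTEL_TRACES_SAMPLER=parentbased_traceidratio` with `OTEL_TRACES_SAMPLER_ARG=0.25`. For reference, this sketch shows the equivalent SDK-level sampler that configuration selects:

```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Keep ~25% of root traces; child spans inherit the parent's sampling decision.
provider = TracerProvider(sampler=ParentBased(TraceIdRatioBased(0.25)))
```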
## Architecture
The observability stack integrates at multiple layers:
1. **HTTP Layer**: `ObservabilityMiddleware` tracks all HTTP requests
2. **MCP Layer**: Tools use `@instrument_tool` for automatic metrics and trace span creation
3. **Client Layer**: `BaseNextcloudClient` tracks all API calls
4. **OAuth Layer**: Token operations are traced and metered
5. **Background Tasks**: Vector sync operations emit metrics/traces
All components share a single Prometheus `Registry` and a single OpenTelemetry `TracerProvider`.
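A hedged sketch of that wiring under the environment variables documented above; `setup_observability` and the module layout are illustrative, not the server's actual entry point:

```python
import os

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from prometheus_client import CollectorRegistry, start_http_server

REGISTRY = CollectorRegistry()  # one registry shared by every metric definition

def setup_observability() -> None:
    # Metrics: serve the shared registry on METRICS_PORT when enabled.
    if os.environ.get("METRICS_ENABLED", "true").lower() == "true":
        port = int(os.environ.get("METRICS_PORT", "9090"))
        start_http_server(port, registry=REGISTRY)

    # Tracing: only wired up when an OTLP endpoint is configured.
    endpoint = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if endpoint:
        service = os.environ.get("OTEL_SERVICE_NAME", "nextcloud-mcp-server")
        provider = TracerProvider(resource=Resource.create({"service.name": service}))
        provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint=endpoint)))
        trace.set_tracer_provider(provider)
```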
## References
- [Prometheus Best Practices](https://prometheus.io/docs/practices/)
- [OpenTelemetry Python SDK](https://opentelemetry.io/docs/languages/python/)
- [Prometheus Operator](https://prometheus-operator.dev/)
- [Grafana Dashboards](https://grafana.com/docs/grafana/latest/dashboards/)