Skip to main content
Glama
sandraschi

Observability MCP Server

OPENTELEMETRY_GUIDE.mdโ€ข15.1 kB
# OpenTelemetry Complete Guide **Understanding OpenTelemetry: The Industry Standard for Observability** --- ## ๐Ÿ“‹ **Table of Contents** - [What is OpenTelemetry?](#-what-is-opentelemetry) - [Organization & Governance](#-organization--governance) - [Industry Acceptance](#-industry-acceptance) - [Core Components](#-core-components) - [Installation & Setup](#-installation--setup) - [Basic Usage](#-basic-usage) - [Advanced Features](#-advanced-features) - [Integration Examples](#-integration-examples) - [Best Practices](#-best-practices) - [Resources & Community](#-resources--community) --- ## ๐ŸŒŸ **What is OpenTelemetry?** **OpenTelemetry** is an open-source observability framework for generating, collecting, and exporting telemetry data (metrics, logs, and traces) from software applications. ### **Key Characteristics:** - **Vendor-neutral** - Works with any backend - **Language-agnostic** - Supports 11+ programming languages - **Future-proof** - Designed for cloud-native and microservices - **Standards-based** - Built on W3C and industry standards ### **Three Pillars of Observability:** 1. **Metrics** - Numerical measurements over time (CPU usage, response times) 2. **Logs** - Timestamped records of events 3. **Traces** - Request journey through distributed systems --- ## ๐Ÿข **Organization & Governance** ### **Governing Body** - **Name:** OpenTelemetry Community (formerly OpenCensus + OpenTracing) - **Founded:** 2019 (merged from two projects) - **Governance:** CNCF (Cloud Native Computing Foundation) ### **Corporate Backing** **Founding Companies:** - **Google** (OpenCensus contributor) - **Uber** (Jaeger tracing) - **LightStep** (OpenTracing founder) **Current Major Contributors:** - **Microsoft** - **AWS** - **Google Cloud** - **Datadog** - **New Relic** - **Splunk** - **Honeycomb** - **LightStep** ### **CNCF Status** - **Incubation:** May 2019 - **Graduation:** August 2021 - **Current Status:** Graduated CNCF Project --- ## ๐ŸŒ **Industry Acceptance** ### **Adoption Statistics (2024)** - **11 Programming Languages** officially supported - **40+ SDKs and Libraries** - **Major Cloud Providers** - AWS, GCP, Azure all support OTLP - **Monitoring Vendors** - 90%+ support OpenTelemetry - **Enterprise Users** - Netflix, Shopify, eBay, Uber, etc. ### **Industry Standards** - **OTLP (OpenTelemetry Protocol)** - Official wire protocol - **W3C Trace Context** - HTTP header standard - **Semantic Conventions** - Standardized naming - **CNCF Graduation** - Highest maturity level ### **Market Leadership** - **Defacto Standard** for observability data collection - **Kubernetes Native** - Built into K8s ecosystem - **Cloud-Native** - Designed for microservices and containers --- ## ๐Ÿ“š **Core Components** ### **1. APIs & SDKs** ```python # Python SDK Example from opentelemetry import trace from opentelemetry.sdk.trace import TracerProvider from opentelemetry.sdk.trace.export import ConsoleSpanExporter # Setup trace.set_tracer_provider(TracerProvider()) tracer = trace.get_tracer(__name__) # Usage with tracer.start_as_span("operation") as span: span.set_attribute("key", "value") # Your code here ``` ### **2. Collectors** - **Purpose:** Receive, process, and export telemetry data - **Deployment:** Agent or Gateway modes - **Processors:** Batch, filter, transform data - **Exporters:** Send to backends (Prometheus, Jaeger, etc.) ### **3. Protocol (OTLP)** - **HTTP/JSON** - RESTful API - **gRPC** - High-performance binary protocol - **Protobuf** - Efficient serialization --- ## ๐Ÿš€ **Installation & Setup** ### **Python Installation** ```bash # Install core packages pip install opentelemetry-distro pip install opentelemetry-instrumentation pip install opentelemetry-exporter-prometheus pip install opentelemetry-sdk ``` ### **Auto-Instrumentation** ```bash # Install auto-instrumentation pip install opentelemetry-distro pip install opentelemetry-instrumentation-flask # For Flask apps pip install opentelemetry-instrumentation-requests # For HTTP calls # Run with auto-instrumentation opentelemetry-instrumentation --traces_exporter console flask run ``` ### **Manual Setup** ```python from opentelemetry import metrics, trace from opentelemetry.sdk.metrics import MeterProvider from opentelemetry.sdk.trace import TracerProvider # Metrics setup metrics.set_meter_provider(MeterProvider()) # Tracing setup trace.set_tracer_provider(TracerProvider()) # Get instruments meter = metrics.get_meter("my-service") tracer = trace.get_tracer("my-service") ``` --- ## ๐Ÿ“Š **Basic Usage** ### **Metrics** ```python from opentelemetry import metrics meter = metrics.get_meter("example-service") # Counter - monotonically increasing value requests_counter = meter.create_counter( name="requests_total", description="Total number of requests", unit="1" ) # Histogram - distribution of values response_time_histogram = meter.create_histogram( name="response_time_seconds", description="Response time in seconds", unit="s" ) # UpDownCounter - can increase or decrease active_connections = meter.create_up_down_counter( name="active_connections", description="Number of active connections", unit="1" ) # Usage requests_counter.add(1, {"method": "GET", "endpoint": "/api"}) response_time_histogram.record(0.142, {"method": "GET"}) active_connections.add(1) # New connection active_connections.add(-1) # Connection closed ``` ### **Tracing** ```python from opentelemetry import trace tracer = trace.get_tracer("example-service") # Create spans with tracer.start_as_span("http_request") as span: span.set_attribute("http.method", "GET") span.set_attribute("http.url", "https://api.example.com/users") # Nested spans with tracer.start_as_span("database_query") as db_span: db_span.set_attribute("db.statement", "SELECT * FROM users") # Database operation here # Add events span.add_event("request_started") # ... request processing ... span.add_event("request_completed") ``` ### **Logs Integration** ```python import logging from opentelemetry.instrumentation.logging import LoggingInstrumentor # Enable logging instrumentation LoggingInstrumentor().instrument() # Structured logging logger = logging.getLogger(__name__) logger.info("User login", extra={ "user_id": 12345, "login_method": "oauth" }) ``` --- ## ๐Ÿ”ง **Advanced Features** ### **Context Propagation** ```python from opentelemetry import context, trace # Get current context current_context = context.get_current() # Create child context token = context.attach(context.set_value("custom_key", "custom_value")) try: # Operations in new context with tracer.start_as_span("operation") as span: # Span inherits context pass finally: context.detach(token) ``` ### **Baggage (Cross-Service Data)** ```python from opentelemetry import baggage # Set baggage (propagates across services) token = baggage.set_baggage("user_id", "12345") token2 = baggage.set_baggage("tenant_id", "abc123") try: # Baggage available in child spans and services user_id = baggage.get_baggage("user_id") print(f"Current user: {user_id}") finally: baggage.remove_baggage_by_key("user_id") baggage.remove_baggage_by_key("tenant_id") ``` ### **Custom Instrumentation** ```python from opentelemetry.trace import Status, StatusCode class CustomInstrumentor: def instrument(self): # Monkey patch or wrap functions original_function = some_module.some_function def instrumented_function(*args, **kwargs): with tracer.start_as_span("custom_operation") as span: try: result = original_function(*args, **kwargs) span.set_status(Status(StatusCode.OK)) return result except Exception as e: span.set_status(Status(StatusCode.ERROR, str(e))) span.record_exception(e) raise some_module.some_function = instrumented_function def uninstrument(self): # Restore original function pass ``` --- ## ๐Ÿ”— **Integration Examples** ### **FastAPI Integration** ```python from fastapi import FastAPI from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor from opentelemetry import trace from opentelemetry.exporter.jaeger import JaegerExporter from opentelemetry.sdk.trace import TracerProvider from opentelemetry.sdk.trace.export import BatchSpanProcessor # Setup tracing trace.set_tracer_provider(TracerProvider()) tracer = trace.get_tracer(__name__) # Configure Jaeger exporter jaeger_exporter = JaegerExporter( agent_host_name="localhost", agent_port=6831, ) span_processor = BatchSpanProcessor(jaeger_exporter) trace.get_tracer_provider().add_span_processor(span_processor) app = FastAPI() # Instrument FastAPI FastAPIInstrumentor.instrument_app(app) @app.get("/users/{user_id}") async def get_user(user_id: int): with tracer.start_as_span("get_user") as span: span.set_attribute("user.id", user_id) # Your logic here return {"user_id": user_id, "name": "John Doe"} ``` ### **Database Integration** ```python import psycopg2 from opentelemetry.instrumentation.psycopg2 import Psycopg2Instrumentor # Instrument psycopg2 Psycopg2Instrumentor().instrument() # Usage (automatically instrumented) conn = psycopg2.connect("dbname=test user=postgres") cursor = conn.cursor() cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,)) ``` ### **HTTP Client Integration** ```python import requests from opentelemetry.instrumentation.requests import RequestsInstrumentor # Instrument requests RequestsInstrumentor().instrument() # Usage (automatically creates spans) response = requests.get("https://api.example.com/users") ``` --- ## ๐ŸŽฏ **Best Practices** ### **Naming Conventions** ```python # Good naming meter.create_counter( name="http_requests_total", description="Total number of HTTP requests" ) # Avoid meter.create_counter( name="requests", # Too vague description="Requests" # Too brief ) ``` ### **Resource Attributes** ```python from opentelemetry.sdk.resources import Resource resource = Resource.create({ "service.name": "user-service", "service.version": "1.0.0", "service.instance.id": "instance-123", "environment": "production" }) trace.set_tracer_provider(TracerProvider(resource=resource)) ``` ### **Error Handling** ```python with tracer.start_as_span("operation") as span: try: # Risky operation result = risky_function() span.set_status(Status(StatusCode.OK)) return result except ValueError as e: span.set_status(Status(StatusCode.ERROR, str(e))) span.record_exception(e) raise except Exception as e: span.set_status(Status(StatusCode.ERROR, "Unexpected error")) span.record_exception(e) raise ``` ### **Performance Considerations** - **Sampling** - Don't trace everything in production - **Batching** - Use batch processors for efficiency - **Async Export** - Don't block application threads - **Resource Limits** - Set memory and CPU limits --- ## ๐ŸŒ **Resources & Community** ### **Official Websites** - **Main Site:** https://opentelemetry.io/ - **Documentation:** https://opentelemetry.io/docs/ - **Blog:** https://opentelemetry.io/blog/ ### **GitHub Repositories** - **Core Repository:** https://github.com/open-telemetry/opentelemetry-specification - **Python SDK:** https://github.com/open-telemetry/opentelemetry-python - **Collector:** https://github.com/open-telemetry/opentelemetry-collector - **Community:** https://github.com/open-telemetry/community ### **Package Registries** - **PyPI:** https://pypi.org/project/opentelemetry-sdk/ - **npm:** https://www.npmjs.com/package/@opentelemetry/api - **Go Modules:** https://pkg.go.dev/go.opentelemetry.io/otel ### **Community Resources** - **Slack:** https://cloud-native.slack.com/ (#opentelemetry) - **Twitter:** https://twitter.com/opentelemetry - **YouTube:** https://www.youtube.com/c/OpenTelemetry - **Mailing List:** cncf-opentelemetry-community@lists.cncf.io ### **Conferences & Events** - **KubeCon + CloudNativeCon** - Annual flagship event - **OpenTelemetry Community Days** - Regional meetups - **CNCF Online Programs** - Regular webinars and talks ### **Learning Resources** - **Getting Started Guide:** https://opentelemetry.io/docs/concepts/what-is-opentelemetry/ - **Zero to Hero:** https://opentelemetry.io/docs/zero-code/ - **Instrumentation Guides:** https://opentelemetry.io/docs/instrumentation/ - **Semantic Conventions:** https://opentelemetry.io/docs/reference/specification/semantic_conventions/ --- ## ๐Ÿ“ˆ **Real-World Usage Examples** ### **Netflix - Large Scale Tracing** ```python # Netflix uses OpenTelemetry for microservices observability # Millions of traces per minute # Custom sampling strategies # Integration with Atlas metrics ``` ### **Shopify - E-commerce Monitoring** ```python # Real-time performance monitoring # Business metrics correlation # Custom dashboards and alerts ``` ### **Uber - Ride Sharing Platform** ```python # Distributed tracing across services # Performance optimization # Incident response automation ``` --- ## ๐Ÿ”ฎ **Future of OpenTelemetry** ### **Roadmap Highlights** - **Logs API/SDK** - Native logging support (2024) - **Profiles** - CPU/memory profiling integration - **eBPF Integration** - Kernel-level observability - **WASM Support** - WebAssembly instrumentation ### **Emerging Standards** - **OTLP/HTTP** - RESTful protocol adoption - **Metrics API v1** - Stable metrics specification - **Semantic Conventions v1.20.0** - Standardized naming --- ## โ“ **Common Questions** ### **Q: OpenTelemetry vs. Prometheus?** **A:** Complementary! OpenTelemetry collects and generates metrics, Prometheus stores and queries them. ### **Q: OpenTelemetry vs. Jaeger?** **A:** OpenTelemetry generates traces, Jaeger stores and visualizes them. ### **Q: Is OpenTelemetry production ready?** **A:** Yes! Graduated CNCF project used by major companies. ### **Q: How much overhead does it add?** **A:** Minimal (<5% CPU, <10MB memory) with proper configuration. ### **Q: Can I migrate from existing tools?** **A:** Yes! OpenTelemetry supports multiple export formats. --- ## ๐ŸŽฏ **Quick Start Checklist** - [ ] Install OpenTelemetry SDK for your language - [ ] Set up basic tracing or metrics - [ ] Configure an exporter (console for dev, production backend for prod) - [ ] Add resource attributes (service name, version, environment) - [ ] Instrument your application code - [ ] Set up sampling rules for production - [ ] Configure alerting and monitoring - [ ] Join the community for support --- **OpenTelemetry is the future of observability** - vendor-neutral, language-agnostic, and designed for cloud-native applications. Start small, instrument gradually, and scale as needed! *Last updated: December 2024 | OpenTelemetry v1.28.0*

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/sandraschi/observability-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server