# Loki Configuration and Log Analysis Documentation
This document provides comprehensive information about Loki configuration, log aggregation, and analysis in the Tailscale MCP monitoring stack.
## 📝 Loki Overview
Loki serves as the log aggregation and analysis layer of the monitoring stack, providing centralized log collection, storage, and querying capabilities.
### Key Features
- **Log aggregation** from multiple sources
- **Powerful query language** (LogQL) for log analysis
- **Efficient storage** with compression and indexing
- **Real-time log streaming** with live tail capabilities
- **Integration** with Prometheus and Grafana
- **Scalable architecture** with horizontal scaling
## 🔧 Configuration
### Basic Configuration
**Access**: http://localhost:3100
**Storage**: Filesystem-based
**Schema**: v11
**Retention**: 200 hours
**Query Engine**: LogQL
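Once the stack is up, a couple of standard Loki HTTP endpoints make it easy to confirm the service is reachable and serving data (host and port below assume the default mapping listed above):
```bash
# Confirm Loki is ready to ingest and serve logs
curl -s http://localhost:3100/ready

# List the label names Loki currently knows about
curl -s http://localhost:3100/loki/api/v1/labels
```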
### Configuration File Structure
```yaml
# monitoring/loki/loki.yml
auth_enabled: false
server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  path_prefix: /loki
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory
  storage:
    filesystem:
      directory: /loki/data

limits_config:
  retention_period: 200h

schema_config:
  configs:
    - from: 2020-10-27
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h
```
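As a quick sanity check that the container picked up this file, Loki exposes its effective runtime configuration over HTTP; this is a convenience endpoint, not part of the required setup:
```bash
# Print the configuration Loki is actually running with
curl -s http://localhost:3100/config
```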
## 📊 Log Collection
### Log Sources
#### Tailscale MCP Server Logs
**Source**: Application logs from the MCP server
**Format**: Structured JSON logs
**Location**: `/app/logs/tailscale-mcp*.log`
**Log Structure**:
```json
{
"timestamp": "2024-01-01T12:00:00Z",
"level": "INFO",
"logger": "tailscalemcp.mcp_server",
"event": "Starting Tailscale MCP Server",
"version": "2.0.0",
"host": "0.0.0.0",
"port": 8080,
"tailnet": "example.tailnet",
"prometheus_port": 9091,
"log_file": "logs/tailscale-mcp.log"
}
```
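For exercising the pipeline without the MCP server running, a line in this shape can be pushed directly to Loki's push API. The stream labels below are illustrative; in normal operation Promtail attaches them.
```bash
# Push one synthetic log line to Loki (timestamps are nanoseconds since the epoch)
NOW_NS=$(date +%s%N)
curl -s -X POST http://localhost:3100/loki/api/v1/push \
  -H 'Content-Type: application/json' \
  -d '{
    "streams": [{
      "stream": {"job": "tailscale-mcp", "level": "INFO"},
      "values": [["'"${NOW_NS}"'", "Starting Tailscale MCP Server"]]
    }]
  }'
```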
#### Docker Container Logs
**Source**: Docker container logs
**Format**: Docker log format
**Location**: Docker container stdout/stderr
**Log Structure**:
```json
{
"log": "2024-01-01T12:00:00Z [INFO] Starting server",
"stream": "stdout",
"time": "2024-01-01T12:00:00Z"
}
```
#### System Logs
**Source**: System logs from the host
**Format**: System log format
**Location**: `/var/log/*.log`
### Log Parsing
#### Promtail Configuration
**File**: `monitoring/promtail/promtail.yml`
**Pipeline Stages**:
1. **Log Discovery**: Find log files
2. **Log Parsing**: Parse log format
3. **Label Extraction**: Extract labels
4. **Log Shipping**: Send to Loki
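Once Promtail is running, its own HTTP endpoint shows what stage 1 actually discovered; the port below assumes Promtail's default `http_listen_port` of 9080.
```bash
# Show the files Promtail has discovered and is tailing
curl -s http://localhost:9080/targets

# Promtail's metrics include per-file read counters
curl -s http://localhost:9080/metrics | grep promtail_read_bytes_total
```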
#### JSON Log Parsing
```yaml
pipeline_stages:
  - json:
      expressions:
        timestamp: timestamp
        level: level
        logger: logger
        message: event
        version: version
        host: host
        port: port
        tailnet: tailnet
        prometheus_port: prometheus_port
        log_file: log_file
        error: error
        operation: operation
        device_id: device_id
        device_name: device_name
        transfer_id: transfer_id
        file_path: file_path
        recipient: recipient
        status: status
        duration: duration
        bytes_sent: bytes_sent
        bytes_received: bytes_received
  - timestamp:
      source: timestamp
      format: RFC3339
  - labels:
      level:
      logger:
      version:
      host:
      port:
      tailnet:
      prometheus_port:
      log_file:
      error:
      operation:
      device_id:
      device_name:
      transfer_id:
      file_path:
      recipient:
      status:
      duration:
      bytes_sent:
      bytes_received:
  - output:
      source: message
```
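Before wiring pipeline changes into the stack, Promtail's dry-run mode prints the parsed entries and extracted labels to stdout instead of shipping them, which is a low-risk way to confirm the JSON stage behaves as expected (assumes the `promtail` binary is available locally):
```bash
# Parse logs through the pipeline and print the result without sending to Loki
promtail --dry-run --config.file=monitoring/promtail/promtail.yml
```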
## 🔍 LogQL Queries
### Basic Queries
#### Simple Log Selection
```logql
# All Tailscale MCP logs
{job="tailscale-mcp"}
# Logs with specific level
{job="tailscale-mcp", level="ERROR"}
# Logs with specific logger
{job="tailscale-mcp", logger="tailscalemcp.mcp_server"}
# Logs with specific tailnet
{job="tailscale-mcp", tailnet="example.tailnet"}
```
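The same selectors can be run outside Grafana through Loki's query API; without explicit `start`/`end` parameters the range query defaults to a recent window:
```bash
# Run a stream selector via the HTTP API, returning at most 50 lines
curl -s -G http://localhost:3100/loki/api/v1/query_range \
  --data-urlencode 'query={job="tailscale-mcp", level="ERROR"}' \
  --data-urlencode 'limit=50'
```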
#### Text Filtering
```logql
# Logs containing specific text
{job="tailscale-mcp"} |= "device"
# Logs containing regex pattern
{job="tailscale-mcp"} |~ "device.*authorize"
# Logs not containing specific text
{job="tailscale-mcp"} != "debug"
# Logs not matching regex pattern
{job="tailscale-mcp"} !~ "debug.*"
```
### Advanced Queries
#### Log Aggregation
```logql
# Count logs by level
sum(count_over_time({job="tailscale-mcp"}[1h])) by (level)
# Rate of logs by level
sum(rate({job="tailscale-mcp"}[5m])) by (level)
# Top error types (error is an indexed label in the Promtail config above)
topk(10, sum by (error) (count_over_time({job="tailscale-mcp", level="ERROR"}[1h])))
# Log volume over time
sum(rate({job="tailscale-mcp"}[5m]))
```
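Metric-style queries like these can also be evaluated as instant queries against the query endpoint; the 5-minute window is carried over from the examples above:
```bash
# Evaluate a metric query at the current instant
curl -s -G http://localhost:3100/loki/api/v1/query \
  --data-urlencode 'query=sum(rate({job="tailscale-mcp"}[5m])) by (level)'
```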
#### Label-based Queries
```logql
# Logs by device
{job="tailscale-mcp", device_name="server-01"}
# Logs by operation
{job="tailscale-mcp", operation="authorize"}
# Logs by status
{job="tailscale-mcp", status="success"}
# Logs by error type
{job="tailscale-mcp", error="timeout"}
```
#### Time-based Queries
In LogQL the time window is normally supplied by the caller, either through Grafana's time picker or the query API's `start`/`end` parameters; range selectors such as `[1h]` are only valid inside range aggregations.
```logql
# Log volume per level over the last hour
sum(count_over_time({job="tailscale-mcp"}[1h])) by (level)
# Error volume, evaluated over the caller-supplied time range
sum(rate({job="tailscale-mcp", level="ERROR"}[5m]))
```
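As a concrete example of supplying the window externally, the query_range endpoint accepts `start` and `end` as RFC3339 timestamps or nanosecond epochs; the one-hour window below is illustrative:
```bash
# Query error logs from the last hour by passing start/end to the API
START=$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)
END=$(date -u +%Y-%m-%dT%H:%M:%SZ)
curl -s -G http://localhost:3100/loki/api/v1/query_range \
  --data-urlencode 'query={job="tailscale-mcp", level="ERROR"}' \
  --data-urlencode "start=${START}" \
  --data-urlencode "end=${END}"
```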
### Complex Queries
#### Multi-condition Queries
```logql
# Device operations with errors (line filters chain; there is no "and" keyword)
{job="tailscale-mcp"} |= "device" |= "error"
# Network operations with a specific status label
{job="tailscale-mcp", status="success"} |= "network"
# File transfer operations longer than 10 seconds (parse JSON, then filter on duration)
{job="tailscale-mcp"} |= "taildrop" | json | duration > 10
```
#### Aggregation with Filtering
```logql
# Error rate by device
sum(rate({job="tailscale-mcp", level="ERROR"}[5m])) by (device_name)
# Operation count by type
sum(count_over_time({job="tailscale-mcp"}[1h])) by (operation)
# Log volume by logger
sum(rate({job="tailscale-mcp"}[5m])) by (logger)
```
## 📊 Log Analysis
### Common Log Patterns
#### Device Operations
```logql
# Device authorization logs
{job="tailscale-mcp"} |= "device" |= "authorize"
# Device connection logs
{job="tailscale-mcp"} |= "device" |= "connect"
# Device status changes
{job="tailscale-mcp"} |= "device" |= "status"
```
#### Network Operations
```logql
# Network traffic logs
{job="tailscale-mcp"} |= "network" |= "traffic"
# Network connectivity logs
{job="tailscale-mcp"} |= "network" |= "connectivity"
# Network performance logs
{job="tailscale-mcp"} |= "network" |= "performance"
```
#### File Transfer Operations
```logql
# Taildrop transfer logs
{job="tailscale-mcp"} |= "taildrop" |= "transfer"
# File upload logs
{job="tailscale-mcp"} |= "taildrop" |= "upload"
# File download logs
{job="tailscale-mcp"} |= "taildrop" |= "download"
```
#### Security Events
```logql
# Authentication logs
{job="tailscale-mcp"} |= "auth" |= "login"
# Authorization logs
{job="tailscale-mcp"} |= "auth" |= "authorize"
# Security violation logs
{job="tailscale-mcp"} |= "security" |= "violation"
```
### Error Analysis
#### Error Classification
```logql
# All errors
{job="tailscale-mcp", level="ERROR"}
# Errors by type
{job="tailscale-mcp", level="ERROR"} |~ "timeout|connection|auth"
# Errors by device
{job="tailscale-mcp", level="ERROR", device_name!=""}
# Errors by operation
{job="tailscale-mcp", level="ERROR", operation!=""}
```
#### Error Trends
```logql
# Error rate over time
sum(rate({job="tailscale-mcp", level="ERROR"}[5m]))
# Error count over the last hour
sum(count_over_time({job="tailscale-mcp", level="ERROR"}[1h]))
# Top error sources
topk(10, sum by (logger) (count_over_time({job="tailscale-mcp", level="ERROR"}[1h])))
```
## 🚨 Log-based Alerting
### Alert Rules
#### High Error Rate Alert
```yaml
- alert: HighErrorRate
  expr: sum(rate({job="tailscale-mcp", level="ERROR"}[5m])) > 0.1
  for: 2m
  labels:
    severity: warning
    service: tailscale-mcp
  annotations:
    summary: "High error rate detected in logs"
    description: "Error rate is above 0.1 errors per second for more than 2 minutes"
```
#### Critical Error Alert
```yaml
- alert: CriticalError
  expr: sum(count_over_time({job="tailscale-mcp", level="ERROR"} |= "critical" [5m])) > 5
  for: 1m
  labels:
    severity: critical
    service: tailscale-mcp
  annotations:
    summary: "Critical errors detected"
    description: "More than 5 critical errors in the last 5 minutes"
```
#### Service Down Alert
```yaml
- alert: ServiceDown
  expr: absent_over_time({job="tailscale-mcp"}[5m])
  for: 1m
  labels:
    severity: critical
    service: tailscale-mcp
  annotations:
    summary: "Service appears to be down"
    description: "No logs received from Tailscale MCP service for more than 5 minutes"
```
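For these rules to fire, they have to be loaded by Loki's ruler component, which is not shown in the configuration above and is assumed here to be enabled and pointed at a rules directory. Once it is, the loaded groups and firing alerts can be inspected over the Prometheus-compatible API:
```bash
# List the rule groups the Loki ruler has loaded
curl -s http://localhost:3100/prometheus/api/v1/rules

# Show alerts that are currently firing
curl -s http://localhost:3100/prometheus/api/v1/alerts
```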
## 🛠️ Configuration Management
### Environment Variables
```bash
# Loki configuration
LOKI_RETENTION_PERIOD=200h
LOKI_STORAGE_PATH=/loki/data
LOKI_HTTP_LISTEN_PORT=3100
LOKI_GRPC_LISTEN_PORT=9096
# Storage configuration
LOKI_STORAGE_TYPE=filesystem
LOKI_STORAGE_FILESYSTEM_DIRECTORY=/loki/data
# Query configuration
LOKI_QUERY_TIMEOUT=60s
LOKI_QUERY_MAX_CONCURRENCY=20
```
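Stock Loki does not read these variables by itself; they only take effect if the YAML references them (for example `${LOKI_RETENTION_PERIOD}`) and Loki is started with environment expansion enabled. A minimal sketch, assuming the config uses such placeholders:
```bash
# Substitute ${VAR} references in the config file from the environment at startup
loki -config.expand-env=true -config.file=monitoring/loki/loki.yml
```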
### Configuration Files
#### Main Configuration
- **File**: `monitoring/loki/loki.yml`
- **Purpose**: Main Loki configuration
- **Includes**: Server settings, storage configuration, schema settings
#### Promtail Configuration
- **File**: `monitoring/promtail/promtail.yml`
- **Purpose**: Log collection and shipping configuration
- **Includes**: Log sources, parsing pipelines, shipping configuration
## 📈 Performance Optimization
### Query Optimization
#### Efficient Queries
```logql
# Good: use specific label selectors
{job="tailscale-mcp", level="ERROR"}
# Bad: broad selectors with no label filtering scan every stream
{job="tailscale-mcp"}
# Good: keep range-aggregation windows modest
count_over_time({job="tailscale-mcp", level="ERROR"}[1h])
# Bad: very long windows are expensive for frequently repeated queries
count_over_time({job="tailscale-mcp", level="ERROR"}[7d])
```
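From the command line, logcli (Loki's CLI, installed separately) makes it easy to keep windows tight; `--since` bounds how far back a query reaches:
```bash
# Query only the last hour of error logs, capped at 100 lines
logcli --addr=http://localhost:3100 query --since=1h --limit=100 \
  '{job="tailscale-mcp", level="ERROR"}'
```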
#### Index Usage
```logql
# Good: narrow with indexed labels first
{job="tailscale-mcp", level="ERROR", logger="tailscalemcp.mcp_server"}
# Bad: relying on text filters alone forces Loki to scan far more log lines
{job="tailscale-mcp"} |= "error" |= "critical"
```
### Storage Optimization
#### Retention Policies
```yaml
# Short-term retention for high-volume logs
- name: short_term
  duration: 24h
  size_limit: 1GB
# Long-term retention for important logs
- name: long_term
  duration: 7d
  size_limit: 10GB
```
#### Data Compression
```yaml
# Enable data compression
storage:
  filesystem:
    directory: /loki/data
    compression: true
    compression_level: 1
```
## 🔧 Troubleshooting
### Common Issues
#### Log Collection Failures
- Check Promtail configuration
- Verify log file permissions
- Check network connectivity to Loki
- Verify log format parsing
#### High Storage Usage
- Reduce log retention period
- Enable log compression
- Filter out unnecessary logs
- Use log sampling for high-volume logs
#### Slow Queries
- Optimize query syntax
- Use appropriate time ranges
- Limit result sets
- Use label selectors instead of text filters
### Debugging Steps
1. **Check Log Sources**: Verify logs are being generated
2. **Check Promtail**: Verify log collection is working
3. **Check Loki**: Verify logs are being received
4. **Check Queries**: Validate query syntax and results
5. **Check Storage**: Monitor storage usage and performance
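A sketch of those steps as shell commands, assuming the default ports used elsewhere in this document (the container names `tailscale-mcp` and `loki` are placeholders for whatever the compose file actually uses):
```bash
# 1. Log sources: confirm the application is writing logs
docker exec tailscale-mcp ls -lh /app/logs/

# 2. Promtail: confirm it discovered the files and is shipping entries
curl -s http://localhost:9080/targets
curl -s http://localhost:9080/metrics | grep promtail_read_bytes_total

# 3. Loki: confirm it is ready and has the expected job label
curl -s http://localhost:3100/ready
curl -s http://localhost:3100/loki/api/v1/label/job/values

# 4. Queries: run a minimal selector and check that it returns data
curl -s -G http://localhost:3100/loki/api/v1/query_range \
  --data-urlencode 'query={job="tailscale-mcp"}' --data-urlencode 'limit=5'

# 5. Storage: watch disk usage of the Loki data directory
docker exec loki du -sh /loki/data
```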
## 📚 Additional Resources
### Documentation
- [Loki Documentation](https://grafana.com/docs/loki/latest/)
- [LogQL Query Language](https://grafana.com/docs/loki/latest/logql/)
- [Promtail Configuration](https://grafana.com/docs/loki/latest/clients/promtail/)
### Community
- [Loki Community](https://community.grafana.com/c/loki)
- [Loki GitHub](https://github.com/grafana/loki)
- [Loki Slack](https://grafana.slack.com/channels/loki)
---
This documentation provides comprehensive guidance for using Loki in the Tailscale MCP monitoring stack, from basic configuration to advanced log analysis and troubleshooting.