# Monitoring Stack Architecture
This document describes the architecture and design of the Tailscale MCP monitoring stack.
## šļø System Architecture
```
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
ā Tailscale MCP Monitoring Stack ā
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā¤
ā ā
ā āāāāāāāāāāāāāāāāāāā āāāāāāāāāāāāāāāāāāā āāāāāāāāāāāāāāā ā
ā ā Grafana ā ā Prometheus ā ā Loki ā ā
ā ā (Port 3000) āāāāā⤠(Port 9090) ā ā (Port 3100) ā ā
ā ā ā ā ā ā ā ā
ā ā ⢠Dashboards ā ā ⢠Metrics ā ā ⢠Logs ā ā
ā ā ⢠Alerts ā ā ⢠Rules ā ā ⢠Querying ā ā
ā ā ⢠Visualization ā ā ⢠Storage ā ā ⢠Analysis ā ā
ā āāāāāāāāāāāāāāāāāāā āāāāāāāāāāāāāāāāāāā āāāāāāāāāāāāāāā ā
ā ā² ā² ā² ā
ā ā ā ā ā
ā āāāāāāāāāāāāāāāāāāāāāāāāā¼āāāāāāāāāāāāāāāāāāāāāāāā ā
ā ā ā
ā āāāāāāāāāāāāāāāāāāā āāāāāāāāāāāāāāāāāāā āāāāāāāāāāāāāāā ā
ā ā Promtail ā ā Tailscale MCP ā ā Docker ā ā
ā ā ā ā Server ā ā Containers ā ā
ā ā ⢠Log Shipping ā ā (Port 8080/9091)ā ā ā ā
ā ā ⢠Log Parsing ā ā ā ā ⢠Services ā ā
ā ā ⢠Labeling ā ā ⢠Application ā ā ⢠Networks ā ā
ā ā ā ā ⢠Metrics ā ā ⢠Volumes ā ā
ā ā ā ā ⢠Logs ā ā ā ā
ā āāāāāāāāāāāāāāāāāāā āāāāāāāāāāāāāāāāāāā āāāāāāāāāāāāāāā ā
ā ā² ā² ā² ā
ā ā ā ā ā
ā āāāāāāāāāāāāāāāāāāāāāāāāā¼āāāāāāāāāāāāāāāāāāāāāāāā ā
ā ā ā
ā āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā ā
ā ā Docker Network ā ā
ā ā (monitoring) ā ā
ā āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā ā
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
```
## š§ Component Details
### Grafana (Visualization Layer)
**Purpose**: Provides dashboards, visualization, and alerting for the monitoring stack.
**Configuration**:
- **Port**: 3000
- **Authentication**: Admin user (admin/admin)
- **Data Sources**: Prometheus and Loki
- **Dashboards**: 4 comprehensive dashboards
- **Plugins**: Pie chart, worldmap panels
**Key Features**:
- Real-time dashboard updates
- Interactive visualizations
- Alert rule configuration
- Dashboard provisioning
- User management
### Prometheus (Metrics Layer)
**Purpose**: Collects, stores, and queries time-series metrics from the Tailscale MCP server.
**Configuration**:
- **Port**: 9090
- **Scrape Interval**: 15 seconds
- **Evaluation Interval**: 15 seconds
- **Retention**: 200 hours
- **Storage**: Local filesystem
**Metrics Collection**:
- Application metrics from MCP server
- System metrics from containers
- Custom business metrics
- Health check metrics
### Loki (Log Aggregation Layer)
**Purpose**: Aggregates, stores, and queries log data from all components.
**Configuration**:
- **Port**: 3100
- **Storage**: Filesystem-based
- **Schema**: v11
- **Retention**: Configurable
- **Query Engine**: LogQL
**Log Sources**:
- Tailscale MCP server logs
- Docker container logs
- System logs
- Application logs
### Promtail (Log Shipping Layer)
**Purpose**: Collects logs from various sources and ships them to Loki.
**Configuration**:
- **Log Sources**: Files, Docker containers, system logs
- **Parsing**: JSON, regex, timestamp extraction
- **Labeling**: Automatic label assignment
- **Shipping**: HTTP push to Loki
**Pipeline Stages**:
- Log discovery
- Log parsing
- Label extraction
- Log shipping
### Tailscale MCP Server (Application Layer)
**Purpose**: The main application that provides Tailscale management functionality.
**Configuration**:
- **Port**: 8080 (MCP server), 9091 (metrics)
- **Logging**: Structured JSON logging
- **Metrics**: Prometheus metrics exposure
- **Health**: Health check endpoints
**Monitoring Integration**:
- Prometheus metrics endpoint
- Structured logging output
- Health check endpoints
- Performance metrics
## š Data Flow
### Metrics Flow
1. **Collection**: Tailscale MCP server exposes metrics on port 9091
2. **Scraping**: Prometheus scrapes metrics every 15 seconds
3. **Storage**: Metrics stored in Prometheus time-series database
4. **Querying**: Grafana queries Prometheus for dashboard data
5. **Visualization**: Dashboards display real-time metrics
### Log Flow
1. **Generation**: Tailscale MCP server generates structured JSON logs
2. **Collection**: Promtail collects logs from files and containers
3. **Parsing**: Promtail parses logs and extracts labels
4. **Shipping**: Logs shipped to Loki via HTTP
5. **Storage**: Loki stores logs with indexing
6. **Querying**: Grafana queries Loki with LogQL
7. **Display**: Log streams displayed in dashboards
## š Service Dependencies
### Startup Order
1. **Loki** - Log storage (no dependencies)
2. **Prometheus** - Metrics storage (no dependencies)
3. **Promtail** - Log shipping (depends on Loki)
4. **Tailscale MCP Server** - Application (depends on Prometheus)
5. **Grafana** - Visualization (depends on Prometheus and Loki)
### Health Checks
- **Loki**: HTTP health check on port 3100
- **Prometheus**: HTTP health check on port 9090
- **Promtail**: Process health check
- **Tailscale MCP**: HTTP health check on port 9091
- **Grafana**: HTTP health check on port 3000
## š ļø Configuration Management
### Environment Variables
```bash
# Tailscale Configuration
TAILSCALE_API_KEY=your_api_key_here
TAILSCALE_TAILNET=your_tailnet_name_here
# Application Configuration
LOG_LEVEL=INFO
PROMETHEUS_PORT=9091
# Grafana Configuration
GF_SECURITY_ADMIN_PASSWORD=admin
```
### Docker Compose Configuration
- **Services**: All components defined as services
- **Networks**: Custom Docker network for isolation
- **Volumes**: Persistent storage for data
- **Environment**: Environment variable injection
- **Health Checks**: Service health monitoring
### Configuration Files
- **Prometheus**: `monitoring/prometheus/prometheus.yml`
- **Loki**: `monitoring/loki/loki.yml`
- **Promtail**: `monitoring/promtail/promtail.yml`
- **Grafana**: Provisioned via configuration files
## š Scaling Considerations
### Horizontal Scaling
- **Multiple Prometheus instances** for high availability
- **Loki clustering** for log storage scaling
- **Load balancing** for Grafana instances
- **Distributed log collection** with multiple Promtail instances
### Vertical Scaling
- **Memory allocation** for each component
- **CPU allocation** for processing
- **Storage allocation** for data retention
- **Network bandwidth** for data transfer
## š Security Architecture
### Network Security
- **Docker network isolation** between services
- **Internal communication** only
- **External access** through specific ports
- **Firewall rules** for port access
### Authentication
- **Grafana authentication** with admin user
- **API key protection** in environment variables
- **Service-to-service** authentication
- **Access logging** for audit trails
### Data Protection
- **Log sanitization** for sensitive data
- **Metrics filtering** for privacy
- **Encryption** for data at rest
- **Backup encryption** for data protection
## š Monitoring and Alerting
### Health Monitoring
- **Service health checks** for all components
- **Resource usage monitoring** (CPU, memory, disk)
- **Network connectivity monitoring**
- **Performance monitoring** for each service
### Alerting Rules
- **Service down alerts** for critical services
- **Resource usage alerts** for capacity planning
- **Error rate alerts** for application issues
- **Performance degradation alerts** for optimization
### Notification Channels
- **Email notifications** for critical alerts
- **Slack integration** for team notifications
- **Webhook notifications** for custom integrations
- **PagerDuty integration** for on-call management
## š Deployment Architecture
### Development Environment
- **Local Docker Compose** for development
- **Hot reloading** for configuration changes
- **Debug logging** for troubleshooting
- **Test data** for dashboard development
### Production Environment
- **Container orchestration** with Docker Swarm or Kubernetes
- **High availability** with multiple instances
- **Load balancing** for service distribution
- **Monitoring** of the monitoring stack itself
### Staging Environment
- **Production-like setup** for testing
- **Performance testing** with realistic data
- **Integration testing** with external services
- **User acceptance testing** for dashboards
## š Performance Metrics
### Key Performance Indicators
- **Service uptime** for all components
- **Response times** for dashboards and queries
- **Log processing latency** for real-time monitoring
- **Storage utilization** for capacity planning
### Optimization Strategies
- **Query optimization** for faster dashboard loading
- **Log retention policies** for storage management
- **Metrics aggregation** for reduced storage
- **Caching strategies** for improved performance
---
This architecture provides a robust, scalable, and maintainable monitoring solution for the Tailscale MCP server with comprehensive observability capabilities.