# Tailscale MCP Monitoring Stack
This directory contains the complete monitoring stack for the Tailscale MCP server, including Grafana, Prometheus, and Loki for comprehensive observability.
## šļø Architecture
```
āāāāāāāāāāāāāāāāāāā āāāāāāāāāāāāāāāāāāā āāāāāāāāāāāāāāāāāāā
ā Grafana ā ā Prometheus ā ā Loki ā
ā (Port 3000) āāāāā⤠(Port 9090) ā ā (Port 3100) ā
ā ā ā ā ā ā
ā ⢠Dashboards ā ā ⢠Metrics ā ā ⢠Logs ā
ā ⢠Alerts ā ā ⢠Rules ā ā ⢠Aggregation ā
ā ⢠Visualization ā ā ⢠Storage ā ā ⢠Querying ā
āāāāāāāāāāāāāāāāāāā āāāāāāāāāāāāāāāāāāā āāāāāāāāāāāāāāāāāāā
ā² ā² ā²
ā ā ā
āāāāāāāāāāāāāāāāāāāāāāāāā¼āāāāāāāāāāāāāāāāāāāāāāāā
ā
āāāāāāāāāāāāāāāāāāā
ā Tailscale MCP ā
ā Server ā
ā (Port 8080/9091)ā
ā ā
ā ⢠Application ā
ā ⢠Metrics ā
ā ⢠Logs ā
āāāāāāāāāāāāāāāāāāā
```
## š Quick Start
1. **Copy environment file:**
```powershell
Copy-Item env.example .env
```
2. **Edit `.env` with your Tailscale credentials:**
```
TAILSCALE_API_KEY=your_api_key_here
TAILSCALE_TAILNET=your_tailnet_name_here
```
3. **Start the monitoring stack:**
```powershell
.\scripts\start-monitoring.ps1
```
4. **Access the services:**
- **Grafana**: http://localhost:3000 (admin/admin)
- **RebootX On-Prem API**: http://localhost:9001
- **RebootX Swagger UI**: http://localhost:9002
- **Prometheus**: http://localhost:9090
- **Loki**: http://localhost:3100
- **MCP Server**: http://localhost:8080
## š± RebootX On-Prem Integration
The monitoring stack includes RebootX On-Prem integration, allowing you to monitor and manage your Tailscale infrastructure from the RebootX mobile app.
### Features
- **Mobile Infrastructure Monitoring**: Monitor your infrastructure from your smartphone
- **Remote Management**: Stop, reboot, and manage services remotely
- **Real-time Metrics**: View live metrics and dashboards
- **SSH Access**: Connect to servers directly from the app
### Quick Start with RebootX
1. **Install RebootX mobile app** from App Store/Google Play
2. **Start RebootX On-Prem integration:**
```powershell
.\scripts\start-rebootx-on-prem.ps1
```
3. **Configure RebootX app:**
- Server URL: `http://localhost:9001`
- API Key: `tailscale-mcp-rebootx-key-2024`
- Path Prefix: `tailscale-mcp`
### Available Runnables
- **Tailscale MCP Server** (tailscale-mcp-001)
- **Grafana Dashboard** (grafana-001)
- **Prometheus Metrics** (prometheus-001)
- **Loki Log Aggregation** (loki-001)
### Available Dashboards
- **Tailscale Network Overview**: Active devices, network health, bandwidth usage
- **Monitoring Infrastructure**: Grafana views, Prometheus metrics, Loki logs
- **Security Metrics**: Security scans, failed auth attempts, blocked IPs
- **Performance Metrics**: Network latency, bandwidth usage, CPU/memory usage
For detailed information, see:
- [RebootX On-Prem Documentation](rebootx-on-prem/README.md)
- [RebootX On-Prem Setup Guide](REBOOTX_ON_PREM_SETUP_GUIDE.md) - Complete setup with fixed IP configuration
- [RebootX Quick Reference](REBOOTX_QUICK_REFERENCE.md) - Quick reference for configuration
## š Grafana Dashboards
### Available Dashboards:
- **Tailscale MCP Overview**: Main dashboard with device status, network traffic, API metrics, and log stream
- **Device Management**: Detailed device metrics and management operations
- **Network Monitoring**: Network traffic, connectivity, and performance metrics
- **Security Dashboard**: Security events, authentication, and compliance metrics
### Dashboard Features:
- Real-time device status monitoring
- Network traffic visualization
- API response time tracking
- Error rate monitoring
- Log stream integration with Loki
- Custom alerts and notifications
## š Log Analysis with Loki
Loki provides powerful log aggregation and querying capabilities:
### Log Sources:
- Tailscale MCP server logs
- Docker container logs
- System logs
- Application-specific logs
### Query Examples:
```logql
# All Tailscale MCP logs
{job="tailscale-mcp"}
# Error logs only
{job="tailscale-mcp"} |= "ERROR"
# Logs from specific device operations
{job="tailscale-mcp"} |= "device" |= "authorize"
# Logs with specific time range
{job="tailscale-mcp"}[5m]
```
## š Prometheus Metrics
### Available Metrics:
- `tailscale_devices_total` - Total number of devices
- `tailscale_devices_online_total` - Number of online devices
- `tailscale_network_bytes_sent_total` - Network bytes sent
- `tailscale_network_bytes_received_total` - Network bytes received
- `tailscale_api_request_duration_seconds` - API request duration histogram
- `tailscale_api_requests_total` - Total API requests by status code
### Query Examples:
```promql
# Device status
tailscale_devices_total
# Network throughput
rate(tailscale_network_bytes_sent_total[5m])
# API response times
histogram_quantile(0.95, rate(tailscale_api_request_duration_seconds_bucket[5m]))
# Error rate
rate(tailscale_api_requests_total{status=~"5.."}[5m])
```
## š ļø Configuration
### Prometheus Configuration
- **File**: `monitoring/prometheus/prometheus.yml`
- **Scrape Interval**: 15 seconds
- **Retention**: 200 hours
- **Targets**: MCP server, Grafana, Loki
### Loki Configuration
- **File**: `monitoring/loki/loki.yml`
- **Storage**: Filesystem-based
- **Retention**: Configurable
- **Schema**: v11
### Promtail Configuration
- **File**: `monitoring/promtail/promtail.yml`
- **Log Sources**: MCP server, Docker containers, system logs
- **Pipeline**: Structured log parsing and labeling
## š§ Customization
### Adding New Dashboards:
1. Create JSON dashboard file in `monitoring/grafana/dashboards/`
2. Restart Grafana or use provisioning
3. Access via Grafana UI
### Adding New Metrics:
1. Update MCP server code to expose new metrics
2. Add queries to Prometheus configuration
3. Create visualizations in Grafana
### Adding New Log Sources:
1. Update Promtail configuration
2. Add log parsing rules
3. Create log queries in Grafana
## šØ Alerts and Notifications
### Built-in Alerts:
- High error rate (>5% 5xx errors)
- Device connectivity issues
- API response time degradation
- Disk space warnings
- Memory usage alerts
### Alert Channels:
- Slack integration
- Email notifications
- PagerDuty integration
- Webhook notifications
## š Useful Commands
```powershell
# View all logs
docker-compose logs -f
# View specific service logs
docker-compose logs -f grafana
docker-compose logs -f prometheus
docker-compose logs -f loki
# Restart services
docker-compose restart
# Stop all services
docker-compose down
# Update services
docker-compose pull
docker-compose up -d
```
## š Security Considerations
- Change default Grafana admin password
- Use environment variables for sensitive data
- Consider network isolation for production
- Regular security updates for Docker images
- Monitor access logs and authentication
## š Additional Resources
- [Grafana Documentation](https://grafana.com/docs/)
- [Prometheus Documentation](https://prometheus.io/docs/)
- [Loki Documentation](https://grafana.com/docs/loki/)
- [Docker Compose Documentation](https://docs.docker.com/compose/)