# MCP Athena Analytics Server
Secure MCP server for querying analytics data through AWS Athena.
## Overview
This MCP (Model Context Protocol) server provides Claude and other LLMs with controlled access to analytics data stored in S3/Athena. It implements multiple security layers to prevent data breaches, destructive operations, and excessive resource usage.
**Use cases:**
- Exploratory data analysis via natural language
- Ad-hoc queries without writing SQL manually
- Template-based reporting with parameter validation
- Data discovery (tables, schemas, available templates)
## Architecture
**HTTP-based deployment (recommended for production):**
```
┌─────────────────────────────────────────────────────────────┐
│ claude-network (Docker bridge) │
│ │
│ ┌────────────────────────┐ ┌─────────────────────────┐ │
│ │ Claude Code Container │ │ MCP Server Container │ │
│ │ (claude-sandbox) │ │ (mcp-athena-server) │ │
│ │ │ │ │ │
│ │ Claude Code CLI │HTTP│ FastAPI/SSE Server │ │
│ │ ↓ │◄───┤ ↓ │ │
│ │ MCP Client (HTTP) │ │ MCP Protocol Handler │ │
│ │ │ │ ↓ │ │
│ │ 🚫 NO credentials │ │ Athena Tools │ │
│ │ │ │ ↓ │ │
│ └────────────────────────┘ │ boto3 + .env │ │
│ │ ↓ │ │
│ │ ✅ AWS Athena │ │
│ └─────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
**Security benefits:**
- ✅ Credentials isolated in MCP server container
- ✅ Claude container has NO access to .env or AWS credentials
- ✅ Bearer token authentication between containers
- ✅ HTTP API as security boundary
- ✅ Network-level isolation via Docker bridge
**Alternative: stdio mode (local development only):**
- MCP server runs as subprocess of Claude Code
- ⚠️ Claude has access to all credentials (not recommended)
## Features
### 4 MCP Tools
1. **execute_template** - Run template with validated parameters
2. **execute_query** - Execute ad-hoc SELECT query (validated)
3. **list_tables** - Show available Athena tables
4. **get_table_schema** - Get table DDL (CREATE TABLE statement)
### Security Layers
#### 1. Query Validation
- **Ad-hoc queries:** Only SELECT and WITH (CTE) allowed
- **Forbidden keywords:** DROP, DELETE, CREATE, ALTER, INSERT, TRUNCATE, REPLACE
- **Whitelist approach:** Query must start with SELECT or WITH
#### 2. Template Blacklist
- **DROP_TABLE** - Permanently blocked (destructive)
- **DROP_TABLE_IF_EXISTS** - Permanently blocked (destructive)
- **Extensible:** Add more templates to blacklist.py as needed
#### 3. Resource Limits
- **Max rows:** 100,000 (results truncated if exceeded)
- **Max timeout:** 600 seconds (10 minutes)
- **Scan warning:** 100 GB (logged but not blocked)
#### 4. Data Sanitization
- **Sensitive columns auto-detected:** password, token, secret, key, credential, api_key, etc
- **Redaction:** Values replaced with `***REDACTED***`
- **Applied to all results** before returning to LLM
#### 5. AWS Authentication
- **IRSA (IAM Roles for Service Accounts):** Temporary credentials via Kubernetes
- **No static keys:** Credentials refreshed automatically
- **Default profile:** Sandbox/replica buckets (read-only production data)
## Installation
### Prerequisites
- Docker & Docker Compose
- AWS credentials (~/.aws/credentials with ATHENA_ANALYTICS_INTERNAL profile)
- Active Athena setup
### Setup (HTTP-based, recommended)
```bash
# 1. Create .env file in home directory
cp mcp_athena_analytics/.env.example ~/.mcp_athena_analytics.env
# 2. Generate authentication token
openssl rand -hex 32
# 3. Edit .env file and set MCP_AUTH_TOKEN
nano ~/.mcp_athena_analytics.env
# 4. Start claude-sandbox first (creates network)
cd claude-sandbox
docker compose up -d
# 5. Start MCP server container (joins network)
cd ../mcp_athena_analytics
docker compose up -d --build
# 6. Verify server is running
docker ps | grep mcp-athena-server
curl http://localhost:8000/health
# Expected: {"status":"healthy","service":"mcp-athena-analytics"}
# 7. Configure Claude Code CLI
cd ..
cp .mcp.json.example .mcp.json
# 8. Edit .mcp.json and set MCP_AUTH_TOKEN (same as in ~/.mcp_athena_analytics.env)
nano .mcp.json
```
**Note:** `.env` file is stored in home directory (`~/.mcp_athena_analytics.env`) to keep credentials outside project directory.
### Setup (stdio mode, local development only)
**Note:** stdio mode is deprecated. Use HTTP-based deployment instead.
For local testing without Docker:
```bash
# 1. Install dependencies
pip install -r mcp_athena_analytics/requirements.txt
pip install -r app/requirements.txt
# 2. Create .env file
cp mcp_athena_analytics/.env.example ~/.mcp_athena_analytics.env
nano ~/.mcp_athena_analytics.env
# 3. Test server locally (HTTP mode)
PYTHONPATH=. python mcp_athena_analytics/server_http.py
```
## Usage
### Claude Code CLI (HTTP mode)
MCP server configuration is in `.mcp.json` (created from `.mcp.json.example`):
```json
{
"mcpServers": {
"athena-analytics": {
"type": "sse",
"url": "http://mcp-athena-server:8000/sse",
"headers": {
"Authorization": "Bearer YOUR_TOKEN_HERE"
}
}
}
}
```
**Note:** URL uses container name `mcp-athena-server` for Docker DNS resolution (both containers in `claude-network`).
**Important:** Replace `YOUR_TOKEN_HERE` with the same token from `mcp_athena_analytics/.env`
Start Claude Code and enable MCP server:
```bash
/mcp # In Claude Code CLI
```
**Note:** `.mcp.json` is in `.gitignore` (local config), `.mcp.json.example` is tracked.
### Testing
Use `curl` to test HTTP endpoints:
```bash
# Health check
curl http://localhost:8000/health
# Test with MCP client (from claude-sandbox container)
docker exec -it claude-sandbox bash
curl http://mcp-athena-server:8000/health
```
## Tool Examples
### execute_query
Run ad-hoc SELECT query:
```json
{
"tool": "execute_query",
"arguments": {
"sql_query": "SELECT COUNT(*) FROM provider__actions_alpha WHERE year = 2024"
}
}
```
### execute_template
Run registered template with params:
```json
{
"tool": "execute_template",
"arguments": {
"template_name": "AGG_RTP",
"params": {"year": 2024, "month": 1, "brand": "alpha"}
}
}
```
## Security Best Practices
### What's Allowed
✅ SELECT queries (read-only)
✅ WITH (Common Table Expressions)
✅ Registered templates (except blacklisted)
✅ Table metadata queries
### What's Forbidden
❌ DROP TABLE (data destruction)
❌ DELETE/INSERT/UPDATE (data modification)
❌ CREATE/ALTER (schema modification)
### Data Access
- **Athena (S3):** ✅ Read-only access via MCP server
- **PostgreSQL:** ❌ NOT accessible (by design)
- Reason: Direct DB access bypasses audit trail
- Alternative: Use Athena (data replicated to S3)
## Troubleshooting
### Server won't start
**Error:** `ImportError: No module named 'mcp'`
**Fix:** `pip install mcp`
**Error:** `ImportError: No module named 'app.lib.aws'`
**Fix:** Set PYTHONPATH:
```bash
export PYTHONPATH=/path/to/analytics:$PYTHONPATH
python mcp_athena_analytics/server.py
```
**Error:** `botocore.exceptions.NoCredentialsError`
**Fix:** Configure AWS credentials (~/.aws/credentials profile ATHENA_ANALYTICS_INTERNAL)
### Queries timeout
**Symptoms:** Queries exceed 600s
**Fixes:**
1. Add partition filters (year, month, day)
2. Reduce date range
3. Use aggregated tables instead of raw data
### Results truncated
**Symptoms:** "truncated: X rows → 100,000 rows"
**Fixes:**
1. Add LIMIT clause: `SELECT ... LIMIT 10000`
2. Add WHERE filters to reduce results
3. Use GROUP BY aggregation
### Sensitive data not redacted
**Fix:** Add pattern to `sanitizer.py`:
```python
SENSITIVE_PATTERNS = [
r"\bpassword\b",
r"\buser_api_key\b", # Add your pattern
]
```
## Architecture
### File Structure
```
mcp_athena_analytics/
├── server.py # MCP server entrypoint
├── config.py # Path setup
├── logging_setup.py # Logging configuration
├── athena_tools/
│ ├── registry.py # Tool registry (ToolConfig, TOOL_REGISTRY)
│ ├── execute_query.py # Ad-hoc query tool
│ ├── execute_template.py
│ ├── list_tables.py
│ └── get_table_schema.py
├── blacklist.py # Template blacklist
├── validator.py # Query validation
└── sanitizer.py # Data redaction
```
### Data Flow
```
Claude Code CLI
↓ (JSON-RPC over stdio)
server.py
↓ (setup_paths, load_env)
TOOL_REGISTRY
↓ (route to tool.execute())
athena_tools/*.py
↓ (validate, check blacklist)
AthenaHelper
↓ (boto3)
AWS Athena → S3
↓ (results)
sanitizer.py (redact PII)
↓ (JSON)
Claude Code CLI
```
## Development
### Running Tests
```bash
# Test individual modules
python -m mcp_athena_analytics.blacklist
python -m mcp_athena_analytics.validator
python -m mcp_athena_analytics.sanitizer
```
### Adding New Tools
1. Create tool module in `athena_tools/my_tool.py`:
```python
TOOL_NAME = 'my_tool'
TOOL_DESCRIPTION = 'What this tool does'
TOOL_INPUT_SCHEMA = {...}
def execute(param1: str) -> dict[str, Any]:
"""Implementation."""
return {'result': '...'}
```
2. Add to `athena_tools/registry.py`:
```python
from mcp_athena_analytics.athena_tools import my_tool
TOOL_REGISTRY: list[ToolConfig] = [
# ...
ToolConfig(
name=my_tool.TOOL_NAME,
description=my_tool.TOOL_DESCRIPTION,
input_schema=my_tool.TOOL_INPUT_SCHEMA,
execute=my_tool.execute,
),
]
```
Tool automatically appears in `list_tools()` and `call_tool()`.
### Logging
- **stdout:** Reserved for MCP protocol (don't pollute!)
- **stderr:** Error messages and debugging
- **File logs:** `tmp/mcp_server_YYYYMMDD_HHMMSS.log`
## License
Internal analytics project.
## Support
1. Check troubleshooting section above
2. Review server logs in `tmp/mcp_server_*.log`
3. Test with MCP Inspector
4. Contact analytics team
## References
- [MCP Documentation](https://modelcontextprotocol.io/)
- [MCP Inspector](https://github.com/modelcontextprotocol/inspector)