MCP Athena Analytics Server

README.md•10.9 KiB

# MCP Athena Analytics Server Secure MCP server for querying analytics data through AWS Athena. ## Overview This MCP (Model Context Protocol) server provides Claude and other LLMs with controlled access to analytics data stored in S3/Athena. It implements multiple security layers to prevent data breaches, destructive operations, and excessive resource usage. **Use cases:** - Exploratory data analysis via natural language - Ad-hoc queries without writing SQL manually - Template-based reporting with parameter validation - Data discovery (tables, schemas, available templates) ## Architecture **HTTP-based deployment (recommended for production):** ``` ┌─────────────────────────────────────────────────────────────┐ │ claude-network (Docker bridge) │ │ │ │ ┌────────────────────────┐ ┌─────────────────────────┐ │ │ │ Claude Code Container │ │ MCP Server Container │ │ │ │ (claude-sandbox) │ │ (mcp-athena-server) │ │ │ │ │ │ │ │ │ │ Claude Code CLI │HTTP│ FastAPI/SSE Server │ │ │ │ ↓ │◄───┤ ↓ │ │ │ │ MCP Client (HTTP) │ │ MCP Protocol Handler │ │ │ │ │ │ ↓ │ │ │ │ 🚫 NO credentials │ │ Athena Tools │ │ │ │ │ │ ↓ │ │ │ └────────────────────────┘ │ boto3 + .env │ │ │ │ ↓ │ │ │ │ ✅ AWS Athena │ │ │ └─────────────────────────┘ │ └─────────────────────────────────────────────────────────────┘ ``` **Security benefits:** - ✅ Credentials isolated in MCP server container - ✅ Claude container has NO access to .env or AWS credentials - ✅ Bearer token authentication between containers - ✅ HTTP API as security boundary - ✅ Network-level isolation via Docker bridge **Alternative: stdio mode (local development only):** - MCP server runs as subprocess of Claude Code - ⚠️ Claude has access to all credentials (not recommended) ## Features ### 4 MCP Tools 1. **execute_template** - Run template with validated parameters 2. **execute_query** - Execute ad-hoc SELECT query (validated) 3. **list_tables** - Show available Athena tables 4. **get_table_schema** - Get table DDL (CREATE TABLE statement) ### Security Layers #### 1. Query Validation - **Ad-hoc queries:** Only SELECT and WITH (CTE) allowed - **Forbidden keywords:** DROP, DELETE, CREATE, ALTER, INSERT, TRUNCATE, REPLACE - **Whitelist approach:** Query must start with SELECT or WITH #### 2. Template Blacklist - **DROP_TABLE** - Permanently blocked (destructive) - **DROP_TABLE_IF_EXISTS** - Permanently blocked (destructive) - **Extensible:** Add more templates to blacklist.py as needed #### 3. Resource Limits - **Max rows:** 100,000 (results truncated if exceeded) - **Max timeout:** 600 seconds (10 minutes) - **Scan warning:** 100 GB (logged but not blocked) #### 4. Data Sanitization - **Sensitive columns auto-detected:** password, token, secret, key, credential, api_key, etc - **Redaction:** Values replaced with `***REDACTED***` - **Applied to all results** before returning to LLM #### 5. AWS Authentication - **IRSA (IAM Roles for Service Accounts):** Temporary credentials via Kubernetes - **No static keys:** Credentials refreshed automatically - **Default profile:** Sandbox/replica buckets (read-only production data) ## Installation ### Prerequisites - Docker & Docker Compose - AWS credentials (~/.aws/credentials with ATHENA_ANALYTICS_INTERNAL profile) - Active Athena setup ### Setup (HTTP-based, recommended) ```bash # 1. Create .env file in home directory cp mcp_athena_analytics/.env.example ~/.mcp_athena_analytics.env # 2. Generate authentication token openssl rand -hex 32 # 3. Edit .env file and set MCP_AUTH_TOKEN nano ~/.mcp_athena_analytics.env # 4. Start claude-sandbox first (creates network) cd claude-sandbox docker compose up -d # 5. Start MCP server container (joins network) cd ../mcp_athena_analytics docker compose up -d --build # 6. Verify server is running docker ps | grep mcp-athena-server curl http://localhost:8000/health # Expected: {"status":"healthy","service":"mcp-athena-analytics"} # 7. Configure Claude Code CLI cd .. cp .mcp.json.example .mcp.json # 8. Edit .mcp.json and set MCP_AUTH_TOKEN (same as in ~/.mcp_athena_analytics.env) nano .mcp.json ``` **Note:** `.env` file is stored in home directory (`~/.mcp_athena_analytics.env`) to keep credentials outside project directory. ### Setup (stdio mode, local development only) **Note:** stdio mode is deprecated. Use HTTP-based deployment instead. For local testing without Docker: ```bash # 1. Install dependencies pip install -r mcp_athena_analytics/requirements.txt pip install -r app/requirements.txt # 2. Create .env file cp mcp_athena_analytics/.env.example ~/.mcp_athena_analytics.env nano ~/.mcp_athena_analytics.env # 3. Test server locally (HTTP mode) PYTHONPATH=. python mcp_athena_analytics/server_http.py ``` ## Usage ### Claude Code CLI (HTTP mode) MCP server configuration is in `.mcp.json` (created from `.mcp.json.example`): ```json { "mcpServers": { "athena-analytics": { "type": "sse", "url": "http://mcp-athena-server:8000/sse", "headers": { "Authorization": "Bearer YOUR_TOKEN_HERE" } } } } ``` **Note:** URL uses container name `mcp-athena-server` for Docker DNS resolution (both containers in `claude-network`). **Important:** Replace `YOUR_TOKEN_HERE` with the same token from `mcp_athena_analytics/.env` Start Claude Code and enable MCP server: ```bash /mcp # In Claude Code CLI ``` **Note:** `.mcp.json` is in `.gitignore` (local config), `.mcp.json.example` is tracked. ### Testing Use `curl` to test HTTP endpoints: ```bash # Health check curl http://localhost:8000/health # Test with MCP client (from claude-sandbox container) docker exec -it claude-sandbox bash curl http://mcp-athena-server:8000/health ``` ## Tool Examples ### execute_query Run ad-hoc SELECT query: ```json { "tool": "execute_query", "arguments": { "sql_query": "SELECT COUNT(*) FROM provider__actions_alpha WHERE year = 2024" } } ``` ### execute_template Run registered template with params: ```json { "tool": "execute_template", "arguments": { "template_name": "AGG_RTP", "params": {"year": 2024, "month": 1, "brand": "alpha"} } } ``` ## Security Best Practices ### What's Allowed ✅ SELECT queries (read-only) ✅ WITH (Common Table Expressions) ✅ Registered templates (except blacklisted) ✅ Table metadata queries ### What's Forbidden ❌ DROP TABLE (data destruction) ❌ DELETE/INSERT/UPDATE (data modification) ❌ CREATE/ALTER (schema modification) ### Data Access - **Athena (S3):** ✅ Read-only access via MCP server - **PostgreSQL:** ❌ NOT accessible (by design) - Reason: Direct DB access bypasses audit trail - Alternative: Use Athena (data replicated to S3) ## Troubleshooting ### Server won't start **Error:** `ImportError: No module named 'mcp'` **Fix:** `pip install mcp` **Error:** `ImportError: No module named 'app.lib.aws'` **Fix:** Set PYTHONPATH: ```bash export PYTHONPATH=/path/to/analytics:$PYTHONPATH python mcp_athena_analytics/server.py ``` **Error:** `botocore.exceptions.NoCredentialsError` **Fix:** Configure AWS credentials (~/.aws/credentials profile ATHENA_ANALYTICS_INTERNAL) ### Queries timeout **Symptoms:** Queries exceed 600s **Fixes:** 1. Add partition filters (year, month, day) 2. Reduce date range 3. Use aggregated tables instead of raw data ### Results truncated **Symptoms:** "truncated: X rows → 100,000 rows" **Fixes:** 1. Add LIMIT clause: `SELECT ... LIMIT 10000` 2. Add WHERE filters to reduce results 3. Use GROUP BY aggregation ### Sensitive data not redacted **Fix:** Add pattern to `sanitizer.py`: ```python SENSITIVE_PATTERNS = [ r"\bpassword\b", r"\buser_api_key\b", # Add your pattern ] ``` ## Architecture ### File Structure ``` mcp_athena_analytics/ ├── server.py # MCP server entrypoint ├── config.py # Path setup ├── logging_setup.py # Logging configuration ├── athena_tools/ │ ├── registry.py # Tool registry (ToolConfig, TOOL_REGISTRY) │ ├── execute_query.py # Ad-hoc query tool │ ├── execute_template.py │ ├── list_tables.py │ └── get_table_schema.py ├── blacklist.py # Template blacklist ├── validator.py # Query validation └── sanitizer.py # Data redaction ``` ### Data Flow ``` Claude Code CLI ↓ (JSON-RPC over stdio) server.py ↓ (setup_paths, load_env) TOOL_REGISTRY ↓ (route to tool.execute()) athena_tools/*.py ↓ (validate, check blacklist) AthenaHelper ↓ (boto3) AWS Athena → S3 ↓ (results) sanitizer.py (redact PII) ↓ (JSON) Claude Code CLI ``` ## Development ### Running Tests ```bash # Test individual modules python -m mcp_athena_analytics.blacklist python -m mcp_athena_analytics.validator python -m mcp_athena_analytics.sanitizer ``` ### Adding New Tools 1. Create tool module in `athena_tools/my_tool.py`: ```python TOOL_NAME = 'my_tool' TOOL_DESCRIPTION = 'What this tool does' TOOL_INPUT_SCHEMA = {...} def execute(param1: str) -> dict[str, Any]: """Implementation.""" return {'result': '...'} ``` 2. Add to `athena_tools/registry.py`: ```python from mcp_athena_analytics.athena_tools import my_tool TOOL_REGISTRY: list[ToolConfig] = [ # ... ToolConfig( name=my_tool.TOOL_NAME, description=my_tool.TOOL_DESCRIPTION, input_schema=my_tool.TOOL_INPUT_SCHEMA, execute=my_tool.execute, ), ] ``` Tool automatically appears in `list_tools()` and `call_tool()`. ### Logging - **stdout:** Reserved for MCP protocol (don't pollute!) - **stderr:** Error messages and debugging - **File logs:** `tmp/mcp_server_YYYYMMDD_HHMMSS.log` ## License Internal analytics project. ## Support 1. Check troubleshooting section above 2. Review server logs in `tmp/mcp_server_*.log` 3. Test with MCP Inspector 4. Contact analytics team ## References - [MCP Documentation](https://modelcontextprotocol.io/) - [MCP Inspector](https://github.com/modelcontextprotocol/inspector)

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/elena-kuznetsova-wh/warehouse-athena-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

README.md•10.9 KiB