README.md•13.3 kB
# EdgeLake MCP Server
A Model Context Protocol (MCP) server for EdgeLake distributed database, providing AI assistants with access to query and explore distributed data across EdgeLake nodes.
## Features
- **Resource Discovery**: List all databases and tables available on EdgeLake nodes
- **Schema Inspection**: Retrieve table schemas with column information
- **SQL Query Execution**: Execute complex SQL queries with:
- WHERE clauses with AND/OR operators
- GROUP BY aggregations
- ORDER BY with ASC/DESC sorting
- JOINs across tables and databases
- Extended metadata fields (+ip, +hostname, @table_name, etc.)
- LIMIT for result pagination
- **Multi-threaded Execution**: Concurrent request handling for optimal performance
- **Stateless Design**: No session management required
## Architecture
```
┌────────────────────┐ ┌─────────────────────────┐ ┌────────────────────┐
│ │ │ EdgeLake MCP Server │ │ │
│ MCP Client │◀───────▶│ │◀───────▶│ EdgeLake Node │
│ (Claude, etc.) │ stdio │ - Resources (list) │ HTTP │ (REST API) │
│ │ │ - Resources (read) │ │ │
└────────────────────┘ │ - Tools (query) │ └────────────────────┘
│ - Tools (node_status) │
└─────────────────────────┘
```
## Installation
### Prerequisites
- Python 3.10 or higher
- Access to an EdgeLake node with REST API enabled
- EdgeLake node running on accessible IP:port (default: localhost:32049)
### Install Dependencies
```bash
pip install -r requirements.txt
```
## Configuration
Configure the server using environment variables:
**TODO:** Update so that node information can be provided dynamically
| Variable | Description | Default |
|----------|-------------|---------|
| `EDGELAKE_HOST` | EdgeLake node IP/hostname | `127.0.0.1` |
| `EDGELAKE_PORT` | EdgeLake REST API port | `32049` |
| `EDGELAKE_TIMEOUT` | HTTP request timeout (seconds) | `20` |
| `EDGELAKE_MAX_WORKERS` | Max concurrent threads | `10` |
| `LOG_LEVEL` | Logging level (DEBUG, INFO, WARNING, ERROR) | `INFO` |
### Example Configuration
Create a `.env` file:
```bash
EDGELAKE_HOST=192.168.1.106
EDGELAKE_PORT=32049
EDGELAKE_TIMEOUT=30
EDGELAKE_MAX_WORKERS=20
LOG_LEVEL=INFO
```
Or export environment variables:
```bash
export EDGELAKE_HOST=192.168.1.106
export EDGELAKE_PORT=32049
```
## Usage
### Running the Server
The MCP server runs as a subprocess using stdio transport:
```bash
python server.py
```
### MCP Client Configuration
Add to your MCP client configuration (e.g., Claude Desktop):
#### macOS/Linux: `~/Library/Application Support/Claude/claude_desktop_config.json`
#### Windows: `%APPDATA%\Claude\claude_desktop_config.json`
```json
{
"mcpServers": {
"edgelake": {
"command": "python",
"args": ["/path/to/edgelake/mcp-server/server.py"],
"env": {
"EDGELAKE_HOST": "192.168.1.106",
"EDGELAKE_PORT": "32049"
}
}
}
}
```
## MCP Protocol Implementation
### Resources
#### `resources/list`
Lists all available databases and tables.
**Response Format:**
```
database://{database_name} - Database resource
database://{database_name}/{table_name} - Table resource
```
**Example:**
```json
[
{
"uri": "database://my_database",
"name": "Database: my_database",
"description": "All tables in database 'my_database'",
"mimeType": "application/json"
},
{
"uri": "database://my_database/users",
"name": "my_database.users",
"description": "Table 'users' in database 'my_database'",
"mimeType": "application/json"
}
]
```
#### `resources/read`
Reads a specific resource (table schema).
**URI Format:** `database://{database}/{table}`
**Example Request:**
```json
{
"method": "resources/read",
"params": {
"uri": "database://my_database/users"
}
}
```
**Example Response:**
```json
{
"contents": [
{
"uri": "database://my_database/users",
"mimeType": "application/json",
"text": "{\n \"columns\": [\n {\"name\": \"id\", \"type\": \"INTEGER\"},\n {\"name\": \"name\", \"type\": \"VARCHAR\"},\n {\"name\": \"email\", \"type\": \"VARCHAR\"}\n ]\n}"
}
]
}
```
### Tools
#### `query`
Execute SQL queries against EdgeLake with advanced filtering and aggregation.
**Parameters:**
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `database` | string | Yes | Database name |
| `table` | string | Yes | Table name |
| `select` | array[string] | No | Columns to select (default: `["*"]`) |
| `where` | string | No | WHERE clause conditions |
| `group_by` | array[string] | No | Columns to group by |
| `order_by` | array[object] | No | Sort specifications |
| `include_tables` | array[string] | No | Additional tables to JOIN |
| `extend_fields` | array[string] | No | Metadata fields to add |
| `limit` | integer | No | Max rows to return (default: 100) |
| `format` | string | No | Output format: `json` or `table` (default: `json`) |
**Example - Simple Query:**
```json
{
"name": "query",
"arguments": {
"database": "iot_data",
"table": "sensor_readings",
"where": "temperature > 25",
"limit": 10
}
}
```
**Example - Complex Aggregation:**
```json
{
"name": "query",
"arguments": {
"database": "iot_data",
"table": "sensor_readings",
"select": ["device_id", "AVG(temperature) as avg_temp", "COUNT(*) as count"],
"where": "timestamp > '2025-01-01'",
"group_by": ["device_id"],
"order_by": [
{"column": "avg_temp", "direction": "DESC"}
],
"limit": 20
}
}
```
**Example - Cross-Database Join:**
```json
{
"name": "query",
"arguments": {
"database": "sales",
"table": "orders",
"include_tables": ["inventory.products", "customers"],
"where": "orders.status = 'completed'",
"limit": 50
}
}
```
**Example - Extended Fields:**
```json
{
"name": "query",
"arguments": {
"database": "iot_data",
"table": "events",
"extend_fields": ["+ip", "+hostname", "@table_name"],
"limit": 100
}
}
```
#### `node_status`
Get EdgeLake node status and health information.
**Example:**
```json
{
"name": "node_status",
"arguments": {}
}
```
#### `list_databases`
List all available databases in EdgeLake. Use this to discover what databases are available before querying.
**Example:**
```json
{
"name": "list_databases",
"arguments": {}
}
```
**Response:**
```json
{
"databases": ["new_company", "iot_data", "sales"],
"count": 3
}
```
#### `list_tables`
List all tables in a specific database. Use this to discover what tables are available in a database before querying.
**Parameters:**
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `database` | string | Yes | Database name to list tables from |
**Example:**
```json
{
"name": "list_tables",
"arguments": {
"database": "new_company"
}
}
```
**Response:**
```json
{
"database": "new_company",
"tables": ["rand_data", "ping_sensor", "events"],
"count": 3
}
```
#### `get_schema`
Get the schema (column definitions) for a specific table. Use this to understand what columns are available before querying.
**Parameters:**
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `database` | string | Yes | Database name |
| `table` | string | Yes | Table name |
**Example:**
```json
{
"name": "get_schema",
"arguments": {
"database": "new_company",
"table": "rand_data"
}
}
```
**Response:**
```json
{
"columns": [
{"name": "row_id", "type": "SERIAL"},
{"name": "insert_timestamp", "type": "TIMESTAMP"},
{"name": "tsd_name", "type": "CHAR(3)"},
{"name": "tsd_id", "type": "INT"},
{"name": "timestamp", "type": "timestamp"},
{"name": "value", "type": "decimal"}
]
}
```
#### `server_info`
Get EdgeLake MCP Server version and configuration information.
**Example:**
```json
{
"name": "server_info",
"arguments": {}
}
```
**Response:**
```json
{
"version": "1.0.6",
"server_name": "edgelake-mcp-server",
"configuration": {
"edgelake_host": "192.168.1.106",
"edgelake_port": 32349,
"request_timeout": 20,
"max_workers": 10,
"log_level": "INFO"
}
}
```
## Query Building Rules
### WHERE Clause
Add filtering conditions with AND/OR operators:
```sql
WHERE is_active = true AND age > 18
WHERE status = 'active' OR status = 'pending'
WHERE (category = 'A' OR category = 'B') AND price > 100
```
### GROUP BY
Group results by columns (required when using aggregations with non-aggregated columns):
```sql
SELECT device_id, AVG(temperature) FROM sensors GROUP BY device_id
```
### ORDER BY
Order results by columns with optional direction:
```sql
ORDER BY created_at DESC
ORDER BY category ASC, price DESC
```
### Include Tables (JOINs)
Include additional tables using comma-separated syntax. For cross-database tables, use `db_name.table_name`:
```sql
FROM orders, customers, inventory.products
```
### Extended Fields
Add EdgeLake metadata fields using special prefixes:
- `+ip` - Node IP address
- `+overlay_ip` - Overlay network IP
- `+hostname` - Node hostname
- `@table_name` - Source table name
```sql
SELECT +ip, +hostname, @table_name, * FROM events
```
### LIMIT
Limit the number of rows returned:
```sql
SELECT * FROM users LIMIT 100
```
## API Examples
### Using curl (for testing)
```bash
# List resources
echo '{"jsonrpc":"2.0","id":1,"method":"resources/list","params":{}}' | python server.py
# Read table schema
echo '{"jsonrpc":"2.0","id":2,"method":"resources/read","params":{"uri":"database://mydb/users"}}' | python server.py
# Execute query
echo '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"query","arguments":{"database":"mydb","table":"users","where":"is_active = true","limit":10}}}' | python server.py
# Get node status
echo '{"jsonrpc":"2.0","id":4,"method":"tools/call","params":{"name":"node_status","arguments":{}}}' | python server.py
```
## EdgeLake Commands Reference
The MCP server uses EdgeLake's REST API with these commands:
| MCP Operation | EdgeLake Command |
|---------------|------------------|
| List databases | `GET /` with header `command: get databases` |
| List tables | `GET /` with header `command: get tables where dbms = {database}` |
| Get schema | `GET /` with header `command: get columns where dbms = {database} and table = {table}` |
| Execute query | `GET /` with header `command: sql {database} format = {format} "{query}"` |
| Node status | `GET /` with header `command: get status` |
## Logging
Logs are written to:
- **File**: `edgelake_mcp.log` (in server directory)
- **stderr**: Console output for debugging
Set log level via `LOG_LEVEL` environment variable (DEBUG, INFO, WARNING, ERROR).
## Development
### Project Structure
```
mcp-server/
├── server.py # Main MCP server implementation
├── edgelake_client.py # Multi-threaded EdgeLake HTTP client
├── query_builder.py # SQL query construction
├── config.py # Configuration management
├── requirements.txt # Python dependencies
├── README.md # This file
└── Design/ # Design documentation
├── mcp_service.md
└── top-level-diagram.monojson
```
### Running Tests
```bash
pytest
```
### Code Style
```bash
# Format code
black *.py
# Type checking
mypy *.py
```
## Troubleshooting
### Connection Issues
**Problem**: Cannot connect to EdgeLake node
```
Error: Request error: Connection refused
```
**Solution**:
1. Verify EdgeLake node is running: `curl http://{host}:{port}`
2. Check firewall settings
3. Verify `EDGELAKE_HOST` and `EDGELAKE_PORT` are correct
### Empty Database List
**Problem**: No databases returned from `resources/list`
**Solution**:
1. Check EdgeLake node has databases: `curl -H "command: get databases" http://{host}:{port}`
2. Verify user has permissions to view databases
3. Check EdgeLake logs for errors
### Query Timeout
**Problem**: Query takes too long and times out
**Solution**:
1. Increase `EDGELAKE_TIMEOUT` environment variable
2. Add more specific WHERE clauses to reduce result set
3. Use LIMIT to restrict rows returned
## License
Mozilla Public License 2.0
## Support
For issues and questions:
- EdgeLake Documentation: https://edgelake.github.io
- EdgeLake GitHub: https://github.com/EdgeLake
- MCP Specification: https://modelcontextprotocol.io
## Contributing
Contributions are welcome! Please ensure:
1. Code follows PEP 8 style guide
2. All tests pass
3. New features include documentation
4. Type hints are used throughout