README.md•5.52 kB
# Snowflake MCP Agent System
Enhanced MCP Snowflake server with LangGraph agentic architecture for intelligent data analysis and querying.
## Overview
This system provides:
- **MCP Server**: 20+ specialized tools for Snowflake data operations
- **Agentic Client**: LangGraph-powered multi-agent system with few-shot learning
- **Session Management**: State persistence and intelligent caching
- **Training Capabilities**: Continuous improvement through user feedback
## Prerequisites
- Python 3.12+
- Snowflake account with appropriate permissions
- JWT token for authentication (if using corporate endpoints)
## Installation
```bash
pip install -e .
```
## Configuration
Create a `.env` file with your Snowflake credentials:
```bash
# Required
SNOWFLAKE_USER=your_username
SNOWFLAKE_ACCOUNT=your_account
SNOWFLAKE_WAREHOUSE=your_warehouse
SNOWFLAKE_DATABASE=your_database
SNOWFLAKE_SCHEMA=your_schema
# Authentication (choose one)
SNOWFLAKE_PASSWORD=your_password
# OR
SNOWFLAKE_PRIVATE_KEY_PATH=path/to/private_key.pem
SNOWFLAKE_PRIVATE_KEY_PASSPHRASE=optional_passphrase
# Optional
SNOWFLAKE_ROLE=your_role
JWT_TOKEN=your_jwt_token
```
## Quick Start
### 1. Start the MCP Server
```bash
# Terminal 1
python -m mcp_code server
```
Server runs on `http://127.0.0.1:8000/mcp`
### 2. Run the Agentic Client
```bash
# Terminal 2
python -m mcp_code --mode interactive
```
## Usage Modes
### Interactive Mode (Default)
```bash
python -m mcp_code
```
Chat interface with multi-agent responses and few-shot learning.
### Single Query Mode
```bash
python -m mcp_code --mode query --query "What are the top 5 most used tables?"
```
### Batch Processing Mode
```bash
python -m mcp_code --mode batch --file queries.txt
```
### Training Mode
```bash
python -m mcp_code --mode train
```
Collects positive feedback examples for agent improvement.
## Agent Archetypes
The system includes specialized agents:
- **Analyst**: EDA, statistical analysis, trend identification
- **Lineage Expert**: Data flow tracing, impact analysis
- **Usage Auditor**: Resource monitoring, anomaly detection
- **Query Optimizer**: Performance analysis, optimization recommendations
- **Metadata Curator**: Schema documentation, data cataloging
## Data Sources
The system analyzes six Snowflake datasets:
1. **AAI_USAGE**: User access patterns and resource consumption
2. **AAI_LINEAGE**: Source-to-target table mappings
3. **AAI_MD**: Table metadata and data product information
4. **AAI_PROFILER**: Column-level statistics and data quality metrics
5. **AAI_ACCESS**: Role-based permissions and access control
6. **AAI_SQL_ANALYZER**: Query execution metadata and performance metrics
## API Reference
### Core Tools
- `list_databases()` - List available databases
- `list_schemas(database)` - List schemas in database
- `list_tables(database, schema)` - List tables in schema
- `run_query(sql)` - Execute SELECT queries
### Analysis Tools
- `analyze_usage(time_period, business_unit)` - Usage pattern analysis
- `get_lineage(table_name, direction, depth)` - Data lineage tracing
- `identify_heavy_users(metric, top_n)` - Resource consumption analysis
- `analyze_slow_queries(threshold_seconds)` - Performance bottleneck identification
- `get_table_metadata(table_name)` - Comprehensive metadata retrieval
- `recommend_data_products(analysis_scope)` - Data product recommendations
### Session Management
- `save_feedback(session_id, query, response, feedback_type)` - Training data collection
- `get_session_history(session_id)` - Query history and statistics
## Architecture
```
┌─────────────────┐ HTTP/MCP ┌──────────────────┐
│ Agentic Client │ ◄───────────► │ MCP Server │
│ (LangGraph) │ │ (FastMCP) │
└─────────────────┘ └──────────────────┘
│ │
│ │
▼ ▼
┌─────────────────┐ ┌──────────────────┐
│ Few-Shot │ │ Snowflake │
│ Training Store │ │ Database │
└─────────────────┘ └──────────────────┘
```
## Error Handling
Common issues and solutions:
- **Connection Failed**: Verify Snowflake credentials in `.env`
- **JWT Token Invalid**: Update `JWT_TOKEN` in environment
- **Import Errors**: Run `pip install -e .` to install dependencies
- **Port 8000 Busy**: Server already running or port in use
## Development
### Project Structure
```
mcp_code/
├── __init__.py # Main entry point and CLI
├── server.py # Enhanced MCP server with tools
├── client_refactored.py # LangGraph agentic client
├── db_client.py # Snowflake database client
├── config.py # Configuration management
├── query_utils.py # Query analysis utilities
└── training_examples.json # Few-shot training data
```
### Adding New Agents
1. Create agent class inheriting from `BaseAgent`
2. Define `_get_base_prompt()` method
3. Add corresponding MCP tools in `server.py`
4. Update routing logic in `client_refactored.py`