# MySQL MCP Server
A Model Context Protocol (MCP) server for MySQL databases. This server provides a unified interface for exploring and querying MySQL databases through MCP.
## Features
- **MySQL Support**: Connect to MySQL databases
- **Unified Interface**: Consistent tools and API for MySQL operations
- **Database-Specific Optimizations**: Uses MySQL-optimized SQL syntax
- **Schema Exploration**: List databases, tables, and relationships
- **Query Execution**: Run SQL queries with proper parameter handling
- **Resource Support**: MCP resource endpoints for table data
- **LangGraph Text-to-SQL Agent**: Intelligent agent that converts natural language to SQL queries with automatic schema exploration and error recovery
## Installation
1. Install dependencies:
```bash
pip install -r requirements.txt
```
The requirements include:
- MySQL connector for database access
- MCP and FastMCP for the server
- LangChain and LangGraph for the text-to-SQL agent
- LangChain OpenAI integration for LLM support
## Usage
### Connection Strings
#### MySQL
```bash
# Using connection string
python mysql-db-server.py --conn "mysql://user:password@host:port/database"
# Using environment variable
export DATABASE_CONNECTION_STRING="mysql://user:password@host:port/database"
python mysql-db-server.py
```
### Command Line Options
```bash
python mysql-db-server.py [OPTIONS]
Options:
  --conn TEXT       MySQL connection string (format: mysql://user:password@host:port/database)
  --transport TEXT  Transport protocol: stdio, sse, or streamable-http (default: stdio)
  --host TEXT       Host to bind for SSE/HTTP transports (default: 127.0.0.1)
  --port INTEGER    Port to bind for SSE/HTTP transports (default: 8000)
  --mount TEXT      Optional mount path for SSE transport (e.g., /mcp)
  --readonly        Enable read-only mode (prevents INSERT, UPDATE, DELETE, etc.)
  --help            Show this message and exit
```
### Environment Variables
- `DATABASE_CONNECTION_STRING`: MySQL connection string
- `DATABASE_READONLY`: Set to "true", "1", or "yes" to enable read-only mode
- `DATABASE_STATEMENT_TIMEOUT_MS`: Query timeout in milliseconds
- `MCP_TRANSPORT`: Transport protocol (stdio, sse, streamable-http)
- `MCP_HOST`: Host for network transports
- `MCP_PORT`: Port for network transports
- `MCP_SSE_MOUNT`: Mount path for SSE transport
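For example, a fully environment-driven configuration might look like this (all values are placeholders):
```bash
export DATABASE_CONNECTION_STRING="mysql://user:password@localhost:3306/mydatabase"
export DATABASE_READONLY="true"
export DATABASE_STATEMENT_TIMEOUT_MS="30000"
export MCP_TRANSPORT="streamable-http"
export MCP_HOST="127.0.0.1"
export MCP_PORT="8000"
python mysql-db-server.py
```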
### Read-Only Mode
Read-only mode prevents any write operations (INSERT, UPDATE, DELETE, DROP, etc.) and only allows SELECT, SHOW, WITH, VALUES, and EXPLAIN queries.
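As a rough illustration of how such a gate can work (a sketch built on the allowed-keyword list above, not the server's actual implementation):
```python
import re

# Statements permitted in read-only mode (per the list above).
READONLY_ALLOWED = {"SELECT", "SHOW", "WITH", "VALUES", "EXPLAIN"}

# Skips leading whitespace and SQL comments before the first keyword.
LEADING_NOISE = re.compile(r"^(?:\s*--[^\n]*\n|\s*/\*.*?\*/)*\s*", re.DOTALL)

def is_readonly_safe(sql: str) -> bool:
    """Return True if the first keyword of `sql` is in the allowed set."""
    body = LEADING_NOISE.sub("", sql, count=1)
    match = re.match(r"[A-Za-z]+", body)
    return bool(match) and match.group(0).upper() in READONLY_ALLOWED

print(is_readonly_safe("SELECT * FROM users"))        # True
print(is_readonly_safe("-- note\nEXPLAIN SELECT 1"))  # True
print(is_readonly_safe("DELETE FROM users"))          # False
```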
**Enable via command-line:**
```bash
python mysql-db-server.py --conn "mysql://user:password@host:port/database" --readonly
```
**Enable via environment variable:**
```bash
export DATABASE_READONLY="true"
python mysql-db-server.py --conn "mysql://user:password@host:port/database"
```
**Check if read-only mode is enabled:**
```python
# Get tools from MCP client
tools = await client.get_tools()
server_info_tool = next((t for t in tools if t.name == "server_info"), None)
if server_info_tool:
    result = await server_info_tool.ainvoke({})
    print(result["readonly"])  # True if read-only mode is enabled
```
## LangGraph Text-to-SQL Agent
The project includes a sophisticated LangGraph-based agent that converts natural language questions into SQL queries. The agent automatically explores the database schema, generates SQL queries, executes them, and refines queries if errors occur.
### Features
- **Intelligent Schema Exploration**:
- Automatically identifies relevant tables using LLM analysis (skips LLM call for ≤3 tables)
- Only explores tables needed for the query (much faster!)
- Describes table structures with exact column names
- Fetches foreign key relationships for better JOINs
- Caches schema information across queries (session-based)
- **Query Validation**:
- Validates that user queries are actually database questions before generating SQL
  - Uses fast heuristics (set-based keyword matching) first, then LLM only for ambiguous cases (see the sketch after this list)
- Rejects gibberish, greetings, and non-database questions early
- Optimized to avoid LLM calls for 90%+ of cases
- **Intelligent SQL Generation**:
- Uses LLM with chain-of-thought reasoning for step-by-step query generation
- Handles multi-part questions (e.g., "find X and then find Y") with subqueries
- **Window Function Support**: Automatically uses window functions (ROW_NUMBER(), RANK(), DENSE_RANK(), LAG(), LEAD(), SUM() OVER, etc.) for appropriate scenarios like "top N per group", rankings within groups, comparing rows, running totals, percentiles, moving averages
- **Tie Handling**: Correctly handles ties in "most/least" queries by returning ALL entities with max/min value
- Validates SQL syntax and auto-fixes simple issues
- **Confidence Scoring & Auto-Refinement**:
- Calculates confidence scores based on actual query execution results
- **Explicitly checks if SQL answers the question** (not just syntax correctness)
- Automatically refines queries when confidence is low or errors detected
- Provides detailed analysis and reasoning for every query
- **Error Recovery**:
- Intelligently parses SQL errors to extract actionable information
- Automatically retries with improved queries when execution fails
- Preserves query structure when fixing simple errors (e.g., column names)
- **Performance Optimized**:
- Parallel execution of schema exploration operations
- Compiled regex patterns for faster text processing
- Reuses test query results when possible (avoids redundant executions)
- State-based SQL storage (avoids re-extraction)
- Set-based lookups for O(1) table/column checks
- LRU cache management for schema and column caches (prevents unbounded growth)
- Timeout handling for LLM and database calls (prevents hanging)
- **Robust Error Handling**:
- Input validation (rejects None, empty, or invalid queries)
- Timeout protection for all LLM and database calls (validates timeout > 0)
- Graceful fallbacks when LLM returns empty/malformed responses
- Type safety for all conversions (float, int) with try/except
- Regex group validation (checks groups are not empty before use)
- Cache structure validation (handles corrupted cache gracefully)
- Safe string parsing (handles malformed table resources, error messages)
- Tool call structure validation (validates dict structure before processing)
- Comprehensive logging for debugging (optional, can be disabled)
- **MCP Integration**: Seamlessly uses MCP tools for database operations
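To make the fast-heuristic idea under **Query Validation** concrete, here is a minimal sketch of set-based keyword matching. The keyword sets are hypothetical; the agent's real lists and heuristics are more involved:
```python
# Hypothetical keyword sets; the agent's actual lists may differ.
DB_KEYWORDS = {"show", "list", "count", "find", "how", "many", "average",
               "table", "tables", "records", "rows", "select", "where"}
REJECT_PHRASES = {"hello", "hi", "thanks", "bye"}

def quick_validate(question: str) -> str:
    """Classify a question as 'valid', 'invalid', or 'ambiguous' (needs LLM)."""
    words = set(question.lower().split())
    if not words or words & REJECT_PHRASES:
        return "invalid"    # greetings / empty input rejected early
    if words & DB_KEYWORDS:
        return "valid"      # O(1) set intersection, no LLM call
    return "ambiguous"      # fall back to LLM validation

print(quick_validate("How many authors are in the database?"))  # valid
print(quick_validate("hello there"))                            # invalid
```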
### Setup
1. **Start the MCP Server** (in one terminal):
```bash
python mysql-db-server.py --conn "mysql://user:password@host:port/database" --transport streamable-http --port 8000
```
2. **Use the Agent** (in Python/Jupyter):
```python
from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain_openai import ChatOpenAI
from text_to_sql_agent import TextToSQLAgent
# Connect to the MCP server
client = MultiServerMCPClient({
    "mysql-server": {
        "url": "http://localhost:8000/mcp",
        "transport": "streamable_http"
    }
})
# Initialize the LLM (reads OPENAI_API_KEY from the environment by default,
# or pass api_key="your-openai-api-key" explicitly)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# Create the agent with optional configuration
agent = TextToSQLAgent(
    mcp_client=client,
    llm=llm,
    max_query_attempts=3,         # Maximum retry attempts
    llm_timeout=60,               # Timeout for LLM calls (seconds)
    query_timeout=30,             # Timeout for database queries (seconds)
    max_schema_cache_size=1000,   # Maximum table descriptions to cache
    max_column_cache_size=500,    # Maximum column name extractions to cache
    enable_logging=True           # Enable logging for debugging
)
```
### Usage Examples
#### Basic Query
```python
# Ask a natural language question
result = await agent.query("How many authors are in the database?")
# Get the final answer
answer = agent.get_final_answer(result)
print(answer)
```
#### Query with Filtering
```python
# Complex queries with filters
result = await agent.query("Show me all authors born after 1950")
print(agent.get_final_answer(result))
```
#### Aggregation Queries
```python
# Statistical queries
result = await agent.query("What is the average birth year of all authors?")
print(agent.get_final_answer(result))
```
### How It Works
The agent follows an optimized workflow:
1. **Schema Exploration** (with caching and parallelization):
- Lists all available tables (cached after first call)
- Intelligently identifies relevant tables using LLM (skips for ≤3 tables)
- Describes table structures for relevant tables only (parallel execution)
- Fetches foreign key relationships for better JOINs (parallel execution)
- Caches all schema information for subsequent queries
2. **Query Validation** (before SQL generation):
- Validates that the user query is a valid database question
- Uses fast heuristics (keyword matching) first
- Falls back to LLM validation only for ambiguous cases
- Rejects invalid queries early to avoid unnecessary processing
3. **SQL Generation** (with confidence scoring):
- Uses LLM with chain-of-thought reasoning for step-by-step generation
- Includes schema, foreign keys, and column names in prompt
- Executes test query (LIMIT 3) to get sample results
- Calculates confidence score and analysis (single LLM call)
- **Confidence scoring explicitly checks if SQL answers the question**
- Validates SQL syntax and detects critical issues
- Sets refinement flags for edge routing (does not refine directly)
- Stores final SQL in state to avoid re-extraction
4. **SQL Refinement** (if needed):
- Separate node handles refinement when confidence is low or errors detected
- Fetches missing schema if needed
- Refines SQL using analysis and error context
- Re-executes test query and recalculates confidence
5. **Query Execution** (optimized to avoid redundancy):
- Uses stored SQL from state (avoids re-extraction)
- Reuses test query results if they contain all data (avoids redundant execution)
- **Respects original LIMIT clause when reusing test results** (e.g., LIMIT 1 returns 1 row, not all test results)
- Executes full query only when needed
6. **Error Recovery** (with intelligent parsing; sketched below):
- Intelligently parses SQL errors (extracts error type, column, table)
- Passes error context to SQL generation (optional refinement)
- Preserves query structure when fixing simple errors
- Automatically retries failed queries (up to `max_query_attempts`)
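A minimal sketch of the error-parsing idea, assuming standard MySQL error text such as `Unknown column 'birth_yr' in 'field list'` (the agent's actual parser covers more cases):
```python
import re

# Patterns for common MySQL error messages (illustrative subset).
ERROR_PATTERNS = {
    "unknown_column": re.compile(r"Unknown column '([^']+)'"),
    "missing_table": re.compile(r"Table '([^']+)' doesn't exist"),
    "syntax_error": re.compile(r"You have an error in your SQL syntax"),
}

def parse_sql_error(message: str) -> dict:
    """Extract an error type and offending identifier from a MySQL error."""
    for error_type, pattern in ERROR_PATTERNS.items():
        match = pattern.search(message)
        if match:
            detail = match.group(1) if match.groups() else None
            return {"type": error_type, "detail": detail}
    return {"type": "unknown", "detail": None}

print(parse_sql_error("1054 (42S22): Unknown column 'birth_yr' in 'field list'"))
# {'type': 'unknown_column', 'detail': 'birth_yr'}
```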
### Agent State Graph
The agent uses a LangGraph state machine with the following nodes:
- **explore_schema**: Discovers and caches database schema (with conditional routing for completion)
- **generate_sql**: Converts natural language to SQL using LLM, calculates confidence, sets refinement flags
- **refine_sql**: Refines SQL based on confidence score and analysis (separate node for clarity)
- **execute_query**: Runs SQL queries via MCP tools (reuses test results when possible)
- **refine_query**: Improves queries based on error feedback (routes back to generate_sql)
- **tools**: Handles tool calls for schema exploration
**Architecture**: The agent follows LangGraph best practices with clear separation:
- **Nodes**: Process state and return updates (no conditional logic)
- **Edges**: Make routing decisions based on state flags (all orchestration logic)
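A compressed sketch of that nodes-versus-edges pattern in LangGraph (node names, state fields, and routing logic here are illustrative, not the agent's exact code):
```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    needs_refinement: bool
    sql: str

def generate_sql(state: AgentState) -> dict:
    # Node: returns state updates only; no routing decisions here.
    return {"sql": "SELECT 1", "needs_refinement": False}

def refine_sql(state: AgentState) -> dict:
    return {"needs_refinement": False}

def execute_query(state: AgentState) -> dict:
    return {}

def route_after_generate(state: AgentState) -> str:
    # Edge: orchestration logic lives in routing functions, driven by state flags.
    return "refine_sql" if state["needs_refinement"] else "execute_query"

graph = StateGraph(AgentState)
graph.add_node("generate_sql", generate_sql)
graph.add_node("refine_sql", refine_sql)
graph.add_node("execute_query", execute_query)
graph.set_entry_point("generate_sql")
graph.add_conditional_edges("generate_sql", route_after_generate)
graph.add_edge("refine_sql", "execute_query")
graph.add_edge("execute_query", END)
app = graph.compile()
```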
For a detailed explanation of the architecture, workflow graph, and how to extend the agent, see **[AGENT_ARCHITECTURE.md](AGENT_ARCHITECTURE.md)**.
### Advanced Usage
#### Custom Configuration
```python
# Run with custom LangGraph config
result = await agent.query(
"Find all authors with more than 5 books",
config={"recursion_limit": 50}
)
```
#### Access Full State
```python
# Get complete agent state including all messages
result = await agent.query("Show me the database schema")
# Access messages, schema info, and query attempts
messages = result["messages"]
schema_info = result["schema_info"]
attempts = result["query_attempts"]
```
#### Helper Function for Clean Results
```python
from langchain_core.messages import ToolMessage
def get_answer(result):
    """Extract the final answer from the agent result."""
    messages = result.get("messages", [])
    for msg in reversed(messages):
        if isinstance(msg, ToolMessage) and "successfully" in msg.content.lower():
            return msg.content
    return messages[-1].content if messages else "No answer"
# Use it
result = await agent.query("How many tables are in the database?")
print(get_answer(result))
```
### Tips for Best Results
1. **Be Specific**: Clear, specific questions work best
- ✅ "Show me all authors born after 1950"
- ❌ "authors stuff"
2. **Use Table Names**: If you know table names, mention them
- ✅ "List all books in the authors table"
- ✅ "How many records are in the authors table?"
3. **Specify Filters**: Be explicit about filtering criteria
- ✅ "Find authors where birth_year is greater than 1950"
- ✅ "Show me authors with names starting with 'G'"
4. **Ask for Aggregations**: The agent handles COUNT, SUM, AVG, etc.
- ✅ "What is the average birth year?"
- ✅ "Count the total number of authors"
### Limitations
- Currently optimized for SELECT queries (read-only operations)
- Maximum retry attempts are configurable (default: 3)
- Requires OpenAI API key for LLM functionality
- Schema information is cached per agent instance for performance, so schema changes made mid-session may not be picked up until a new agent is created
### Improving the Agent
For a comprehensive guide on enhancing the agent's performance, accuracy, and features, see **[IMPROVEMENTS.md](IMPROVEMENTS.md)**.
**✅ Implemented Features:**
- **Intelligent Table Selection**: Uses LLM to identify only relevant tables (skips for ≤3 tables)
- **Foreign Key Relationships**: Fetches FK info for better JOIN understanding
- **Schema Caching**: Caches schema information across queries (session-based)
- **Chain-of-Thought Reasoning**: Step-by-step query generation for better accuracy
- **Confidence Scoring**: Calculates confidence based on actual query execution results
- **Auto-Refinement**: Automatically improves queries when issues detected
- **Intelligent Error Parsing**: Extracts actionable information from error messages
- **Performance Optimizations**: Compiled regex, parallel execution, result reuse, state-based storage
**Planned Improvements:**
- **Few-Shot Examples**: Add example queries to guide better SQL generation patterns
- **Query Explanation**: Explain what generated SQL queries do
- **Query History and Learning**: Learn from past successful queries
See **[IMPROVEMENTS.md](IMPROVEMENTS.md)** for detailed implementation examples and step-by-step instructions.
## MySQL Features
- Database-level organization
- `SHOW DATABASES` and `SHOW TABLES` for fast database and table listing
- `DESCRIBE` for table structure
- `SHOW CREATE TABLE` for detailed table information
## Tools
The server provides the following MCP tools:
- `server_info`: Get server and database information (database type: MySQL, readonly status, MySQL connector version)
- `db_identity`: Get current database identity details (database type: MySQL, database name, user, host, port, server version)
- `run_query`: Execute SQL queries with typed input (returns markdown or JSON string)
- `run_query_json`: Execute SQL queries with typed input (returns JSON list)
- `list_table_resources`: List tables as MCP resource URIs (`table://schema/table`) - returns structured list
- `read_table_resource`: Read table **data** (rows) via MCP resource protocol - returns JSON list
- `list_tables`: List tables in a database - returns **markdown** string (human-readable)
- `describe_table`: Get table **structure** (columns, types, constraints) - returns markdown string
- `get_foreign_keys`: Get foreign key relationships (via SHOW CREATE TABLE)
## Calling MCP Tools
`MultiServerMCPClient` doesn't have a `call_tool()` method. Instead, you need to:
1. Get tools using `get_tools()` which returns LangChain `StructuredTool` objects
2. Find the tool by name
3. Invoke it using `tool.ainvoke()` with the appropriate arguments
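For one-off calls, a small helper (hypothetical convenience wrapper, not part of the adapter library's API) can bundle those three steps:
```python
async def call_tool(client, name: str, args: dict):
    """Find a tool by name and invoke it (convenience wrapper, not a client method)."""
    tools = await client.get_tools()
    tool = next((t for t in tools if t.name == name), None)
    if tool is None:
        raise ValueError(f"Tool not found: {name}")
    return await tool.ainvoke(args)

# Usage
info = await call_tool(client, "server_info", {})
```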
### Efficient Tool Lookup
For better performance (especially with many tools), create a dictionary for O(1) lookups:
```python
# Get all tools
tools = await client.get_tools()
# Create dictionary for fast O(1) lookup (recommended)
tool_dict = {t.name: t for t in tools}
# Now you can call tools efficiently
server_info = await tool_dict["server_info"].ainvoke({})
tables = await tool_dict["list_tables"].ainvoke({"db_schema": None})
```
### Get Server Information
```python
# Get tools
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
# Get server info and connection details
result = await tool_dict["server_info"].ainvoke({})
# Returns: {
# "name": "MySQL Database Explorer",
# "database_type": "MySQL", # Database type is explicitly included
# "readonly": False,
# "mysql_connector_version": "8.0.33"
# }
db_info = await tool_dict["db_identity"].ainvoke({})
# Returns: {
# "database_type": "MySQL", # Database type is explicitly included
# "database": "mydatabase",
# "user": "root@localhost",
# "host": "localhost",
# "port": 3306,
# "server_version": "8.0.33"
# }
```
### List Tables
```python
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
# Lists tables in the current database
result = await tool_dict["list_tables"].ainvoke({"db_schema": "mydatabase"})
# Or get tables as MCP resources
resources = await tool_dict["list_table_resources"].ainvoke({"schema": "mydatabase"})
```
### Explore Table Structure
```python
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
# Get table structure
result = await tool_dict["describe_table"].ainvoke({
"table_name": "users",
"db_schema": "mydatabase"
})
# Get foreign key relationships
fks = await tool_dict["get_foreign_keys"].ainvoke({
"table_name": "users",
"db_schema": "mydatabase"
})
```
### Execute Query
```python
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
# Execute query with markdown output
result = await tool_dict["run_query"].ainvoke({
"input": {
"sql": "SELECT * FROM users LIMIT 10",
"format": "markdown"
}
})
# Execute query with JSON output
result = await tool_dict["run_query_json"].ainvoke({
"input": {
"sql": "SELECT * FROM users LIMIT 10",
"row_limit": 100
}
})
```
### MCP Resource Tools
These tools implement the MCP (Model Context Protocol) resource pattern, which allows tables to be treated as discoverable resources that can be accessed via standardized URIs.
#### `list_table_resources`
Lists all tables in a schema and returns them as MCP resource URIs. This enables MCP clients to discover available tables dynamically.
**What it does:**
- Queries the database to get all table names from the specified schema
- Formats each table as a resource URI: `table://schema/table_name`
- Returns a list of these URIs
**Example:**
```python
# Get all tables as resource URIs
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
resources = await tool_dict["list_table_resources"].ainvoke({"schema": "mydatabase"})
# Returns: ["table://mydatabase/users", "table://mydatabase/orders", "table://mydatabase/products"]
```
**Use cases:**
- Dynamic table discovery in MCP clients
- Building UI that lists available tables
- Integration with MCP resource-aware tools
#### `read_table_resource`
Reads data from a specific table using the MCP resource protocol. This provides a standardized way to access table data.
**What it does:**
- Executes `SELECT * FROM schema.table` with a row limit
- Returns table rows as a list of dictionaries (JSON format)
- Each dictionary represents one row with column names as keys
**Example:**
```python
# Read table data via MCP resource protocol
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
data = await tool_dict["read_table_resource"].ainvoke({
"schema": "mydatabase",
"table": "users",
"row_limit": 50 # Limits number of rows returned
})
# Returns: [
# {"id": 1, "name": "Alice", "email": "alice@example.com"},
# {"id": 2, "name": "Bob", "email": "bob@example.com"},
# ...
# ]
```
**Use cases:**
- Quick table previews without writing SQL
- MCP resource-aware clients that can fetch table data by URI
- Data exploration and inspection tools
**Key differences from `run_query`:**
- `read_table_resource`: Simple, standardized way to read entire tables (no SQL needed)
- `run_query`: Flexible, allows any SQL query with custom WHERE clauses, JOINs, etc.
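Because both resource tools share the fixed `table://schema/table` URI pattern, they compose naturally. A sketch that discovers tables and previews each one:
```python
# Discover tables, parse their URIs, and preview each one.
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}

resources = await tool_dict["list_table_resources"].ainvoke({"schema": "mydatabase"})
for uri in resources:
    schema, table = uri.removeprefix("table://").split("/", 1)
    rows = await tool_dict["read_table_resource"].ainvoke({
        "schema": schema, "table": table, "row_limit": 5
    })
    print(f"{table}: {len(rows)} sample rows")
```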
### Comparison: Resource Tools vs Regular Tools
#### `list_table_resources` vs `list_tables`
Both list tables, but serve different purposes:
| Feature | `list_table_resources` | `list_tables` |
|---------|------------------------|---------------|
| **Return Type** | `List[str]` (structured data) | `str` (markdown text) |
| **Format** | Resource URIs: `["table://schema/users", ...]` | Human-readable markdown table |
| **Use Case** | Programmatic access, MCP resource protocol | Human viewing, documentation |
| **Integration** | Works with MCP resource-aware clients | General purpose, readable output |
**Example comparison:**
```python
# Get tools once
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
# list_table_resources - structured for programs
resources = await tool_dict["list_table_resources"].ainvoke({"schema": "mydb"})
# Returns: ["table://mydb/users", "table://mydb/orders"]
# list_tables - formatted for humans
tables = await tool_dict["list_tables"].ainvoke({"db_schema": "mydb"})
# Returns: "| Tables_in_mydb |\n|-----------------|\n| users |\n| orders |"
```
#### `read_table_resource` vs `describe_table`
These serve **completely different purposes** - they're complementary, not redundant:
| Feature | `read_table_resource` | `describe_table` |
|---------|----------------------|------------------|
| **What it returns** | Table **data** (rows) | Table **structure** (schema) |
| **Return Type** | `List[Dict[str, Any]]` (JSON) | `str` (markdown) |
| **SQL Command** | `SELECT * FROM table` | `DESCRIBE table` |
| **Use Case** | View actual data/rows | View column definitions, types, constraints |
| **Example Output** | `[{"id": 1, "name": "Alice"}, ...]` | Column names, types, nullability, keys |
**Example comparison:**
```python
# Get tools once
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
# read_table_resource - get the DATA
data = await tool_dict["read_table_resource"].ainvoke({
"schema": "mydb", "table": "users", "row_limit": 10
})
# Returns: [{"id": 1, "name": "Alice", "email": "alice@example.com"}, ...]
# describe_table - get the STRUCTURE
structure = await tool_dict["describe_table"].ainvoke({
"table_name": "users", "db_schema": "mydb"
})
# Returns: "| Field | Type | Null | Key | Default | Extra |\n|-------|------|------|-----|---------|-------|\n| id | int | NO | PRI | NULL | auto_increment |\n| name | varchar(100) | NO | | NULL | |"
```
**Summary:**
- Use `read_table_resource` when you want to **see the data** in a table
- Use `describe_table` when you want to **see the schema/structure** of a table
- They answer different questions: "What's in the table?" vs "How is the table defined?"
## Notes
- MySQL connection strings must start with `mysql://`
- The format is: `mysql://user:password@host:port/database`
- All database names are treated as schemas for compatibility
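Passwords containing URL-special characters (`@`, `/`, `:`, etc.) generally need to be percent-encoded to keep the connection string parseable. A sketch using the standard library, assuming the server URL-decodes credentials:
```python
from urllib.parse import quote_plus

user = "app_user"
password = "p@ss/word:1"  # contains URL-special characters
host, port, database = "localhost", 3306, "mydatabase"

conn = f"mysql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"
print(conn)  # mysql://app_user:p%40ss%2Fword%3A1@localhost:3306/mydatabase
```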
## Documentation
- **[AGENT_ARCHITECTURE.md](AGENT_ARCHITECTURE.md)**: Detailed architecture explanation, workflow graph, and extension guide
- How the agent works internally
- Visual workflow diagrams
- Step-by-step guides for adding nodes and tools
- Code examples and patterns
- **[IMPROVEMENTS.md](IMPROVEMENTS.md)**: Comprehensive improvement guide
- Enhancement ideas with priority rankings
- Implementation examples
- Performance optimizations
- Advanced features
## Improving the Agent
The **[IMPROVEMENTS.md](IMPROVEMENTS.md)** guide to enhancing the agent's performance, accuracy, and features includes:
- **Priority rankings** (High/Medium/Low) for each improvement
- **Detailed code examples** with implementation snippets
- **Step-by-step integration instructions**
- **Workflow graphs** showing how the agent evolves with improvements
- **Testing strategies** and validation approaches
- **Benefits analysis** for each enhancement
**Quick Start**: Begin with high-priority items like foreign key relationships, better error parsing, and column name suggestions for immediate impact.
## Troubleshooting
### Connection Issues
- Ensure the MySQL server is running and accessible
- Check connection string format and credentials
- Verify network connectivity and firewall settings
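To rule out MCP-layer issues, you can test connectivity directly with the MySQL connector from the requirements (a minimal sketch with placeholder credentials):
```python
import mysql.connector

try:
    conn = mysql.connector.connect(
        host="localhost", port=3306,
        user="user", password="password", database="mydatabase",
        connection_timeout=5,
    )
    print("Connected:", conn.is_connected())
    conn.close()
except mysql.connector.Error as err:
    print("Connection failed:", err)
```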
### MySQL-Specific Issues
- Ensure MySQL server supports the connection protocol
- Check user permissions for the specified database
- Verify MySQL connector version compatibility