# MySQL MCP Server
A Model Context Protocol (MCP) server for MySQL databases. This server provides a unified interface for exploring and querying MySQL databases through MCP.
## Features
- **MySQL Support**: Connect to MySQL databases
- **Unified Interface**: Consistent tools and API for MySQL operations
- **Database-Specific Optimizations**: Uses MySQL-optimized SQL syntax
- **Schema Exploration**: List databases, tables, and relationships
- **Query Execution**: Run SQL queries with proper parameter handling
- **Resource Support**: MCP resource endpoints for table data
- **LangGraph Text-to-SQL Agent**: Intelligent agent that converts natural language to SQL queries with automatic schema exploration and error recovery
## Installation
1. Install dependencies:
```bash
pip install -r requirements.txt
```
The requirements include:
- MySQL connector for database access
- MCP and FastMCP for the server
- LangChain and LangGraph for the text-to-SQL agent
- LangChain OpenAI integration for LLM support
## Usage
### Connection Strings
#### MySQL
```bash
# Using connection string
python mysql-db-server.py --conn "mysql://user:password@host:port/database"
# Using environment variable
export DATABASE_CONNECTION_STRING="mysql://user:password@host:port/database"
python mysql-db-server.py
```
### Command Line Options
```bash
python mysql-db-server.py [OPTIONS]
Options:
  --conn TEXT       MySQL connection string (format: mysql://user:password@host:port/database)
  --transport TEXT  Transport protocol: stdio, sse, or streamable-http (default: stdio)
  --host TEXT       Host to bind for SSE/HTTP transports (default: 127.0.0.1)
  --port INTEGER    Port to bind for SSE/HTTP transports (default: 8000)
  --mount TEXT      Optional mount path for SSE transport (e.g., /mcp)
  --readonly        Enable read-only mode (prevents INSERT, UPDATE, DELETE, etc.)
  --help            Show this message and exit
```
### Environment Variables
- `DATABASE_CONNECTION_STRING`: MySQL connection string
- `DATABASE_READONLY`: Set to "true", "1", or "yes" to enable read-only mode
- `DATABASE_STATEMENT_TIMEOUT_MS`: Query timeout in milliseconds
- `MCP_TRANSPORT`: Transport protocol (stdio, sse, streamable-http)
- `MCP_HOST`: Host for network transports
- `MCP_PORT`: Port for network transports
- `MCP_SSE_MOUNT`: Mount path for SSE transport
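For example, a fully environment-driven configuration might look like this (all values are placeholders):
```bash
export DATABASE_CONNECTION_STRING="mysql://user:password@localhost:3306/mydatabase"
export DATABASE_READONLY="true"
export DATABASE_STATEMENT_TIMEOUT_MS="30000"
export MCP_TRANSPORT="streamable-http"
export MCP_HOST="127.0.0.1"
export MCP_PORT="8000"
python mysql-db-server.py
```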
### Read-Only Mode
Read-only mode prevents any write operations (INSERT, UPDATE, DELETE, DROP, etc.) and only allows SELECT, SHOW, WITH, VALUES, and EXPLAIN queries.
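As a rough illustration of how such a gate can work (a sketch built on the allowed-keyword list above, not the server's actual implementation):
```python
import re

# Statements permitted in read-only mode (per the list above).
READONLY_ALLOWED = {"SELECT", "SHOW", "WITH", "VALUES", "EXPLAIN"}

# Skips leading whitespace and SQL comments before the first keyword.
LEADING_NOISE = re.compile(r"^(?:\s*--[^\n]*\n|\s*/\*.*?\*/)*\s*", re.DOTALL)

def is_readonly_safe(sql: str) -> bool:
    """Return True if the first keyword of `sql` is in the allowed set."""
    body = LEADING_NOISE.sub("", sql, count=1)
    match = re.match(r"[A-Za-z]+", body)
    return bool(match) and match.group(0).upper() in READONLY_ALLOWED

print(is_readonly_safe("SELECT * FROM users"))        # True
print(is_readonly_safe("-- note\nEXPLAIN SELECT 1"))  # True
print(is_readonly_safe("DELETE FROM users"))          # False
```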
**Enable via command-line:**
```bash
python mysql-db-server.py --conn "mysql://user:password@host:port/database" --readonly
```
**Enable via environment variable:**
```bash
export DATABASE_READONLY="true"
python mysql-db-server.py --conn "mysql://user:password@host:port/database"
```
**Check if read-only mode is enabled:**
```python
# Get tools from MCP client
tools = await client.get_tools()
server_info_tool = next((t for t in tools if t.name == "server_info"), None)
if server_info_tool:
    result = await server_info_tool.ainvoke({})
    print(result["readonly"])  # True if read-only mode is enabled
```
## LangGraph Text-to-SQL Agent
The project includes a sophisticated LangGraph-based agent that converts natural language questions into SQL queries. The agent automatically explores the database schema, generates SQL queries, executes them, and refines queries if errors occur.
### Features
- **Intelligent Schema Exploration**:
- Automatically identifies relevant tables using LLM analysis (skips LLM call for ≤3 tables)
- Only explores tables needed for the query (much faster!)
- Describes table structures with exact column names
- Fetches foreign key relationships for better JOINs
- Caches schema information across queries (session-based)
- **Query Validation**:
- Validates that user queries are actually database questions before generating SQL
  - Uses fast heuristics (set-based keyword matching) first, then LLM only for ambiguous cases (see the sketch after this list)
- Rejects gibberish, greetings, and non-database questions early
- Optimized to avoid LLM calls for 90%+ of cases
- **Intelligent SQL Generation**:
- Uses LLM with chain-of-thought reasoning for step-by-step query generation
- Handles multi-part questions (e.g., "find X and then find Y") with subqueries
- **Window Function Support**: Automatically uses window functions (ROW_NUMBER(), RANK(), DENSE_RANK(), LAG(), LEAD(), SUM() OVER, etc.) for appropriate scenarios like "top N per group", rankings within groups, comparing rows, running totals, percentiles, moving averages
- **Tie Handling**: Correctly handles ties in "most/least" queries by returning ALL entities with max/min value
- Validates SQL syntax and auto-fixes simple issues
- **Confidence Scoring & Auto-Refinement**:
- Calculates confidence scores based on actual query execution results
- **Explicitly checks if SQL answers the question** (not just syntax correctness)
- Automatically refines queries when confidence is low or errors detected
- Provides detailed analysis and reasoning for every query
- **Error Recovery**:
- Intelligently parses SQL errors to extract actionable information
- Automatically retries with improved queries when execution fails
- Preserves query structure when fixing simple errors (e.g., column names)
- **Performance Optimized**:
- Parallel execution of schema exploration operations
- Compiled regex patterns for faster text processing
- Reuses test query results when possible (avoids redundant executions)
- State-based SQL storage (avoids re-extraction)
- Set-based lookups for O(1) table/column checks
- LRU cache management for schema and column caches (prevents unbounded growth)
- Timeout handling for LLM and database calls (prevents hanging)
- **Robust Error Handling**:
- Input validation (rejects None, empty, or invalid queries)
- Timeout protection for all LLM and database calls (validates timeout > 0)
- Graceful fallbacks when LLM returns empty/malformed responses
- Type safety for all conversions (float, int) with try/except
- Regex group validation (checks groups are not empty before use)
- Cache structure validation (handles corrupted cache gracefully)
- Safe string parsing (handles malformed table resources, error messages)
- Tool call structure validation (validates dict structure before processing)
- Comprehensive logging for debugging (optional, can be disabled)
- **MCP Integration**: Seamlessly uses MCP tools for database operations
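To make the fast-heuristic idea under **Query Validation** concrete, here is a minimal sketch of set-based keyword matching. The keyword sets are hypothetical; the agent's real lists and heuristics are more involved:
```python
# Hypothetical keyword sets; the agent's actual lists may differ.
DB_KEYWORDS = {"show", "list", "count", "find", "how", "many", "average",
               "table", "tables", "records", "rows", "select", "where"}
REJECT_PHRASES = {"hello", "hi", "thanks", "bye"}

def quick_validate(question: str) -> str:
    """Classify a question as 'valid', 'invalid', or 'ambiguous' (needs LLM)."""
    words = set(question.lower().split())
    if not words or words & REJECT_PHRASES:
        return "invalid"    # greetings / empty input rejected early
    if words & DB_KEYWORDS:
        return "valid"      # O(1) set intersection, no LLM call
    return "ambiguous"      # fall back to LLM validation

print(quick_validate("How many authors are in the database?"))  # valid
print(quick_validate("hello there"))                            # invalid
```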
### Setup
1. **Start the MCP Server** (in one terminal):
```bash
python mysql-db-server.py --conn "mysql://user:password@host:port/database" --transport streamable-http --port 8000
```
2. **Use the Agent** (in Python/Jupyter):
```python
from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain_openai import ChatOpenAI
from text_to_sql_agent import TextToSQLAgent
# Connect to the MCP server
client = MultiServerMCPClient({
    "mysql-server": {
        "url": "http://localhost:8000/mcp",
        "transport": "streamable_http"
    }
})
# Initialize the LLM (reads OPENAI_API_KEY from the environment by default,
# or pass api_key="your-openai-api-key" explicitly)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# Create the agent with optional configuration
agent = TextToSQLAgent(
    mcp_client=client,
    llm=llm,
    max_query_attempts=3,         # Maximum retry attempts
    llm_timeout=60,               # Timeout for LLM calls (seconds)
    query_timeout=30,             # Timeout for database queries (seconds)
    max_schema_cache_size=1000,   # Maximum table descriptions to cache
    max_column_cache_size=500,    # Maximum column name extractions to cache
    enable_logging=True           # Enable logging for debugging
)
```
### Usage Examples
#### Basic Query
```python
# Ask a natural language question
result = await agent.query("How many authors are in the database?")
# Get the final answer
answer = agent.get_final_answer(result)
print(answer)
```
#### Query with Filtering
```python
# Complex queries with filters
result = await agent.query("Show me all authors born after 1950")
print(agent.get_final_answer(result))
```
#### Aggregation Queries
```python
# Statistical queries
result = await agent.query("What is the average birth year of all authors?")
print(agent.get_final_answer(result))
```
### How It Works
The agent follows an optimized workflow:
1. **Schema Exploration** (with caching and parallelization):
- Lists all available tables (cached after first call)
- Intelligently identifies relevant tables using LLM (skips for ≤3 tables)
- Describes table structures for relevant tables only (parallel execution)
- Fetches foreign key relationships for better JOINs (parallel execution)
- Caches all schema information for subsequent queries
2. **Query Validation** (before SQL generation):
- Validates that the user query is a valid database question
- Uses fast heuristics (keyword matching) first
- Falls back to LLM validation only for ambiguous cases
- Rejects invalid queries early to avoid unnecessary processing
3. **SQL Generation** (with confidence scoring):
- Uses LLM with chain-of-thought reasoning for step-by-step generation
- Includes schema, foreign keys, and column names in prompt
- Executes test query (LIMIT 3) to get sample results
- Calculates confidence score and analysis (single LLM call)
- **Confidence scoring explicitly checks if SQL answers the question**
- Validates SQL syntax and detects critical issues
- Sets refinement flags for edge routing (does not refine directly)
- Stores final SQL in state to avoid re-extraction
4. **SQL Refinement** (if needed):
- Separate node handles refinement when confidence is low or errors detected
- Fetches missing schema if needed
- Refines SQL using analysis and error context
- Re-executes test query and recalculates confidence
5. **Query Execution** (optimized to avoid redundancy):
- Uses stored SQL from state (avoids re-extraction)
- Reuses test query results if they contain all data (avoids redundant execution)
- **Respects original LIMIT clause when reusing test results** (e.g., LIMIT 1 returns 1 row, not all test results)
- Executes full query only when needed
6. **Error Recovery** (with intelligent parsing; sketched below):
- Intelligently parses SQL errors (extracts error type, column, table)
- Passes error context to SQL generation (optional refinement)
- Preserves query structure when fixing simple errors
- Automatically retries failed queries (up to `max_query_attempts`)
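A minimal sketch of the error-parsing idea, assuming standard MySQL error text such as `Unknown column 'birth_yr' in 'field list'` (the agent's actual parser covers more cases):
```python
import re

# Patterns for common MySQL error messages (illustrative subset).
ERROR_PATTERNS = {
    "unknown_column": re.compile(r"Unknown column '([^']+)'"),
    "missing_table": re.compile(r"Table '([^']+)' doesn't exist"),
    "syntax_error": re.compile(r"You have an error in your SQL syntax"),
}

def parse_sql_error(message: str) -> dict:
    """Extract an error type and offending identifier from a MySQL error."""
    for error_type, pattern in ERROR_PATTERNS.items():
        match = pattern.search(message)
        if match:
            detail = match.group(1) if match.groups() else None
            return {"type": error_type, "detail": detail}
    return {"type": "unknown", "detail": None}

print(parse_sql_error("1054 (42S22): Unknown column 'birth_yr' in 'field list'"))
# {'type': 'unknown_column', 'detail': 'birth_yr'}
```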
### Agent State Graph
The agent uses a LangGraph state machine with the following nodes:
- **explore_schema**: Discovers and caches database schema (with conditional routing for completion)
- **generate_sql**: Converts natural language to SQL using LLM, calculates confidence, sets refinement flags
- **refine_sql**: Refines SQL based on confidence score and analysis (separate node for clarity)
- **execute_query**: Runs SQL queries via MCP tools (reuses test results when possible)
- **refine_query**: Improves queries based on error feedback (routes back to generate_sql)
- **tools**: Handles tool calls for schema exploration
**Architecture**: The agent follows LangGraph best practices with clear separation:
- **Nodes**: Process state and return updates (no conditional logic)
- **Edges**: Make routing decisions based on state flags (all orchestration logic)
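A compressed sketch of that nodes-versus-edges pattern in LangGraph (node names, state fields, and routing logic here are illustrative, not the agent's exact code):
```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    needs_refinement: bool
    sql: str

def generate_sql(state: AgentState) -> dict:
    # Node: returns state updates only; no routing decisions here.
    return {"sql": "SELECT 1", "needs_refinement": False}

def refine_sql(state: AgentState) -> dict:
    return {"needs_refinement": False}

def execute_query(state: AgentState) -> dict:
    return {}

def route_after_generate(state: AgentState) -> str:
    # Edge: orchestration logic lives in routing functions, driven by state flags.
    return "refine_sql" if state["needs_refinement"] else "execute_query"

graph = StateGraph(AgentState)
graph.add_node("generate_sql", generate_sql)
graph.add_node("refine_sql", refine_sql)
graph.add_node("execute_query", execute_query)
graph.set_entry_point("generate_sql")
graph.add_conditional_edges("generate_sql", route_after_generate)
graph.add_edge("refine_sql", "execute_query")
graph.add_edge("execute_query", END)
app = graph.compile()
```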
For a detailed explanation of the architecture, workflow graph, and how to extend the agent, see **[AGENT_ARCHITECTURE.md](AGENT_ARCHITECTURE.md)**.
### Advanced Usage
#### Custom Configuration
```python
# Run with custom LangGraph config
result = await agent.query(
"Find all authors with more than 5 books",
config={"recursion_limit": 50}
)
```
#### Access Full State
```python
# Get complete agent state including all messages
result = await agent.query("Show me the database schema")
# Access messages, schema info, and query attempts
messages = result["messages"]
schema_info = result["schema_info"]
attempts = result["query_attempts"]
```
#### Helper Function for Clean Results
```python
from langchain_core.messages import ToolMessage
def get_answer(result):
    """Extract the final answer from the agent result."""
    messages = result.get("messages", [])
    for msg in reversed(messages):
        if isinstance(msg, ToolMessage) and "successfully" in msg.content.lower():
            return msg.content
    return messages[-1].content if messages else "No answer"
# Use it
result = await agent.query("How many tables are in the database?")
print(get_answer(result))
```
### Tips for Best Results
1. **Be Specific**: Clear, specific questions work best
- ✅ "Show me all authors born after 1950"
- ❌ "authors stuff"
2. **Use Table Names**: If you know table names, mention them
- ✅ "List all books in the authors table"
- ✅ "How many records are in the authors table?"
3. **Specify Filters**: Be explicit about filtering criteria
- ✅ "Find authors where birth_year is greater than 1950"
- ✅ "Show me authors with names starting with 'G'"
4. **Ask for Aggregations**: The agent handles COUNT, SUM, AVG, etc.
- ✅ "What is the average birth year?"
- ✅ "Count the total number of authors"
### Limitations
- Currently optimized for SELECT queries (read-only operations)
- Maximum retry attempts are configurable (default: 3)
- Requires OpenAI API key for LLM functionality
- Schema information is cached per agent instance for performance, so schema changes made mid-session may not be picked up until a new agent is created
### Improving the Agent
For a comprehensive guide on enhancing the agent's performance, accuracy, and features, see **[IMPROVEMENTS.md](IMPROVEMENTS.md)**.
**✅ Implemented Features:**
- **Intelligent Table Selection**: Uses LLM to identify only relevant tables (skips for ≤3 tables)
- **Foreign Key Relationships**: Fetches FK info for better JOIN understanding
- **Schema Caching**: Caches schema information across queries (session-based)
- **Chain-of-Thought Reasoning**: Step-by-step query generation for better accuracy
- **Confidence Scoring**: Calculates confidence based on actual query execution results
- **Auto-Refinement**: Automatically improves queries when issues detected
- **Intelligent Error Parsing**: Extracts actionable information from error messages
- **Performance Optimizations**: Compiled regex, parallel execution, result reuse, state-based storage
**Planned Improvements:**
- **Few-Shot Examples**: Add example queries to guide better SQL generation patterns
- **Query Explanation**: Explain what generated SQL queries do
- **Query History and Learning**: Learn from past successful queries
See **[IMPROVEMENTS.md](IMPROVEMENTS.md)** for detailed implementation examples and step-by-step instructions.
## MySQL Features
- Database-level organization
- `SHOW DATABASES` and `SHOW TABLES` for fast database and table listing
- `DESCRIBE` for table structure
- `SHOW CREATE TABLE` for detailed table information
## Tools
The server provides the following MCP tools:
- `server_info`: Get server and database information (database type: MySQL, readonly status, MySQL connector version)
- `db_identity`: Get current database identity details (database type: MySQL, database name, user, host, port, server version)
- `run_query`: Execute SQL queries with typed input (returns markdown or JSON string)
- `run_query_json`: Execute SQL queries with typed input (returns JSON list)
- `list_table_resources`: List tables as MCP resource URIs (`table://schema/table`) - returns structured list
- `read_table_resource`: Read table **data** (rows) via MCP resource protocol - returns JSON list
- `list_tables`: List tables in a database - returns **markdown** string (human-readable)
- `describe_table`: Get table **structure** (columns, types, constraints) - returns markdown string
- `get_foreign_keys`: Get foreign key relationships (via SHOW CREATE TABLE)
## Calling MCP Tools
`MultiServerMCPClient` doesn't have a `call_tool()` method. Instead, you need to:
1. Get tools using `get_tools()` which returns LangChain `StructuredTool` objects
2. Find the tool by name
3. Invoke it using `tool.ainvoke()` with the appropriate arguments
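For one-off calls, a small helper (hypothetical convenience wrapper, not part of the adapter library's API) can bundle those three steps:
```python
async def call_tool(client, name: str, args: dict):
    """Find a tool by name and invoke it (convenience wrapper, not a client method)."""
    tools = await client.get_tools()
    tool = next((t for t in tools if t.name == name), None)
    if tool is None:
        raise ValueError(f"Tool not found: {name}")
    return await tool.ainvoke(args)

# Usage
info = await call_tool(client, "server_info", {})
```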
### Efficient Tool Lookup
For better performance (especially with many tools), create a dictionary for O(1) lookups:
```python
# Get all tools
tools = await client.get_tools()
# Create dictionary for fast O(1) lookup (recommended)
tool_dict = {t.name: t for t in tools}
# Now you can call tools efficiently
server_info = await tool_dict["server_info"].ainvoke({})
tables = await tool_dict["list_tables"].ainvoke({"db_schema": None})
```
### Get Server Information
```python
# Get tools
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
# Get server info and connection details
result = await tool_dict["server_info"].ainvoke({})
# Returns: {
# "name": "MySQL Database Explorer",
# "database_type": "MySQL", # Database type is explicitly included
# "readonly": False,
# "mysql_connector_version": "8.0.33"
# }
db_info = await tool_dict["db_identity"].ainvoke({})
# Returns: {
# "database_type": "MySQL", # Database type is explicitly included
# "database": "mydatabase",
# "user": "root@localhost",
# "host": "localhost",
# "port": 3306,
# "server_version": "8.0.33"
# }
```
### List Tables
```python
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
# Lists tables in the current database
result = await tool_dict["list_tables"].ainvoke({"db_schema": "mydatabase"})
# Or get tables as MCP resources
resources = await tool_dict["list_table_resources"].ainvoke({"schema": "mydatabase"})
```
### Explore Table Structure
```python
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
# Get table structure
result = await tool_dict["describe_table"].ainvoke({
"table_name": "users",
"db_schema": "mydatabase"
})
# Get foreign key relationships
fks = await tool_dict["get_foreign_keys"].ainvoke({
"table_name": "users",
"db_schema": "mydatabase"
})
```
### Execute Query
```python
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
# Execute query with markdown output
result = await tool_dict["run_query"].ainvoke({
"input": {
"sql": "SELECT * FROM users LIMIT 10",
"format": "markdown"
}
})
# Execute query with JSON output
result = await tool_dict["run_query_json"].ainvoke({
"input": {
"sql": "SELECT * FROM users LIMIT 10",
"row_limit": 100
}
})
```
### MCP Resource Tools
These tools implement the MCP (Model Context Protocol) resource pattern, which allows tables to be treated as discoverable resources that can be accessed via standardized URIs.
#### `list_table_resources`
Lists all tables in a schema and returns them as MCP resource URIs. This enables MCP clients to discover available tables dynamically.
**What it does:**
- Queries the database to get all table names from the specified schema
- Formats each table as a resource URI: `table://schema/table_name`
- Returns a list of these URIs
**Example:**
```python
# Get all tables as resource URIs
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
resources = await tool_dict["list_table_resources"].ainvoke({"schema": "mydatabase"})
# Returns: ["table://mydatabase/users", "table://mydatabase/orders", "table://mydatabase/products"]
```
**Use cases:**
- Dynamic table discovery in MCP clients
- Building UI that lists available tables
- Integration with MCP resource-aware tools
#### `read_table_resource`
Reads data from a specific table using the MCP resource protocol. This provides a standardized way to access table data.
**What it does:**
- Executes `SELECT * FROM schema.table` with a row limit
- Returns table rows as a list of dictionaries (JSON format)
- Each dictionary represents one row with column names as keys
**Example:**
```python
# Read table data via MCP resource protocol
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
data = await tool_dict["read_table_resource"].ainvoke({
"schema": "mydatabase",
"table": "users",
"row_limit": 50 # Limits number of rows returned
})
# Returns: [
# {"id": 1, "name": "Alice", "email": "alice@example.com"},
# {"id": 2, "name": "Bob", "email": "bob@example.com"},
# ...
# ]
```
**Use cases:**
- Quick table previews without writing SQL
- MCP resource-aware clients that can fetch table data by URI
- Data exploration and inspection tools
**Key differences from `run_query`:**
- `read_table_resource`: Simple, standardized way to read entire tables (no SQL needed)
- `run_query`: Flexible, allows any SQL query with custom WHERE clauses, JOINs, etc.
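Because both resource tools share the fixed `table://schema/table` URI pattern, they compose naturally. A sketch that discovers tables and previews each one:
```python
# Discover tables, parse their URIs, and preview each one.
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}

resources = await tool_dict["list_table_resources"].ainvoke({"schema": "mydatabase"})
for uri in resources:
    schema, table = uri.removeprefix("table://").split("/", 1)
    rows = await tool_dict["read_table_resource"].ainvoke({
        "schema": schema, "table": table, "row_limit": 5
    })
    print(f"{table}: {len(rows)} sample rows")
```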
### Comparison: Resource Tools vs Regular Tools
#### `list_table_resources` vs `list_tables`
Both list tables, but serve different purposes:
| Feature | `list_table_resources` | `list_tables` |
|---------|------------------------|---------------|
| **Return Type** | `List[str]` (structured data) | `str` (markdown text) |
| **Format** | Resource URIs: `["table://schema/users", ...]` | Human-readable markdown table |
| **Use Case** | Programmatic access, MCP resource protocol | Human viewing, documentation |
| **Integration** | Works with MCP resource-aware clients | General purpose, readable output |
**Example comparison:**
```python
# Get tools once
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
# list_table_resources - structured for programs
resources = await tool_dict["list_table_resources"].ainvoke({"schema": "mydb"})
# Returns: ["table://mydb/users", "table://mydb/orders"]
# list_tables - formatted for humans
tables = await tool_dict["list_tables"].ainvoke({"db_schema": "mydb"})
# Returns: "| Tables_in_mydb |\n|-----------------|\n| users |\n| orders |"
```
#### `read_table_resource` vs `describe_table`
These serve **completely different purposes** - they're complementary, not redundant:
| Feature | `read_table_resource` | `describe_table` |
|---------|----------------------|------------------|
| **What it returns** | Table **data** (rows) | Table **structure** (schema) |
| **Return Type** | `List[Dict[str, Any]]` (JSON) | `str` (markdown) |
| **SQL Command** | `SELECT * FROM table` | `DESCRIBE table` |
| **Use Case** | View actual data/rows | View column definitions, types, constraints |
| **Example Output** | `[{"id": 1, "name": "Alice"}, ...]` | Column names, types, nullability, keys |
**Example comparison:**
```python
# Get tools once
tools = await client.get_tools()
tool_dict = {t.name: t for t in tools}
# read_table_resource - get the DATA
data = await tool_dict["read_table_resource"].ainvoke({
"schema": "mydb", "table": "users", "row_limit": 10
})
# Returns: [{"id": 1, "name": "Alice", "email": "alice@example.com"}, ...]
# describe_table - get the STRUCTURE
structure = await tool_dict["describe_table"].ainvoke({
"table_name": "users", "db_schema": "mydb"
})
# Returns: "| Field | Type | Null | Key | Default | Extra |\n|-------|------|------|-----|---------|-------|\n| id | int | NO | PRI | NULL | auto_increment |\n| name | varchar(100) | NO | | NULL | |"
```
**Summary:**
- Use `read_table_resource` when you want to **see the data** in a table
- Use `describe_table` when you want to **see the schema/structure** of a table
- They answer different questions: "What's in the table?" vs "How is the table defined?"
## Notes
- MySQL connection strings must start with `mysql://`
- The format is: `mysql://user:password@host:port/database`
- All database names are treated as schemas for compatibility
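Passwords containing URL-special characters (`@`, `/`, `:`, etc.) generally need to be percent-encoded to keep the connection string parseable. A sketch using the standard library, assuming the server URL-decodes credentials:
```python
from urllib.parse import quote_plus

user = "app_user"
password = "p@ss/word:1"  # contains URL-special characters
host, port, database = "localhost", 3306, "mydatabase"

conn = f"mysql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"
print(conn)  # mysql://app_user:p%40ss%2Fword%3A1@localhost:3306/mydatabase
```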
## Documentation
- **[AGENT_ARCHITECTURE.md](AGENT_ARCHITECTURE.md)**: Detailed architecture explanation, workflow graph, and extension guide
- How the agent works internally
- Visual workflow diagrams
- Step-by-step guides for adding nodes and tools
- Code examples and patterns
- **[IMPROVEMENTS.md](IMPROVEMENTS.md)**: Comprehensive improvement guide
- Enhancement ideas with priority rankings
- Implementation examples
- Performance optimizations
- Advanced features
## Improving the Agent
The **[IMPROVEMENTS.md](IMPROVEMENTS.md)** guide to enhancing the agent's performance, accuracy, and features includes:
- **Priority rankings** (High/Medium/Low) for each improvement
- **Detailed code examples** with implementation snippets
- **Step-by-step integration instructions**
- **Workflow graphs** showing how the agent evolves with improvements
- **Testing strategies** and validation approaches
- **Benefits analysis** for each enhancement
**Quick Start**: Begin with high-priority items like foreign key relationships, better error parsing, and column name suggestions for immediate impact.
## Troubleshooting
### Connection Issues
- Ensure the MySQL server is running and accessible
- Check connection string format and credentials
- Verify network connectivity and firewall settings
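To rule out MCP-layer issues, you can test connectivity directly with the MySQL connector from the requirements (a minimal sketch with placeholder credentials):
```python
import mysql.connector

try:
    conn = mysql.connector.connect(
        host="localhost", port=3306,
        user="user", password="password", database="mydatabase",
        connection_timeout=5,
    )
    print("Connected:", conn.is_connected())
    conn.close()
except mysql.connector.Error as err:
    print("Connection failed:", err)
```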
### MySQL-Specific Issues
- Ensure MySQL server supports the connection protocol
- Check user permissions for the specified database
- Verify MySQL connector version compatibility