# Text-to-SQL Agent Improvements
This document outlines improvements that can be made to enhance the text-to-SQL agent's performance, accuracy, and user experience.
> **Note**: For implemented features and detailed architecture, see [AGENT_ARCHITECTURE.md](AGENT_ARCHITECTURE.md) which documents all features, helper methods, and code references.
## Planned Improvements
The following improvements are planned for implementation:
1. **Few-Shot Examples** - Example queries to guide LLM
2. **Query Explanation** - Explain what SQL queries do
3. **Query History and Learning** - Learn from past queries
---
## 1. Few-Shot Examples ⭐ **PLANNED**
Add example queries to guide the LLM.
**Improvement:**
```python
system_prompt += """
EXAMPLES OF GOOD QUERIES:
Question: "How many authors are there?"
SQL: SELECT COUNT(*) FROM `authors`;
Question: "Show authors with most books"
SQL: SELECT `authors`.*, COUNT(`books`.`book_id`) as book_count
FROM `authors`
JOIN `books` ON `authors`.`author_id` = `books`.`author_id`
GROUP BY `authors`.`author_id`
ORDER BY book_count DESC
LIMIT 1;
Question: "Find authors born after 1950"
SQL: SELECT * FROM `authors` WHERE `birth_year` > 1950;
"""
```
**Benefits:**
- Better query patterns
- Consistent formatting
- Fewer syntax errors
**Integration Steps:**
In `_generate_sql_node()` (after the "IMPORTANT RULES" section), add examples to the system prompt:
```python
# After the "IMPORTANT RULES" section, add:
system_prompt += """
=== EXAMPLES OF GOOD QUERIES ===
Question: "How many authors are there?"
SQL: SELECT COUNT(*) FROM `authors`;
Question: "Show authors with most books"
SQL: SELECT `authors`.*, COUNT(`books`.`book_id`) as book_count
FROM `authors`
JOIN `books` ON `authors`.`author_id` = `books`.`author_id`
GROUP BY `authors`.`author_id`
ORDER BY book_count DESC
LIMIT 1;
Question: "Find authors born after 1950"
SQL: SELECT * FROM `authors` WHERE `birth_year` > 1950;
Question: "List all books by a specific author"
SQL: SELECT `books`.* FROM `books`
JOIN `authors` ON `books`.`author_id` = `authors`.`author_id`
WHERE `authors`.`name` = 'John Doe';
"""
```
**Tip:** Customize examples to match your database schema.
**No MCP server changes needed.**
---
## 2. Query Explanation ⭐ **PLANNED**
Explain what the generated SQL does.
**Improvement:**
```python
async def _explain_query(self, sql: str) -> str:
"""Generate explanation of what the SQL query does"""
explain_prompt = f"""Explain what this SQL query does in simple terms:
{sql}
Explanation:"""
response = await self.llm.ainvoke([HumanMessage(content=explain_prompt)])
return response.content
```
**Benefits:**
- User education
- Transparency
- Debugging aid
**Integration Steps:**
1. Add a new node method:
```python
async def _explain_query_node(self, state: AgentState) -> dict:
"""Explain what the generated SQL does"""
messages = state["messages"]
# Extract SQL from AI message
sql = None
for msg in reversed(messages):
if isinstance(msg, AIMessage):
# SQL is usually in the content
content = msg.content
# Try to extract SQL (might be in code blocks)
import re
sql_match = re.search(r'```sql\n(.*?)\n```', content, re.DOTALL)
if sql_match:
sql = sql_match.group(1).strip()
elif "SELECT" in content.upper():
sql = content.strip()
break
if not sql:
return {}
# Generate explanation
explain_prompt = f"""Explain what this SQL query does in simple, non-technical terms:
{sql}
Explanation (2-3 sentences):"""
response = await self.llm.ainvoke([HumanMessage(content=explain_prompt)])
return {
"messages": [AIMessage(content=f"Query Explanation:\n{response.content}")]
}
```
2. Add to graph:
```python
workflow.add_node("explain_query", self._explain_query_node)
# Add conditional edge from generate_sql (optional - explain before execution)
# OR add after successful execution
workflow.add_conditional_edges(
"generate_sql",
self._should_explain,
{
"explain": "explain_query",
"execute": "execute_query" # Continue to execution
}
)
workflow.add_edge("explain_query", "execute_query")
```
3. Add decision function (optional):
```python
def _should_explain(self, state: AgentState) -> Literal["explain", "execute"]:
"""Decide if we should explain the query first"""
# You can make this configurable or always skip
return "execute" # Skip explanation for now, or return "explain" to enable
```
**No MCP server changes needed.**
---
## 3. Query History and Learning ⭐ **PLANNED**
Remember successful query patterns.
**Improvement:**
```python
class TextToSQLAgent:
def __init__(self, ...):
# ... existing code ...
self.query_history = [] # Store successful queries
def _add_to_history(self, question: str, sql: str, success: bool):
"""Store query in history for learning"""
self.query_history.append({
"question": question,
"sql": sql,
"success": success,
"timestamp": time.time()
})
def _get_similar_queries(self, current_question: str) -> list:
"""Find similar past queries to use as examples"""
# Use embedding similarity or simple keyword matching
return [q for q in self.query_history[-10:] if q["success"]]
```
**Benefits:**
- Learning from past queries
- Better patterns over time
- Faster query generation
**Integration Steps:**
1. Add query history to `__init__()`:
```python
def __init__(self, ...):
# ... existing code ...
self.query_history = [] # Store successful queries
```
2. Add method to store queries:
```python
def _add_to_history(self, question: str, sql: str, success: bool):
"""Store query in history for learning"""
import time
self.query_history.append({
"question": question,
"sql": sql,
"success": success,
"timestamp": time.time()
})
```
3. Add method to retrieve similar queries:
```python
def _get_similar_queries(self, current_question: str) -> list:
"""Find similar past queries to use as examples"""
# Simple keyword matching - can be enhanced with embeddings
current_lower = current_question.lower()
similar = []
for q in self.query_history[-20:]: # Check last 20 queries
if q["success"]:
# Simple keyword overlap check
question_lower = q["question"].lower()
common_words = set(current_lower.split()) & set(question_lower.split())
if len(common_words) >= 2: # At least 2 common words
similar.append(q)
return similar[:5] # Return top 5 similar
```
4. Use in `_generate_sql_node()` to include similar queries as examples:
```python
# Get similar past queries
similar_queries = self._get_similar_queries(user_query)
if similar_queries:
system_prompt += "\n=== SIMILAR PAST QUERIES ===\n"
for q in similar_queries:
system_prompt += f"\nQuestion: {q['question']}\nSQL: {q['sql']}\n"
```
5. Store queries after execution in `_execute_query_node()`:
```python
# After successful execution
if "successfully" in result.lower():
# Extract SQL and question
sql = # ... extract from messages
question = # ... extract from messages
self._add_to_history(question, sql, True)
else:
self._add_to_history(question, sql, False)
```
**No MCP server changes needed.**
---
## Integration Tips
1. **Test incrementally**: Add one improvement at a time and test before moving to the next.
2. **Follow existing patterns**: Look at how `_explore_schema_node()` and `_generate_sql_node()` are structured.
3. **Error handling**: Wrap new code in try/except blocks where appropriate.
4. **State management**: Only update `AgentState` fields that are defined in the TypedDict.
5. **Parallel execution**: Use `asyncio.gather()` for multiple async calls (like foreign keys).
**Important:** All improvements can be added to the agent code - **no MCP server changes needed**. The MCP server already provides all the necessary tools (`get_foreign_keys`, `run_query`, etc.).
---
## Next Steps
### Remaining Improvements to Implement
1. **Few-Shot Examples** - Quick win, improves query quality
- Add example queries to system prompt
- Helps with query patterns and formatting
2. **Query Explanation** - Adds transparency
- Add explain_query node to graph
- Generate explanations for generated SQL queries
3. **Query History and Learning** - Enables learning
- Store successful queries in history
- Use similar past queries as examples
- Improves over time
### Implementation Tips
1. **Test incrementally** - Add one improvement at a time
2. **Measure impact** - Track success rates before/after
3. **Follow integration steps** - Each section has detailed integration instructions
4. **No MCP server changes needed** - All improvements are in agent code only