REFACTORING_GUIDE.mdโข6.43 kB
# ๐ MCP System Refactoring Guide
## Overview
This guide shows how to migrate from the monolithic `llmDatabaseRouter.py` to a clean, layered architecture.
## ๐๏ธ New Architecture
```
๐ mcp_system/
โโโ ๐ presentation/ # MCP Protocol Layer (thin)
โ โโโ server.py # MCP server setup & routing
โ โโโ tools/ # MCP tool implementations
โ โโโ search_tools.py # Question answering tools
โ โโโ sql_tools.py # SQL execution tools
โ โโโ schema_tools.py # Schema inspection tools
โ
โโโ ๐ง services/ # Business Logic Layer
โ โโโ smart_search.py # Main orchestrator
โ โโโ sql_service.py # SQL generation & validation
โ โโโ semantic_service.py # Vector/text search
โ โโโ schema_service.py # Schema introspection
โ โโโ synthesis_service.py # LLM response generation
โ
โโโ ๐พ repositories/ # Data Access Layer
โ โโโ postgres_repository.py # Raw DB operations
โ โโโ vector_repository.py # Embedding operations
โ โโโ llm_repository.py # LLM API operations
โ
โโโ ๐ง shared/ # Shared utilities
โโโ models.py # Data classes/types
โโโ exceptions.py # Custom exceptions
โโโ config.py # Configuration
```
## ๐ Migration Steps
### Phase 1: Move Functions to Repositories
**From llmDatabaseRouter.py โ repositories/postgres_repository.py:**
- `safe_run_sql()` โ `execute_query()`
- `_get_all_table_names()` โ `get_all_table_names()`
- `_validate_table_existence()` โ `validate_tables_exist()`
- `_is_sql_safe_to_run()` โ `_is_sql_safe_to_run()`
**From llmDatabaseRouter.py โ repositories/vector_repository.py:**
- `semantic_rows()` โ `search_embeddings()`
- `_text_search_fallback()` โ `text_search_fallback()`
- `_generate_embedding()` โ Move to separate embedding service
### Phase 2: Extract Services
**Create services/schema_service.py:**
- `get_schema_info()` โ Enhanced version with caching
- `_build_schema_description()` โ `build_catalog_descriptions()`
- `search_catalog()` โ `find_relevant_tables()`
**Create services/sql_service.py:**
- `generate_sql()` โ Enhanced with better context
- `_generate_sql_heuristics()` โ Fallback method
- `_get_suggested_queries()` โ `get_suggested_queries()`
**Create services/semantic_service.py:**
- `_attempt_semantic_search()` โ `search()`
- `_extract_search_terms()` โ Internal method
**Create services/synthesis_service.py:**
- `_generic_synthesis()` โ `synthesize_response()`
- `_build_generic_synthesis_prompt()` โ Internal method
- `_clean_markdown_output()` โ `clean_markdown()`
### Phase 3: Create Smart Search Orchestrator
**Create services/smart_search.py:**
- `answer()` โ Main orchestration method
- Move question classification logic here
- Coordinate between all services
### Phase 4: Thin MCP Layer
**Update server.py:**
- Remove all business logic
- Keep only MCP protocol handling
- Delegate to SmartSearch service
## ๐ง Key Benefits
### โ
Separation of Concerns
- **Repositories**: Pure data access, no business logic
- **Services**: Business logic, no protocol concerns
- **Presentation**: Protocol handling, no data access
### โ
Testability
- Each layer can be unit tested independently
- Mock dependencies easily
- Clear test boundaries
### โ
Maintainability
- Single responsibility principle
- Easy to find and modify specific functionality
- Clear dependency flow
### โ
Scalability
- Services can be scaled independently
- Easy to add new data sources
- Simple to extend with new capabilities
## ๐ Usage Examples
### Before (Monolithic)
```python
# Everything in one place
router = llmDatabaseRouter(engine, llm_client)
response = router.answer("How many users are active?")
```
### After (Clean Architecture)
```python
# Clear separation of concerns
smart_search = SmartSearch(
schema_service, sql_service,
semantic_service, synthesis_service
)
response = smart_search.answer("How many users are active?")
```
## ๐ฆ Dependency Injection
```python
# repositories/
postgres_repo = PostgresRepository(engine)
vector_repo = VectorRepository(engine)
# services/
schema_service = SchemaService(postgres_repo)
sql_service = SQLService(postgres_repo, schema_service)
semantic_service = SemanticService(vector_repo)
synthesis_service = SynthesisService(llm_client)
# orchestrator/
smart_search = SmartSearch(
schema_service, sql_service,
semantic_service, synthesis_service
)
# presentation/
search_tools = SearchTools(smart_search)
```
## ๐งช Testing Strategy
### Repository Tests
```python
def test_postgres_repository():
repo = PostgresRepository(test_engine)
result = repo.execute_query("SELECT 1")
assert result.success
assert result.data == [{'?column?': 1}]
```
### Service Tests
```python
def test_sql_service():
mock_repo = Mock()
mock_schema = Mock()
service = SQLService(mock_repo, mock_schema)
queries = service.get_suggested_queries("Count users")
assert len(queries) > 0
assert "SELECT COUNT(*)" in queries[0].sql
```
### Integration Tests
```python
def test_smart_search_integration():
search = SmartSearch(...)
response = search.answer("How many active users?")
assert response.success
assert "users" in response.answer_markdown.lower()
```
## ๐ Migration Checklist
- [ ] Create new directory structure
- [ ] Move shared models and exceptions
- [ ] Extract PostgresRepository
- [ ] Extract VectorRepository
- [ ] Create SchemaService
- [ ] Create SQLService
- [ ] Create SemanticService
- [ ] Create SynthesisService
- [ ] Build SmartSearch orchestrator
- [ ] Create thin MCP tools
- [ ] Update server.py
- [ ] Add configuration management
- [ ] Write unit tests
- [ ] Write integration tests
- [ ] Update documentation
## ๐ฏ Next Steps
1. **Start with repositories** - Move pure data access first
2. **Extract services one by one** - Maintain working system
3. **Create orchestrator** - Wire everything together
4. **Thin the MCP layer** - Remove business logic
5. **Add comprehensive tests** - Ensure reliability
6. **Monitor and optimize** - Improve performance
This refactoring will make your codebase much more maintainable, testable, and scalable! ๐