Skip to main content
Glama
MCP_RAG_Alignment_Report.md8.79 kB
# MCP Server to RAG Server Alignment Report ## Executive Summary After reviewing the comprehensive API documentation and OpenAPI specification against the MCP server implementation, I've identified **critical mismatches** between what the RAG server actually returns and what the MCP server expects. The validation errors in the logs are caused by **response schema mismatches** where field names and structures don't align. ## Critical Issues Identified ### 1. **query_text** - MAJOR MISMATCH **RAG Server Returns:** ```json { "response": "string" } ``` **MCP Server Expects (QueryResponse model):** ```json { "response": "string", "query": "string|null", // ❌ MISSING - REQUIRED FIELD "results": [...], "total_results": int, "processing_time": float, "context": "string" } ``` **Issue:** The `query` field is marked as required in the MCP model but not returned by RAG server. ### 2. **get_pipeline_status** - MAJOR MISMATCH **RAG Server Returns (PipelineStatusResponse schema):** ```json { "autoscanned": false, "busy": false, "job_name": "Default Job", "job_start": "string", "docs": 0, "batchs": 0, "cur_batch": 0, "request_pending": false, "latest_message": "", "history_messages": ["string"], "update_status": {} } ``` **MCP Server Expects:** ```json { "autoscanned": bool, "busy": bool, "status": "string", // ❌ MISSING - REQUIRED FIELD // ... other fields } ``` **Issue:** MCP model incorrectly expects a `status` field that doesn't exist in RAG response. ### 3. **get_track_status** - MAJOR MISMATCH **RAG Server Returns (TrackStatusResponse schema):** ```json { "track_id": "string", "documents": [...], "total_count": 1, "status_summary": {} } ``` **MCP Server Expects:** ```json { "track_id": "string", "documents": [...], "total_count": int, "status_summary": {}, "status": "string" // ❌ MISSING - REQUIRED FIELD } ``` **Issue:** MCP model incorrectly expects a `status` field. ### 4. **clear_documents** - MAJOR MISMATCH **RAG Server Returns (ClearDocumentsResponse schema):** ```json { "status": "success", "message": "All documents cleared successfully. Deleted 15 files." } ``` **MCP Server Expects:** ```json { "status": "string", "message": "string|null", "cleared": bool, // ❌ MISSING - REQUIRED FIELD "count": int // ❌ MISSING - REQUIRED FIELD } ``` **Issue:** MCP model expects `cleared` and `count` fields that don't exist. ### 5. **clear_cache** - MAJOR MISMATCH **RAG Server Returns (ClearCacheResponse schema):** ```json { "status": "success", "message": "Successfully cleared cache for modes: ['default', 'naive']" } ``` **MCP Server Expects:** ```json { "status": "string", "message": "string", "cache_type": "string|null", "cleared": bool // ❌ MISSING - REQUIRED FIELD } ``` **Issue:** MCP model expects `cleared` field that doesn't exist. ### 6. **get_graph_labels** - POTENTIAL MISMATCH **RAG Server Returns:** `List[str]` (array of strings) **MCP Server Expects (LabelsResponse model):** ```json { "entity_labels": ["string"], "relation_labels": ["string"] } ``` **Issue:** Server returns simple array, MCP expects structured object. Code has workaround but may be fragile. ## Correctly Aligned Endpoints The following endpoints appear to be correctly aligned: 1. **insert_text** - ✅ Aligned with InsertResponse 2. **insert_texts** - ✅ Aligned with InsertResponse 3. **upload_document** - ✅ Aligned with InsertResponse 4. **scan_documents** - ✅ Aligned with ScanResponse 5. **get_documents** - ✅ Aligned with DocsStatusesResponse 6. **get_documents_paginated** - ✅ Aligned with PaginatedDocsResponse 7. **delete_document** - ✅ Aligned with DeleteDocByIdResponse 8. **get_knowledge_graph** - ✅ Aligned (uses nodes/edges structure) 9. **check_entity_exists** - ✅ Aligned with EntityExistsResponse 10. **update_entity** - ✅ Aligned with EntityUpdateResponse 11. **update_relation** - ✅ Aligned with RelationUpdateResponse 12. **delete_entity** - ✅ Aligned with DeletionResult 13. **delete_relation** - ✅ Aligned with DeletionResult 14. **get_document_status_counts** - ✅ Aligned with StatusCountsResponse 15. **get_health** - ✅ Aligned with HealthResponse 16. **query_text_stream** - ✅ Aligned (streaming handled correctly) ## Implementation Plan ### Phase 1: Fix Critical Response Model Mismatches #### 1.1 Fix QueryResponse Model **File:** `src/daniel_lightrag_mcp/models.py` **Change:** Make `query` field optional in QueryResponse ```python class QueryResponse(BaseModel): response: str = Field(..., description="Query response text") query: Optional[str] = None # ❌ Change from required to optional # ... rest unchanged ``` #### 1.2 Fix PipelineStatusResponse Model **File:** `src/daniel_lightrag_mcp/models.py` **Change:** Remove non-existent `status` field, ensure all fields match RAG schema ```python class PipelineStatusResponse(BaseModel): autoscanned: bool = Field(..., description="Whether auto-scanning is enabled") busy: bool = Field(..., description="Whether pipeline is busy") # ❌ REMOVE: status field doesn't exist in RAG response job_name: Optional[str] = None job_start: Optional[str] = None docs: Optional[int] = None batchs: Optional[int] = None cur_batch: Optional[int] = None request_pending: Optional[bool] = None latest_message: Optional[str] = None history_messages: Optional[List[str]] = None update_status: Optional[Dict[str, Any]] = None # Remove fields not in RAG response: # progress, current_task, message ``` #### 1.3 Fix TrackStatusResponse Model **File:** `src/daniel_lightrag_mcp/models.py` **Change:** Remove non-existent `status` field ```python class TrackStatusResponse(BaseModel): track_id: str = Field(..., description="Track ID") documents: List[Dict[str, Any]] = Field(default_factory=list, description="Documents in track") total_count: int = Field(0, description="Total document count") status_summary: Dict[str, Any] = Field(default_factory=dict, description="Status summary") # ❌ REMOVE: status field doesn't exist in RAG response ``` #### 1.4 Fix ClearDocumentsResponse Model **File:** `src/daniel_lightrag_mcp/models.py` **Change:** Remove non-existent fields ```python class ClearDocumentsResponse(BaseModel): status: str = Field(..., description="Clearing status") message: Optional[str] = None # ❌ REMOVE: cleared and count fields don't exist in RAG response ``` #### 1.5 Fix ClearCacheResponse Model **File:** `src/daniel_lightrag_mcp/models.py` **Change:** Remove non-existent `cleared` field ```python class ClearCacheResponse(BaseModel): status: str = Field(..., description="Cache clearing status") message: str = Field(..., description="Status message") cache_type: Optional[str] = None # ❌ REMOVE: cleared field doesn't exist in RAG response # Note: duplicate message field in original - keep one ``` ### Phase 2: Verify Graph Labels Handling #### 2.1 Review get_graph_labels Implementation **File:** `src/daniel_lightrag_mcp/client.py` **Current workaround:** ```python # Server returns a list, but our model expects a dict with labels field if isinstance(response_data, list): response_data = {"labels": response_data} ``` **Action:** Verify this works correctly or update LabelsResponse model to match actual RAG response. ### Phase 3: Testing and Validation #### 3.1 Update Test Expectations - Update comprehensive test to expect correct response formats - Remove assumptions about non-existent fields #### 3.2 Integration Testing - Test each fixed endpoint individually - Verify no regression in working endpoints - Test with actual RAG server responses ## Risk Assessment **High Risk:** - Query functionality is critical - `query_text` fix is highest priority - Pipeline status monitoring may be used by monitoring systems **Medium Risk:** - Cache and document clearing operations - Track status monitoring **Low Risk:** - Graph operations (mostly working) - Document management (mostly working) ## Success Criteria 1. All 22 MCP tools pass validation without Pydantic errors 2. Real-world query operations work correctly 3. No regression in currently working endpoints 4. Comprehensive test passes 100% ## Next Steps 1. **Immediate:** Fix the 5 critical response model mismatches 2. **Verify:** Test fixes against actual RAG server 3. **Validate:** Run comprehensive test suite 4. **Monitor:** Check logs for any remaining validation errors This plan addresses the root cause of the validation errors seen in the logs and ensures proper alignment between MCP server expectations and RAG server responses.

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/desimpkins/daniel-lightrag-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server