# Phase 9: MCP Endpoints Multi-Folder Support
**Status**: 📋 PLANNED
**Priority**: HIGH
**Start Date**: TBD
**Approach**: Endpoint-by-Endpoint Implementation
**🚀 Implementation Plan**: [Phase 9 Simple SCRUM Plan](./Phase-9-Implementation-epic.md)
## 🎯 **Overview**
Update our existing MCP endpoints to work with the multi-folder indexing system. The core change: **transition from single-folder to multi-folder operations** while maintaining the same excellent endpoint interface.
## 🌟 **The Problem**
Our MCP endpoints currently expect a single folder and model. With Task 11.5 complete, we now have:
- Multiple folders with different models
- Folder lifecycle states (pending → indexing → active, or error)
- Path-aware topic detection (prerequisite)
**The endpoints need to be updated to work with this new reality.**
## 🎯 **Vision**
Keep our excellent 10-endpoint interface, but make each endpoint **folder-aware** and **state-aware**:
- LLMs specify which folder to operate on
- Endpoints validate folder states before operations
- Responses include folder context and attribution
- Model loading happens automatically per folder
---
## 📡 **Endpoint Changes Required**
### 1. **`search` Endpoint**
**Current**: Searches the single configured folder
**Required Changes**:
- **Add `folder` parameter (required)** - folder name or path
- **Validate folder is in 'active' state** before searching
- **Load correct model for the target folder**
- **Include folder attribution in results**
- **Return folder-specific error messages** when folder unavailable
**New Request Schema**:
```json
{
"query": "quadratic equations",
"folder": "MathCourse", // NEW: Required parameter
"mode": "semantic",
"limit": 10
}
```
**New Response Addition**:
```json
{
"folderContext": {
"name": "MathCourse",
"path": "/Users/bob/MathCourse",
"model": "all-MiniLM-L6-v2",
"status": "active"
}
// ... existing results structure
}
```
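A minimal sketch of how the new request shape might be validated before it is forwarded. The `parseSearchRequest` name, the error wording, and the defaults for `mode` and `limit` are assumptions for illustration; only the four field names come from the schema above.

```typescript
// Hypothetical validator; field names follow the request schema above.
interface SearchRequest {
  query: string;
  folder: string; // required in the multi-folder design
  mode: string;
  limit: number;
}

type ParseResult =
  | { ok: true; value: SearchRequest }
  | { ok: false; error: string };

function parseSearchRequest(input: unknown): ParseResult {
  if (typeof input !== "object" || input === null) {
    return { ok: false, error: "Request must be a JSON object" };
  }
  const raw = input as Record<string, unknown>;
  const query = raw["query"];
  const folder = raw["folder"];
  const mode = raw["mode"];
  const limit = raw["limit"];

  if (typeof query !== "string" || query.trim() === "") {
    return { ok: false, error: "Missing required parameter: query" };
  }
  if (typeof folder !== "string" || folder.trim() === "") {
    return { ok: false, error: "Missing required parameter: folder" };
  }
  return {
    ok: true,
    value: {
      query,
      folder,
      mode: typeof mode === "string" ? mode : "semantic", // assumed default
      limit:
        typeof limit === "number" && Number.isInteger(limit) && limit > 0
          ? limit
          : 10, // assumed default
    },
  };
}
```

Rejecting a missing `folder` up front keeps the failure cheap: no model is loaded and no folder state is checked for a request that cannot succeed.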
---
### 2. **`get_folder_info` Endpoint**
**Current**: Returns info about the single folder
**Required Changes**:
- **Return ALL configured folders** with their states and topics
- **Include topic clusters** generated by path-aware detection
- **Show folder lifecycle status** (pending, indexing, active, error)
- **Provide model information** and indexing progress
**Enhanced Response**:
```json
{
"folders": [
{
"name": "MathCourse",
"path": "/Users/bob/MathCourse",
"status": "active",
"model": "all-MiniLM-L6-v2",
"documentCount": 150,
"topics": [
{"weight": 45, "terms": ["algebra", "quadratic", "equations"]},
{"weight": 30, "terms": ["statistics", "probability"]}
],
"lastIndexed": "2024-01-15T10:30:00Z"
},
{
"name": "ChemistryCourse",
"status": "indexing",
"progress": 65,
"model": "sentence-transformers/all-mpnet-base-v2"
}
]
}
```
---
### 3. **`get_document_outline` Endpoint**
**Current**: Takes document ID from single folder
**Required Changes**:
- **Accept full document path** OR **folder + relative path**
- **Resolve which folder contains the document**
- **Validate folder is active** before processing
- **Include folder context** in response
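
The two identification styles above could be resolved along these lines. This is a sketch under stated assumptions: `DocumentRef`, `resolveDocument`, and `requireActive` are hypothetical names, paths are assumed to be POSIX-style, and the longest-matching-root rule for nested folders is a design guess rather than documented behavior.

```typescript
type FolderStatus = "pending" | "indexing" | "active" | "error";

interface FolderEntry {
  name: string;
  path: string; // absolute POSIX-style path without a trailing slash (assumed)
  status: FolderStatus;
}

// Hypothetical identifier: a full path, or a folder name plus a relative path.
type DocumentRef =
  | { fullPath: string }
  | { folder: string; relativePath: string };

interface ResolvedDocument {
  folder: FolderEntry;
  relativePath: string;
}

function requireActive(doc: ResolvedDocument): ResolvedDocument {
  if (doc.folder.status !== "active") {
    throw new Error(`Folder ${doc.folder.name} is ${doc.folder.status}, not active`);
  }
  return doc;
}

function resolveDocument(ref: DocumentRef, folders: FolderEntry[]): ResolvedDocument {
  if ("fullPath" in ref) {
    const fullPath = ref.fullPath;
    // Prefer the longest matching root so nested folders resolve to the most specific one.
    const containing = folders
      .filter((f) => fullPath.startsWith(f.path + "/"))
      .sort((a, b) => b.path.length - a.path.length);
    const match = containing[0];
    if (match === undefined) {
      throw new Error(`No configured folder contains ${fullPath}`);
    }
    return requireActive({
      folder: match,
      relativePath: fullPath.slice(match.path.length + 1),
    });
  }
  const folderName = ref.folder;
  const named = folders.find((f) => f.name === folderName);
  if (named === undefined) {
    throw new Error(`Unknown folder: ${folderName}`);
  }
  return requireActive({ folder: named, relativePath: ref.relativePath });
}
```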
---
### 4. **`get_document_data` Endpoint**
**Current**: Returns document from single folder
**Required Changes**:
- **Accept folder-aware document identification**
- **Validate folder accessibility**
- **Include folder attribution** in metadata
---
### 5. **`list_documents` Endpoint**
**Current**: Lists documents in the configured folder
**Required Changes**:
- **Add optional `folder` parameter** to list specific folder
- **Default to listing ALL folders** if no folder specified
- **Show folder context** for each document
- **Filter by folder status** (only show active folders by default)
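
One way the optional `folder` parameter and the active-only default could combine, sketched with in-memory data. `IndexedFolder`, `ListOptions`, and the `includeInactive` flag are hypothetical; the real endpoint would read from the daemon rather than from an array.

```typescript
type FolderStatus = "pending" | "indexing" | "active" | "error";

interface IndexedFolder {
  name: string;
  status: FolderStatus;
  documents: string[]; // paths relative to the folder root
}

interface ListedDocument {
  folder: string; // folder context attached to every document
  folderStatus: FolderStatus;
  path: string;
}

interface ListOptions {
  folder?: string; // omit to list across all folders
  includeInactive?: boolean; // hypothetical opt-in; default shows active folders only
}

function listDocuments(folders: IndexedFolder[], options: ListOptions = {}): ListedDocument[] {
  const includeInactive = options.includeInactive ?? false;
  const wanted = options.folder;
  if (wanted !== undefined && !folders.some((f) => f.name === wanted)) {
    throw new Error(`Unknown folder: ${wanted}`);
  }
  const listed: ListedDocument[] = [];
  for (const f of folders) {
    if (wanted !== undefined && f.name !== wanted) continue;
    if (!includeInactive && f.status !== "active") continue;
    for (const path of f.documents) {
      listed.push({ folder: f.name, folderStatus: f.status, path });
    }
  }
  return listed;
}
```

Note that in this sketch a named folder that is not active yields an empty list unless `includeInactive` is set; returning a status-specific error instead would be an equally reasonable choice.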
---
### 6. **Document Format Endpoints** (`get_sheet_data`, `get_slides`, `get_pages`)
**Current**: Process documents from single folder
**Required Changes**:
- **Accept folder-aware document paths**
- **Validate folder states** before processing
- **Include folder context** in responses
- **Handle folder-specific processing errors**
---
### 7. **`get_status` Endpoint**
**Current**: Shows single folder status
**Required Changes**:
- **Show system-wide status** across all folders
- **Include model memory usage** and cache status
- **Report folder lifecycle states** and counts
- **Show topic detection health** and performance metrics
**Enhanced Response**:
```json
{
"systemStatus": "healthy",
"totalFolders": 5,
"activeFolders": 3,
"indexingFolders": 1,
"errorFolders": 1,
"modelCache": {
"loadedModels": ["all-MiniLM-L6-v2", "sentence-transformers/all-mpnet-base-v2"],
"memoryUsage": "2.1GB",
"cacheHits": 145,
"cacheMisses": 12
},
"topicDetection": {
"clustersGenerated": 47,
"averageCoherence": 0.82,
"recomputesPending": 2
}
}
```
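The folder counters in this response can be derived directly from the per-folder lifecycle states. The sketch below covers only those counters; how `systemStatus` is decided is not specified in this document, so it is deliberately left out. `summarizeFolders` is a hypothetical helper name.

```typescript
type FolderStatus = "pending" | "indexing" | "active" | "error";

interface FolderCounts {
  totalFolders: number;
  activeFolders: number;
  indexingFolders: number;
  errorFolders: number;
}

// Count folders per lifecycle state; field names follow the response above.
function summarizeFolders(statuses: FolderStatus[]): FolderCounts {
  const count = (wanted: FolderStatus): number =>
    statuses.filter((s) => s === wanted).length;
  return {
    totalFolders: statuses.length,
    activeFolders: count("active"),
    indexingFolders: count("indexing"),
    errorFolders: count("error"),
  };
}
```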
---
## 🔧 **Core Infrastructure Changes**
### 1. **Folder Resolution Service**
- **Map folder names to paths** and validate existence
- **Check folder lifecycle states** before operations
- **Provide clear error messages** when folders unavailable
- **Handle folder aliases** and path normalization
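
A possible shape for the resolution service, sketched in memory. `FolderResolver`, the `aliases` field, case-insensitive name matching, and the POSIX-only path normalization are all assumptions made for this example.

```typescript
interface FolderConfig {
  name: string;
  path: string;
  aliases?: string[]; // hypothetical config field
}

// Collapse duplicate slashes and drop a trailing slash (POSIX-style paths assumed).
function normalizePath(p: string): string {
  const collapsed = p.replace(/\/+/g, "/");
  return collapsed.length > 1 && collapsed.endsWith("/")
    ? collapsed.slice(0, -1)
    : collapsed;
}

class FolderResolver {
  private readonly byKey = new Map<string, FolderConfig>();
  private readonly names: string[] = [];

  constructor(folders: FolderConfig[]) {
    for (const f of folders) {
      const normalized: FolderConfig = { ...f, path: normalizePath(f.path) };
      this.names.push(f.name);
      // Names and aliases match case-insensitively (assumption); paths match exactly.
      for (const key of [f.name, ...(f.aliases ?? [])]) {
        this.byKey.set(key.toLowerCase(), normalized);
      }
      this.byKey.set(normalized.path, normalized);
    }
  }

  resolve(nameOrPath: string): FolderConfig {
    const hit =
      this.byKey.get(nameOrPath.toLowerCase()) ??
      this.byKey.get(normalizePath(nameOrPath));
    if (hit === undefined) {
      throw new Error(
        `Unknown folder "${nameOrPath}". Available folders: ${this.names.join(", ")}`
      );
    }
    return hit;
  }
}
```

Listing the available folder names in the error gives an LLM caller enough information to retry without a separate discovery call.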
### 2. **Model Loading Orchestration**
- **Load correct model per folder** for search operations
- **Cache 2-3 models** with LRU eviction
- **Handle model loading failures** gracefully
- **Report model switching performance**
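
The LRU behavior described above can be sketched with a `Map`, which iterates keys in insertion order. `ModelCache` and its injected `load` function are hypothetical; the sketch does not deduplicate concurrent loads of the same model, which a real implementation would likely need.

```typescript
class ModelCache<M> {
  // Map preserves insertion order, so the first key is the least recently used.
  private readonly models = new Map<string, M>();
  private readonly capacity: number;
  private readonly load: (name: string) => Promise<M>;
  hits = 0;
  misses = 0;

  constructor(capacity: number, load: (name: string) => Promise<M>) {
    this.capacity = capacity;
    this.load = load;
  }

  async get(name: string): Promise<M> {
    const cached = this.models.get(name);
    if (cached !== undefined) {
      this.hits++;
      // Re-insert to mark this model as most recently used.
      this.models.delete(name);
      this.models.set(name, cached);
      return cached;
    }
    this.misses++;
    const model = await this.load(name);
    if (this.models.size >= this.capacity) {
      const oldest = this.models.keys().next().value;
      if (oldest !== undefined) this.models.delete(oldest);
    }
    this.models.set(name, model);
    return model;
  }

  loaded(): string[] {
    return Array.from(this.models.keys());
  }
}
```

The `hits` and `misses` counters correspond to the `cacheHits` and `cacheMisses` fields shown in the `get_status` response.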
### 3. **State Validation Middleware**
- **Block index-dependent operations (such as `search`) on non-active folders** with helpful errors
- **Allow read operations** on indexing folders, marking the results as partial data
- **Prevent operations** on error/pending folders
- **Provide status-specific guidance** to LLMs
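
These rules could be expressed as a single pure function that each endpoint calls before doing any work. This is one reading of the rules above, not a confirmed design: the `"read"` / `"search"` split, the `Validation` shape, and the guidance wording are assumptions.

```typescript
type FolderStatus = "pending" | "indexing" | "active" | "error";
type OperationKind = "read" | "search"; // hypothetical operation classes

interface FolderState {
  name: string;
  status: FolderStatus;
  progress?: number; // percent complete while indexing
}

type Validation =
  | { allowed: true; partial: boolean }
  | { allowed: false; message: string };

function validateFolderState(folder: FolderState, op: OperationKind): Validation {
  switch (folder.status) {
    case "active":
      return { allowed: true, partial: false };
    case "indexing": {
      // Reads may proceed on partial data; search waits for a complete index.
      if (op === "read") return { allowed: true, partial: true };
      const progress =
        folder.progress !== undefined ? ` (${folder.progress}% complete)` : "";
      return {
        allowed: false,
        message: `Folder ${folder.name} is currently indexing${progress}. Try again shortly.`,
      };
    }
    case "pending":
      return {
        allowed: false,
        message: `Folder ${folder.name} is queued for indexing and has no data yet.`,
      };
    case "error":
      return {
        allowed: false,
        message: `Folder ${folder.name} failed to index. Check its status before retrying.`,
      };
  }
}
```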
### 4. **Response Enhancement**
- **Add folder context** to all responses
- **Include operation metadata** (model used, processing time)
- **Provide folder-specific errors** and suggestions
- **Enable result attribution** for LLM understanding
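
A small wrapper is one way to attach folder context and timing to every response without touching each handler. In this sketch `folderContext` mirrors the field shown in the `search` response; `metadata`, `processingTimeMs`, `result`, and `withFolderContext` are hypothetical names.

```typescript
interface FolderContext {
  name: string;
  path: string;
  model: string;
  status: string;
}

interface Enhanced<T> {
  folderContext: FolderContext;
  metadata: { model: string; processingTimeMs: number };
  result: T;
}

// Run an operation and wrap its result with folder attribution and timing.
async function withFolderContext<T>(
  folder: FolderContext,
  run: () => Promise<T>
): Promise<Enhanced<T>> {
  const started = Date.now();
  const result = await run();
  return {
    folderContext: folder,
    metadata: { model: folder.model, processingTimeMs: Date.now() - started },
    result,
  };
}
```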
---
## 🎭 **Updated User Experience**
### For LLMs (Primary Users)
**Step 1: Discover Available Folders**
```
LLM calls: get_folder_info()
LLM sees: 3 active folders with topic hints
```
**Step 2: Select Appropriate Folder**
```
User: "Find quadratic equations"
LLM: Sees MathCourse has topic "algebra, equations"
LLM calls: search({ folder: "MathCourse", query: "quadratic equations" })
```
**Step 3: Handle Folder Issues**
```
LLM calls: search({ folder: "ChemistryCourse", query: "..." })
Response: "Folder ChemistryCourse is currently indexing (65% complete)"
LLM: "The Chemistry folder is still being indexed. Try again in a few minutes?"
```
### For End Users
- **Transparent folder management** - LLM handles the complexity
- **Clear error messages** when folders unavailable
- **Automatic folder selection** based on topic matching
- **Progress visibility** for folders being indexed
---
## 🎯 **Success Criteria**
### Functional Requirements
- [ ] All 10 endpoints work with multi-folder setup
- [ ] Folder parameter required where appropriate
- [ ] State validation prevents invalid operations
- [ ] Model loading works per folder automatically
- [ ] Error messages are folder-aware and actionable
### Performance Requirements
- [ ] Search latency < 500ms when the folder's model is already loaded (model switching is budgeted separately below)
- [ ] Folder resolution < 10ms per request
- [ ] Model switching < 2s between folders
- [ ] State validation < 5ms overhead per request
### UX Requirements
- [ ] LLMs can successfully navigate multiple folders
- [ ] Error messages guide users to resolution
- [ ] Progress feedback for long-running operations
- [ ] Folder context clear in all responses
---
## 📋 **Implementation Phases** (Easiest to Hardest)
### Phase 1: MCP Server → Daemon Architecture Fix 🚨 *CRITICAL FOUNDATION*
**Goal**: Connect MCP server to daemon instead of direct file access
**The Problem**: The current MCP server takes a folder path as an argument and operates independently:
```
BROKEN: Claude → MCP Server → Direct File Access (single folder only)
```
**The Solution**: MCP server becomes a thin client connecting to daemon:
```
FIXED: Claude → MCP Server → WebSocket → Daemon → Multi-Folder System
```
**Specific Tasks**:
1. **Remove folder arguments** from MCP server entry point (`mcp-server.ts`)
2. **Add WebSocket client** to MCP server for daemon communication
3. **Create daemon API endpoints** for MCP operations
4. **Implement protocol translation** - MCP calls ↔ Daemon API calls
5. **Test architecture** - Verify MCP server connects to daemon successfully
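
The protocol translation in task 4 might look roughly like the sketch below, which matches daemon responses to pending MCP calls by request id. Everything here is hypothetical: `DaemonBridge`, the `mcp.` method prefix, and the message fields are invented for illustration, and the transport is injected as a plain `send` function so that a WebSocket client can be wired in later.

```typescript
interface DaemonRequest {
  id: number;
  method: string;
  params: Record<string, unknown>;
}

interface DaemonResponse {
  id: number;
  result?: unknown;
  error?: string;
}

interface Waiter {
  resolve: (value: unknown) => void;
  reject: (reason: Error) => void;
}

class DaemonBridge {
  private nextId = 1;
  private readonly pending = new Map<number, Waiter>();
  private readonly send: (payload: string) => void;

  constructor(send: (payload: string) => void) {
    this.send = send;
  }

  // Translate an MCP tool call into a daemon request and wait for the matching response.
  call(tool: string, args: Record<string, unknown>): Promise<unknown> {
    const request: DaemonRequest = {
      id: this.nextId++,
      method: "mcp." + tool, // assumed naming scheme
      params: args,
    };
    return new Promise<unknown>((resolve, reject) => {
      this.pending.set(request.id, { resolve, reject });
      this.send(JSON.stringify(request));
    });
  }

  // Feed every message received from the daemon connection into this method.
  handleMessage(payload: string): void {
    const response = JSON.parse(payload) as DaemonResponse;
    const waiter = this.pending.get(response.id);
    if (waiter === undefined) return; // unknown or already-settled id
    this.pending.delete(response.id);
    if (response.error !== undefined) {
      waiter.reject(new Error(response.error));
    } else {
      waiter.resolve(response.result);
    }
  }
}
```

Keeping the transport behind a function makes the bridge testable without a running daemon, which fits the goal of verifying the architecture before any endpoint is migrated.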
**Architecture Verification Test**:
```bash
# 1. Start daemon (manages all folders)
npm run daemon
# 2. Start MCP server (no folder arguments!)
node dist/mcp-server.js # Connects to daemon via WebSocket
# 3. Test connection
# Agent calls: get_status → MCP Server → Daemon → returns system info
```
**Success Criteria**:
- MCP server starts WITHOUT folder path arguments
- WebSocket connection established to daemon
- MCP calls successfully forwarded to daemon
- Daemon responses properly returned to MCP clients
- UI folder changes immediately visible to MCP clients
### Phase 2: "Hello World" - Status Endpoint ⭐ *FIRST REAL TEST*
**Goal**: Get a single endpoint working with multi-folder data via daemon
**Why `get_status` first**:
- No folder parameters required - simplest interface change
- No model loading - no complex dependencies
- No state validation - just read current system state from daemon
- Perfect for testing MCP→Daemon→Multi-Folder flow
**Specific Tasks**:
1. **Implement daemon status API** - Return multi-folder system info
2. **Connect MCP endpoint to daemon** - Forward get_status calls
3. **Test with agent** - Verify endpoint responds with multi-folder data via daemon
4. **Validate JSON structure** - Ensure response format is correct
**"Hello World" Test Case**:
```json
// Agent calls: get_status
// MCP Server → Daemon API → Multi-folder status
// Expected response includes:
{
"systemStatus": "healthy",
"totalFolders": 3,
"activeFolders": 2,
"indexingFolders": 1,
"errorFolders": 0
// ... basic multi-folder awareness from daemon
}
```
**Success Criteria**:
- Agent can successfully call `get_status` endpoint
- MCP server forwards call to daemon successfully
- Response includes multi-folder system information from daemon
- No crashes or MCP protocol errors
- Foundation established for other endpoints
### Phase 3: Read-Only Multi-Folder Endpoints
- **Enhance `get_folder_info`** - Forward to daemon for all folders with topics
- **Update `list_documents`** - Forward to daemon for folder-specific document lists
- **Test agent navigation** - Agent can discover folders and their contents via daemon
### Phase 4: Folder-Aware Document Retrieval
- **Update `get_document_outline`** - Forward to daemon with folder-aware document paths
- **Update `get_document_data`** - Forward to daemon with folder resolution
- **Update format endpoints** (`get_sheet_data`, `get_slides`, `get_pages`) - Forward to daemon
### Phase 5: Complex Search Endpoint
- **Add folder parameter to `search`** - Forward to daemon with folder specification
- **Daemon handles model switching** - Load correct model per folder in daemon
- **Daemon handles state validation** - Only search active folders
- **Performance optimization** - Model caching and efficient switching in daemon
### Phase 6: End-to-End Integration (Built Throughout)
- **Daemon API design** - RESTful/WebSocket API for all MCP operations
- **Error handling** - Proper error propagation from daemon to MCP clients
- **Performance monitoring** - Track MCP→Daemon→Response latency
- **Protocol compliance** - Ensure all MCP responses meet specification
---
## 🧪 **Verified TMOAT Methodology**
*Each implementation phase MUST be verified through subagent testing before moving to the next phase.*
### **Testing Infrastructure Setup**
**Step 1: Start Daemon + MCP Server Architecture**
```bash
# Build the project
npm run build
# 1. Start daemon first (manages all folders)
npm run daemon
# Daemon reads folders from ~/.folder-mcp/config.yaml
# 2. Configure Claude Code to use folder-mcp (NO folder arguments!)
# Add to Claude Code config:
{
"mcpServers": {
"folder-mcp": {
"command": "node",
"args": [
"/Users/hanan/Projects/folder-mcp/dist/mcp-server.js"
// ✅ NO folder paths! MCP server connects to daemon
]
}
}
}
# 3. MCP server automatically connects to daemon via WebSocket
# All folder configuration managed by daemon
```
**Step 2: Configure Test Folders in Daemon**
```bash
# Configure daemon with test folders (not MCP server!)
folder-mcp config set folders.list '[
{
"path": "tests/fixtures/test-knowledge-base/Sales",
"model": "all-MiniLM-L6-v2"
},
{
"path": "tests/fixtures/test-knowledge-base/Legal",
"model": "all-MiniLM-L6-v2"
}
]'
# Daemon immediately picks up config changes
# All MCP clients automatically see new folders
```
**Step 3: Create Testing Subagent**
```typescript
// Use Task tool to create a subagent with MCP access
const testAgent = await Task({
description: "MCP Endpoint Tester",
prompt: "You are testing the folder-mcp endpoints. Use the MCP tools to query the system.",
subagent_type: "general-purpose"
});
```
**Step 4: Verification Pattern**
- **Act as User**: You know what's in daemon config and test fixtures
- **Ask Known Questions**: Query for information you already know exists
- **Verify Responses**: Check if agent gets correct answers via MCP→Daemon flow
### **Phase 1 TMOAT: Architecture Connection Verification**
**Test Scenario 1: MCP Server Connects to Daemon**
```
USER (You): "Check if the MCP server is working"
EXPECTED: Agent calls get_status()
VERIFY:
- MCP server connects to daemon successfully
- No "connection refused" errors
- Response comes from daemon (not direct file access)
```
**Test Scenario 2: Daemon Configuration Visibility**
```
USER: "What folders are available?"
EXPECTED: Agent calls get_status() or get_folder_info()
VERIFY:
- Response shows folders from daemon config
- NOT hardcoded folder paths from MCP args
- Daemon manages folder list
```
### **Phase 2 TMOAT: Status Endpoint Verification**
**Test Scenario 1: Basic System Status**
```
USER (You): "Check the MCP server status"
EXPECTED: Agent calls get_status()
VERIFY: Response includes systemStatus, totalFolders, activeFolders
```
**Test Scenario 2: Multi-Folder Awareness**
```
USER: "How many folders are configured in the system?"
EXPECTED: Agent calls get_status()
VERIFY:
- totalFolders matches actual config count
- activeFolders shows correctly indexed folders
- errorFolders shows any failed folders
```
**Known Test Data** (from fixtures):
- Folders: `Sales/`, `Engineering/`, `Legal/`
- Total documents: ~15-20 files
- File types: PDF, DOCX, XLSX, PPTX
### **Phase 3 TMOAT: Folder Discovery (via Daemon)**
**Test Scenario 1: List All Folders from Daemon**
```
USER: "What folders are available to search?"
EXPECTED: Agent calls get_folder_info()
VERIFY:
- MCP Server → Daemon API → Folder list
- Returns Sales, Legal folders from daemon config
- Each folder shows document count from daemon
- Topics are displayed for each folder (from daemon's topic detection)
```
**Test Scenario 2: Document Listing via Daemon**
```
USER: "Show me all documents in the Sales folder"
EXPECTED: Agent calls list_documents(folder: "Sales")
VERIFY:
- MCP Server forwards request to Daemon
- Daemon returns Sales folder document list
- Lists Sales_Pipeline.xlsx, Q4_Board_Deck.pptx
- Shows correct file metadata from daemon
```
### **Phase 4 TMOAT: Document Retrieval (via Daemon)**
**Test Scenario 1: Get Specific Document via Daemon**
```
USER: "Get the outline of Q4_Board_Deck.pptx"
EXPECTED: Agent calls get_document_outline(document_id: "Sales/Q4_Board_Deck.pptx")
VERIFY:
- MCP Server forwards to Daemon document service
- Daemon processes document using its file parsing service
- Returns slide count (45 slides) from daemon
- Shows slide titles from daemon processing
- File size is correct from daemon metadata
```
**Test Scenario 2: Extract Spreadsheet Data via Daemon**
```
USER: "Show me the data from Sales_Pipeline.xlsx"
EXPECTED: Agent calls get_sheet_data(document_id: "Sales/Sales_Pipeline.xlsx")
VERIFY:
- MCP Server → Daemon API → Document processing
- Daemon parses spreadsheet using its services
- Returns sheet names from daemon
- Shows headers: Month, Revenue, Profit from daemon
- Contains Q4 data rows processed by daemon
```
### **Phase 5 TMOAT: Search Operations (via Daemon)**
**Test Scenario 1: Semantic Search Within Folder via Daemon**
```
USER: "Search for Q4 revenue information in the Sales folder"
EXPECTED: Agent calls search(query: "Q4 revenue", folder: "Sales", mode: "semantic")
VERIFY:
- MCP Server forwards to Daemon search service
- Daemon loads correct model for Sales folder
- Daemon performs semantic search in Sales folder only
- Returns Sales_Pipeline.xlsx as top result from daemon
- Returns Q4_Board_Deck.pptx as second result from daemon
- Similarity scores > 0.8 from daemon's model
```
**Test Scenario 2: Cross-Folder Awareness (Negative Test) via Daemon**
```
USER: "Search for contracts in the Sales folder"
EXPECTED: Agent calls search(query: "contracts", folder: "Sales", mode: "semantic")
VERIFY:
- MCP Server → Daemon search service
- Daemon searches ONLY Sales folder as requested
- Returns no results from daemon (contracts are in Legal folder)
- Agent suggests searching Legal folder instead
```
**Test Scenario 3: Dynamic Configuration Test**
```
USER: "Add a new folder via TUI, then search it"
STEPS:
1. Use TUI to add new folder to daemon config
2. Agent calls get_folder_info() immediately after
VERIFY:
- New folder appears in MCP client immediately
- No need to restart MCP server or update Claude config
- Daemon configuration changes propagate to all MCP clients
```
### **Success Criteria for Each Phase**
**Phase 1 Success (Architecture Fix)**:
✅ MCP server starts WITHOUT folder path arguments
✅ WebSocket connection established between MCP server and daemon
✅ MCP calls successfully forwarded to daemon
✅ Daemon responses properly returned to MCP clients
✅ UI folder changes immediately visible to MCP clients
**Phase 2 Success (Status Endpoint)**:
✅ Subagent successfully connects to MCP server
✅ MCP server forwards get_status calls to daemon
✅ Daemon returns multi-folder information via MCP server
✅ No protocol errors or crashes
✅ Response times < 500ms for MCP→Daemon→Response
**Phase 3 Success (Folder Discovery)**:
✅ Agent discovers all folders configured in daemon
✅ Document counts are accurate from daemon
✅ Topics reflect actual folder content from daemon's detection
✅ Can list documents in specific folders via daemon
**Phase 4 Success (Document Retrieval)**:
✅ Document outlines return correct structure from daemon
✅ Spreadsheet data is accessible via daemon processing
✅ PDF/PPTX metadata is accurate from daemon parsing
✅ Folder context included in responses from daemon
**Phase 5 Success (Search Operations)**:
✅ Semantic search returns relevant results from daemon
✅ Folder filtering works correctly in daemon
✅ Model switching happens transparently in daemon
✅ Performance < 2s for MCP→Daemon→Search→Response
✅ Dynamic folder configuration works without MCP server restart
### **TMOAT Execution Commands**
```bash
# 1. Build and start daemon + MCP server architecture
npm run build
# 2. Start daemon first (manages all folders)
npm run daemon
# Daemon loads config from ~/.folder-mcp/config.yaml
# 3. Configure test folders in daemon
folder-mcp config set folders.list '[
{
"path": "tests/fixtures/test-knowledge-base/Sales",
"model": "all-MiniLM-L6-v2"
}
]'
# 4. Start MCP server (connects to daemon, NO folder arguments)
node dist/mcp-server.js
# MCP server connects to daemon via WebSocket
# 5. Launch subagent tester
npm run test:mcp-agent
# 6. Monitor daemon and MCP communication
tail -f logs/daemon.log logs/mcp-server.log
# 7. Verify each phase before proceeding
npm run verify:phase1 # Architecture connection works
npm run verify:phase2 # Status endpoint via daemon works
npm run verify:phase3 # Folder discovery via daemon works
# etc.
```
### **Red Flags to Watch For**
**Phase 1 Architecture Issues**:
⚠️ **Connection Failures**: MCP server can't connect to daemon WebSocket
⚠️ **Direct File Access**: MCP server bypassing daemon (old behavior)
⚠️ **Hardcoded Paths**: MCP server still expecting folder arguments
⚠️ **Port Conflicts**: Daemon and MCP server WebSocket port issues
**Phase 2+ Endpoint Issues**:
⚠️ **Protocol Violations**: Invalid JSON-RPC responses from MCP↔Daemon
⚠️ **Missing Folder Context**: Responses without folder attribution from daemon
⚠️ **Wrong Model Usage**: Daemon using single model for all folders
⚠️ **State Confusion**: Daemon allowing searches on inactive/error folders
⚠️ **Performance Issues**: MCP→Daemon→Response > 5 seconds
⚠️ **Memory Leaks**: Growing memory in daemon with repeated MCP calls
⚠️ **Config Sync Issues**: UI changes not propagating to MCP clients
⚠️ **Daemon Restart Required**: Configuration changes requiring daemon restarts
---
## 🧠 **Path-Aware Topic Detection Approach**
*This clustering enhancement is a prerequisite that must be implemented first to enable rich folder metadata for the endpoints.*
### Key Design Decisions
**Subfolder Intelligence**:
- Documents in same subfolder get 20% similarity boost during clustering
- Subfolder names weighted 5x in topic term extraction (e.g., "Algebra/" becomes primary cluster term)
- No database hierarchy storage - extract subfolder from document paths on-demand
**File Name Pattern Recognition**:
- Detect sequences (`HW1`, `HW2`, `HW3`), prefixes (`Algebra_`, `Statistics_`), categories (`exam`, `lab`, `notes`)
- Sequential files get 10% similarity boost, same prefix gets 20% boost
- Common file patterns become cluster terms with high weight
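
The path-based boosts above can be illustrated with a small scoring helper. The boost percentages come from the bullets; how they combine is not stated, so this sketch assumes they multiply and that the result is capped at 1. The function names and filename patterns are illustrative only.

```typescript
interface DocPath {
  subfolder: string;
  fileName: string;
}

function parsePath(path: string): DocPath {
  const parts = path.split("/");
  const fileName = parts[parts.length - 1] ?? "";
  return { subfolder: parts.slice(0, -1).join("/"), fileName };
}

// "Algebra_intro.pdf" -> "Algebra"; undefined when there is no underscore prefix.
function prefixOf(fileName: string): string | undefined {
  const m = /^([A-Za-z]+)_/.exec(fileName);
  return m ? m[1] : undefined;
}

// "HW2.pdf" -> "HW"; undefined when the name is not letters followed by a number.
function sequenceStem(fileName: string): string | undefined {
  const m = /^([A-Za-z]+)\d+\./.exec(fileName);
  return m ? m[1] : undefined;
}

function boostedSimilarity(base: number, pathA: string, pathB: string): number {
  const a = parsePath(pathA);
  const b = parsePath(pathB);
  let factor = 1;
  if (a.subfolder !== "" && a.subfolder === b.subfolder) factor *= 1.2; // same subfolder
  const prefixA = prefixOf(a.fileName);
  if (prefixA !== undefined && prefixA === prefixOf(b.fileName)) factor *= 1.2; // same prefix
  const stemA = sequenceStem(a.fileName);
  if (stemA !== undefined && stemA === sequenceStem(b.fileName)) factor *= 1.1; // same sequence
  // Assumption: boosts multiply and the final score is capped at 1.
  return Math.min(1, base * factor);
}
```

For example, `Algebra/HW1.pdf` and `Algebra/HW2.pdf` share a subfolder and a sequence stem, so a base similarity of 0.5 is scaled by 1.2 × 1.1.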
**Database-Powered Evolution**:
- Cache cluster centroids and term frequencies for fast incremental updates (~50ms classification)
- New topic detection: documents with <40% similarity to every existing cluster are queued for new cluster creation
- Quality monitoring: detect cluster drift, trigger recomputation when coherence drops below 0.7
- Evolution tracking: log when clusters split, merge, or adapt to content changes
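
Incremental classification against cached centroids might be sketched as follows. The 0.4 and 0.7 thresholds come from the bullets above; the use of cosine similarity and the `classify` / `needsRecompute` names are assumptions, since the similarity measure is not named here.

```typescript
// Cosine similarity; assumed as the similarity measure for this sketch.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

interface Cluster {
  id: string;
  centroid: number[]; // cached centroid
}

type Classification =
  | { kind: "assigned"; clusterId: string; similarity: number }
  | { kind: "queued"; bestSimilarity: number };

const NEW_TOPIC_THRESHOLD = 0.4; // below this, queue for a new cluster
const COHERENCE_THRESHOLD = 0.7; // below this, trigger recomputation

function classify(embedding: number[], clusters: Cluster[]): Classification {
  let best: Cluster | undefined;
  let bestSim = -1;
  for (const c of clusters) {
    const sim = cosineSimilarity(embedding, c.centroid);
    if (sim > bestSim) {
      best = c;
      bestSim = sim;
    }
  }
  if (best === undefined || bestSim < NEW_TOPIC_THRESHOLD) {
    return { kind: "queued", bestSimilarity: best === undefined ? 0 : bestSim };
  }
  return { kind: "assigned", clusterId: best.id, similarity: bestSim };
}

function needsRecompute(coherence: number): boolean {
  return coherence < COHERENCE_THRESHOLD;
}
```

Because only the centroids are compared, classifying a new document costs one similarity computation per cluster, which is what makes the fast incremental path plausible.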
**Performance Characteristics**:
- Document classification: <100ms using cached centroids
- New cluster creation: <5s from 5+ queued documents
- Topic retrieval: <50ms from database cache
- Full reclustering only when >20% of documents change or quality degrades
### Why This Matters for Endpoints
This approach generates **intelligent folder topics** that the `get_folder_info` endpoint will return to help LLMs make smart folder selection decisions. Topics like `["algebra", "quadratic", "homework"]` are much more useful than generic content clusters.
---
## ⚠️ **Prerequisites**
1. **Task 11.5 Complete**: Multi-folder indexing system working
2. **Path-Aware Topic Detection**: Implement clustering approach above
3. **Database Schema**: Cluster centroids, topic storage, and evolution tracking available
4. **Model Registry**: System tracks which model each folder uses
---
## 🎯 **Definition of Done**
- [ ] All 10 MCP endpoints support multi-folder operations
- [ ] Folder parameter added to search and document endpoints
- [ ] State validation prevents operations on unavailable folders
- [ ] Model loading works automatically per folder
- [ ] Responses include comprehensive folder context
- [ ] Error messages are folder-aware and actionable
- [ ] Performance meets targets with model switching
- [ ] LLMs can successfully navigate multi-folder scenarios
- [ ] All existing single-folder workflows still work
---
**Focus**: Transform existing endpoints to be folder-aware
**Deliverable**: Updated MCP endpoints that work seamlessly with multi-folder system
**Impact**: Enable LLMs to intelligently work with multiple knowledge bases