Kiwi MCP

kiwi-mcp
docs

RAG_INTEGRATION_COMPLETE.md•8.96 KiB

# RAG Integration Complete **Date:** 2026-01-23 **Status:** ✅ Complete **Implementation:** Mandatory RAG Configuration + Validation-to-Embedding Pipeline --- ## Overview The RAG (Retrieval-Augmented Generation) system is now fully integrated with mandatory environment variable configuration and automatic embedding on all create/update operations. --- ## What Was Implemented ### Phase 1: Mandatory RAG Configuration #### 1.1 Server Startup Validation (`kiwi_mcp/server.py`) - Added `validate_rag_config()` function that runs before MCP server initialization - Validates `EMBEDDING_URL` and `VECTOR_DB_URL` environment variables - Fails fast with helpful error message showing how to configure - Returns config dict with all RAG settings **Configuration Required:** ```bash # Via MCP client config (recommended) { "kiwi": { "command": "python", "args": ["-m", "kiwi_mcp"], "env": { "EMBEDDING_URL": "https://api.openai.com/v1/embeddings", "EMBEDDING_API_KEY": "sk-...", "EMBEDDING_MODEL": "text-embedding-3-small", "VECTOR_DB_URL": "postgresql://..." } } } # Or via environment variables export EMBEDDING_URL="https://api.openai.com/v1/embeddings" export EMBEDDING_API_KEY="sk-..." export EMBEDDING_MODEL="text-embedding-3-small" export VECTOR_DB_URL="postgresql://..." ``` #### 1.2 Simplified Config Loading (`kiwi_mcp/storage/vector/embedding_registry.py`) - `load_vector_config()` now prioritizes environment variables - Config file (`~/.ai/config/vector.yaml`) is optional fallback - Follows standard MCP pattern: config in MCP client, not files #### 1.3 Setup Script Updated (`.ai/scripts/setup_vector_config.py`) - Changed from creating config file to outputting MCP client configuration - Interactive wizard generates JSON config for user to paste - Shows exact configuration needed for MCP client ### Phase 2: Database Schema Updates #### 2.1 Migration SQL (`docs/migrations/supabase_vector_embeddings.sql`) - Updated `item_type` constraint to include `'tool'`: ```sql item_type TEXT NOT NULL CHECK (item_type IN ('directive', 'tool', 'knowledge')) ``` ### Phase 3: Validation-to-Embedding Integration #### 3.1 ValidationManager Enhancement (`kiwi_mcp/utils/validators.py`) **New methods:** - `validate_and_embed()` - Validates content and embeds if valid (validation gate) - `_extract_searchable()` - Extracts searchable content by item type **Content extraction:** - **Directives**: Name, description, step descriptions - **Tools**: Name, description, docstrings, function names - **Knowledge**: Title and full content #### 3.2 Handler Integration All three handlers updated with automatic embedding: **DirectiveHandler** (`kiwi_mcp/handlers/directive/handler.py`): - Added `_init_vector_store()` method - Replaced 2 `ValidationManager.validate()` calls with `validate_and_embed()` - Creates embeddings on directive create/update **ToolHandler** (`kiwi_mcp/handlers/tool/handler.py`): - Added `_init_vector_store()` method - Replaced `ValidationManager.validate()` call with `validate_and_embed()` - Creates embeddings on tool validation **KnowledgeHandler** (`kiwi_mcp/handlers/knowledge/handler.py`): - Added `_init_vector_store()` method - Replaced 2 `ValidationManager.validate()` calls with `validate_and_embed()` - Creates embeddings on knowledge entry create/update ### Phase 4: SearchTool Integration #### 4.1 SearchTool Enhancement (`kiwi_mcp/tools/search.py`) - Updated `_setup_vector_search()` to use `EmbeddingService` from RAG config - Loads config via `load_vector_config()` (validated at startup) - Uses API-based embeddings instead of local model ### Additional Fixes #### Vector Store Async Support Updated both `LocalVectorStore` and `RegistryVectorStore` to handle both: - Sync embedders (`EmbeddingModel` - local sentence-transformers) - Async embedders (`EmbeddingService` - API-based) **LocalVectorStore** (`kiwi_mcp/storage/vector/local.py`): - Added `embedding_service` parameter - Auto-detects sync vs async embedders using `inspect.iscoroutinefunction()` - Handles both in `embed_and_store()` and `search()` methods **RegistryVectorStore** (`kiwi_mcp/storage/vector/registry.py`): - Added `embedding_service` parameter - Auto-detects sync vs async embedders - Handles both in `embed_and_store()` and `search()` methods **EmbeddingService** (`kiwi_mcp/storage/vector/api_embeddings.py`): - Added `embed()` alias method for compatibility with vector stores --- ## Architecture Flow ``` MCP Server Start ↓ validate_rag_config() [MANDATORY - fails if missing] ↓ Environment variables checked: - EMBEDDING_URL (required) - VECTOR_DB_URL (required) - EMBEDDING_API_KEY (optional) - EMBEDDING_MODEL (optional, default: text-embedding-3-small) ↓ Server initializes with validated config ↓ User Operation (create/update directive/tool/knowledge) ↓ Handler validates content ↓ ValidationManager.validate_and_embed() ↓ If valid → Extract searchable content → Embed to vector store ↓ Vector stores: - Project: .ai/vector/project/ - User: ~/.ai/vector/user/ (on sync) - Registry: Supabase pgvector (on publish) ↓ Search operations use hybrid search (semantic + keyword) ``` --- ## Configuration Hierarchy 1. **MCP Client Config** (highest priority) - Set in Cursor/Claude MCP settings - Standard MCP pattern 2. **Environment Variables** (high priority) - Direct environment variables - Works for standalone/testing 3. **Config File** (optional fallback) - `~/.ai/config/vector.yaml` - Only used if env vars not set --- ## Testing the Implementation ### 1. Test MCP Startup Validation ```bash # Should fail without config unset EMBEDDING_URL VECTOR_DB_URL python -m kiwi_mcp # Should succeed with config export EMBEDDING_URL="https://api.openai.com/v1/embeddings" export EMBEDDING_API_KEY="sk-..." export VECTOR_DB_URL="postgresql://..." python -m kiwi_mcp ``` ### 2. Test Automatic Embedding Create a directive and verify it gets embedded: ```python # Execute create action result = await execute( item_type="directive", action="create", item_id="test_directive", parameters={"location": "project"}, project_path="/path/to/project" ) # Check embedding was created # Should see: result["embedded"] = True in validation result ``` ### 3. Test Vector Search ```python # Search should use vector search if configured result = await search( item_type="directive", query="deploy application", project_path="/path/to/project" ) # Should return: search_type="vector_hybrid" ``` --- ## Files Modified | File | Changes | |------|---------| | `kiwi_mcp/server.py` | Added `validate_rag_config()`, mandatory validation in `__init__` | | `kiwi_mcp/storage/vector/embedding_registry.py` | Prioritize env vars over config file | | `.ai/scripts/setup_vector_config.py` | Output MCP client config instead of creating file | | `docs/migrations/supabase_vector_embeddings.sql` | Added 'tool' to item_type enum | | `kiwi_mcp/utils/validators.py` | Added `validate_and_embed()` and `_extract_searchable()` | | `kiwi_mcp/handlers/directive/handler.py` | Added vector store init, 2 validate_and_embed calls | | `kiwi_mcp/handlers/tool/handler.py` | Added vector store init, 1 validate_and_embed call | | `kiwi_mcp/handlers/knowledge/handler.py` | Added vector store init, 2 validate_and_embed calls | | `kiwi_mcp/tools/search.py` | Use EmbeddingService from RAG config | | `kiwi_mcp/storage/vector/local.py` | Support both sync/async embedders | | `kiwi_mcp/storage/vector/registry.py` | Support both sync/async embedders | | `kiwi_mcp/storage/vector/api_embeddings.py` | Added `embed()` alias method | --- ## Success Criteria - ✅ MCP server validates RAG config at startup - ✅ Server fails fast with helpful error if config missing - ✅ Config prioritizes environment variables (standard MCP pattern) - ✅ Automatic embedding on all create/update operations - ✅ Validation gate: only valid content gets embedded - ✅ Database schema supports all item types (directive, tool, knowledge) - ✅ Search uses configured embedding service - ✅ Three-tier storage (project/user/registry) works --- ## Next Steps 1. **Run the setup script** to generate MCP client config: ```bash python .ai/scripts/setup_vector_config.py ``` 2. **Add config to MCP client** (Cursor/Claude settings) 3. **Apply database migration** (if not already done): ```sql -- Run: docs/migrations/supabase_vector_embeddings.sql ``` 4. **Test the integration**: - Create a directive → verify embedding created - Search for directives → verify vector search works - Check `.ai/vector/project/` directory created --- ## Related Documents - [RAG_VECTOR_SEARCH_DESIGN.md](./RAG_VECTOR_SEARCH_DESIGN.md) - Original RAG design - [PHASE_6_STATUS_REPORT.md](./PHASE_6_STATUS_REPORT.md) - Implementation status before this - [FLEXIBLE_VECTOR_CONFIG_DESIGN.md](../FLEXIBLE_VECTOR_CONFIG_DESIGN.md) - Config hierarchy design - [UNIFIED_TOOLS_IMPLEMENTATION_PLAN.md](./UNIFIED_TOOLS_IMPLEMENTATION_PLAN.md) - Tools migration plan

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/leolilley/kiwi-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

RAG_INTEGRATION_COMPLETE.md•8.96 KiB