Kiwi MCP

kiwi-mcp
docs

DATABASE_EVOLUTION_DESIGN.md•35.6 KiB

# Database Evolution: Scripts → Tools **Date:** 2026-01-23 **Status:** Draft **Related:** [KIWI_HARNESS_ROADMAP.md](./KIWI_HARNESS_ROADMAP.md), [TOOLS_EVOLUTION_PROPOSAL.md](./TOOLS_EVOLUTION_PROPOSAL.md), [DYNAMIC_TOOLS_ARCHITECTURE.md](./DYNAMIC_TOOLS_ARCHITECTURE.md) --- ## Executive Summary The move from "scripts" to "tools" requires database evolution. Tools are a superset: Python (legacy scripts), Bash, API, Docker, and MCP proxies. This document outlines the schema changes needed for Supabase. **Key Decision:** Create new `tools` tables, migrate existing data, and keep `scripts` as views for backward compatibility. --- ## Core Architectural Principle: Tools Are Data, Not Code ### The Key Insight **Executors are primitives (code). Tools are definitions (data).** | Layer | What It Is | Where It Lives | Who Creates It | | ------------- | ---------------------------------------- | ------------------------------------------------- | ------------------ | | **Executors** | Runtime engines (Python, Bash, API, MCP) | Hard-coded in `kiwi_mcp/handlers/tool/executors/` | Developers | | **Tools** | Declarative definitions + implementation | Registry (`tools` table) + files | LLMs or Developers | This means: - We have ~5 executor types (the primitives) - We can have unlimited tools (data in registry) - **LLMs can create new tools dynamically** by providing manifest + implementation - No code changes needed to add a new tool ### How MCP Fits In An MCP tool is just a **routing definition** stored in the registry: ```yaml # Example: MCP tool stored in tools table tool_id: supabase_execute_sql tool_type: mcp # Routes to MCPExecutor executor_config: mcp_server: supabase # Which MCP server to connect to mcp_tool: execute_sql # Which tool on that server ``` When called: 1. Look up tool in registry → `tool_type: mcp` 2. Route to `MCPExecutor` 3. MCPExecutor connects to Supabase MCP server 4. Proxies the call, returns result **MCP server configs also become data** in an `mcp_servers` table, not hard-coded in Python. ### Dynamic Tool Creation Flow ``` LLM needs a new capability │ ▼ ┌─────────────────────────────┐ │ Search registry for tool │──── Found? ──► Use it └─────────────────────────────┘ │ Not found ▼ ┌─────────────────────────────┐ │ LLM creates tool: │ │ - Writes manifest (YAML) │ │ - Writes implementation │ │ (Python/Bash/API config) │ │ - Calls execute(create) │ └─────────────────────────────┘ │ ▼ ┌─────────────────────────────┐ │ Tool saved to registry │ │ - tools table (metadata) │ │ - tool_versions (manifest) │ │ - tool_version_files (code) │ └─────────────────────────────┘ │ ▼ Any agent can now use it ``` ### What's Static vs Dynamic | Component | Static (Code) | Dynamic (Registry) | | -------------------- | :-----------: | :---------------------: | | PythonExecutor | ✅ | | | BashExecutor | ✅ | | | APIExecutor | ✅ | | | MCPExecutor | ✅ | | | DockerExecutor | ✅ | | | MCP server configs | | ✅ `mcp_servers` table | | Individual tools | | ✅ `tools` table | | Tool implementations | | ✅ `tool_version_files` | ### Example: Tool Type Matrix | tool_type | Executor | What's in Registry | LLM Can Create? | | --------- | -------------- | ---------------------------- | :-------------: | | `python` | PythonExecutor | Manifest + `.py` files | ✅ | | `bash` | BashExecutor | Manifest + `.sh` files | ✅ | | `api` | APIExecutor | Manifest only (API config) | ✅ | | `docker` | DockerExecutor | Manifest (image + config) | ✅ | | `mcp` | MCPExecutor | Manifest (server + tool ref) | ✅ | --- ## New Tables Summary | Table | Purpose | | ---------------------- | -------------------------------------- | | `tools` | Tool metadata (replaces `scripts`) | | `tool_versions` | Versioned manifests (JSONB) | | `tool_version_files` | Multi-file support (code, configs) | | `mcp_servers` | Reusable MCP server connection configs | | `directive_embeddings` | Vector + FTS for directives | | `tool_embeddings` | Vector + FTS for tools | | `knowledge_embeddings` | Vector + FTS for knowledge | --- ## Current State ### Existing Tables | Table | Purpose | | ------------------------- | --------------------------------------------------------------- | | `scripts` | Script metadata (name, category, description, tech_stack, tags) | | `script_versions` | Version content (version, content, content_hash) | | `directives` | Directive metadata | | `directive_versions` | Directive version content | | `knowledge_entries` | Knowledge zettel entries | | `knowledge_relationships` | Graph relationships | | `item_embeddings` | Vector embeddings (item_type: directive, script, knowledge) | ### Current Limitations 1. **Single file per script** - No support for multi-file tools (Python + requirements.txt + tool.yaml) 2. **Python-only** - No `tool_type` to distinguish bash/api/docker/mcp 3. **No manifest storage** - tool.yaml manifests not versioned 4. **Blob in DB** - Large scripts stored as text in Postgres --- ## Proposed Schema ### New Tables #### `tools` (replaces `scripts` as canonical entity) ```sql CREATE TABLE tools ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), tool_id TEXT NOT NULL, -- stable identifier (e.g., "lint_python") namespace TEXT DEFAULT 'public', -- for future multi-tenancy name TEXT, -- display name tool_type TEXT NOT NULL, -- python | bash | api | docker | mcp description TEXT, category TEXT, subcategory TEXT, tags TEXT[], tech_stack TEXT[], is_official BOOLEAN DEFAULT false, visibility TEXT DEFAULT 'public', -- public | unlisted | private download_count INTEGER DEFAULT 0, quality_score NUMERIC(3,2), success_rate NUMERIC(3,2), estimated_cost_usd NUMERIC(10,2), latest_version TEXT, created_at TIMESTAMPTZ DEFAULT NOW(), updated_at TIMESTAMPTZ DEFAULT NOW(), CONSTRAINT tools_tool_type_check CHECK (tool_type IN ('python', 'bash', 'api', 'docker', 'mcp')), CONSTRAINT tools_visibility_check CHECK (visibility IN ('public', 'unlisted', 'private')), UNIQUE (namespace, tool_id) ); CREATE INDEX idx_tools_tool_id ON tools (tool_id); CREATE INDEX idx_tools_category ON tools (category); CREATE INDEX idx_tools_tool_type ON tools (tool_type); ``` #### `tool_versions` (replaces `script_versions`) ```sql CREATE TABLE tool_versions ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), tool_id UUID NOT NULL REFERENCES tools(id) ON DELETE CASCADE, version TEXT NOT NULL, -- semver string manifest JSONB NOT NULL, -- parsed tool.yaml manifest_yaml TEXT, -- original YAML for round-trip fidelity content_hash TEXT NOT NULL, -- hash of canonical artifact changelog TEXT, is_latest BOOLEAN DEFAULT false, published_at TIMESTAMPTZ DEFAULT NOW(), created_at TIMESTAMPTZ DEFAULT NOW(), UNIQUE (tool_id, version) ); CREATE INDEX idx_tool_versions_tool_id ON tool_versions (tool_id); CREATE INDEX idx_tool_versions_is_latest ON tool_versions (is_latest) WHERE is_latest = true; ``` #### `tool_version_files` (supports multi-file tools) ```sql CREATE TABLE tool_version_files ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), tool_version_id UUID NOT NULL REFERENCES tool_versions(id) ON DELETE CASCADE, path TEXT NOT NULL, -- e.g., 'tool.yaml', 'main.py', 'requirements.txt' media_type TEXT, -- e.g., 'text/x-python', 'text/plain' sha256 TEXT NOT NULL, size_bytes INTEGER, is_executable BOOLEAN DEFAULT false, content_text TEXT, -- for small text files (< 64KB) storage_key TEXT, -- Supabase Storage key for larger files created_at TIMESTAMPTZ DEFAULT NOW(), UNIQUE (tool_version_id, path) ); CREATE INDEX idx_tool_version_files_version ON tool_version_files (tool_version_id); ``` ### Updated `item_embeddings` ```sql -- Add 'tool' to allowed item types ALTER TABLE item_embeddings DROP CONSTRAINT item_embeddings_item_type_check; ALTER TABLE item_embeddings ADD CONSTRAINT item_embeddings_item_type_check CHECK (item_type IN ('directive', 'script', 'tool', 'knowledge')); ``` --- ## Migration Strategy ### Phase 1: Create New Tables (Non-Breaking) 1. Create `tools`, `tool_versions`, `tool_version_files` tables 2. Create Supabase Storage bucket `tool-artifacts` 3. Update `item_embeddings` constraint to include `'tool'` ### Phase 2: Migrate Existing Data ```sql -- Migrate scripts → tools INSERT INTO tools ( id, tool_id, namespace, name, tool_type, description, category, subcategory, tags, tech_stack, is_official, download_count, quality_score, success_rate, estimated_cost_usd, latest_version, created_at, updated_at ) SELECT id, name AS tool_id, -- use script name as tool_id 'public' AS namespace, name, 'python' AS tool_type, -- all existing scripts are Python description, category, subcategory, tags, tech_stack, is_official, download_count, quality_score, success_rate, estimated_cost_usd, latest_version, created_at, updated_at FROM scripts; -- Migrate script_versions → tool_versions + tool_version_files INSERT INTO tool_versions ( id, tool_id, version, manifest, manifest_yaml, content_hash, changelog, is_latest, created_at ) SELECT sv.id, sv.script_id AS tool_id, sv.version, jsonb_build_object( 'tool_id', s.name, 'tool_type', 'python', 'version', sv.version, 'description', s.description, 'executor_config', jsonb_build_object('entrypoint', 'main.py') ) AS manifest, NULL AS manifest_yaml, -- no original YAML for legacy scripts sv.content_hash, sv.changelog, sv.version = s.latest_version AS is_latest, sv.created_at FROM script_versions sv JOIN scripts s ON sv.script_id = s.id; -- Create file entries for script content INSERT INTO tool_version_files ( tool_version_id, path, media_type, sha256, size_bytes, is_executable, content_text ) SELECT tv.id AS tool_version_id, 'main.py' AS path, 'text/x-python' AS media_type, sv.content_hash AS sha256, LENGTH(sv.content) AS size_bytes, true AS is_executable, sv.content AS content_text FROM script_versions sv JOIN tool_versions tv ON sv.id = tv.id; ``` ### Phase 3: Create Compatibility Views ```sql -- Legacy scripts view (for backward compatibility) CREATE OR REPLACE VIEW scripts_compat AS SELECT id, tool_id AS name, category, subcategory, description, is_official, download_count, quality_score, tech_stack, tags, success_rate, estimated_cost_usd, latest_version, created_at, updated_at FROM tools WHERE tool_type = 'python'; -- Legacy script_versions view CREATE OR REPLACE VIEW script_versions_compat AS SELECT tv.id, tv.tool_id AS script_id, tv.version, tvf.content_text AS content, tv.content_hash, tv.changelog, tv.created_at FROM tool_versions tv LEFT JOIN tool_version_files tvf ON tv.id = tvf.tool_version_id AND tvf.path = 'main.py' JOIN tools t ON tv.tool_id = t.id WHERE t.tool_type = 'python'; ``` ### Phase 4: Update Registry Classes 1. Create `ToolRegistry` class (new) 2. Update `ScriptRegistry` to use views or proxy to `ToolRegistry` 3. Add `ToolResolver` for local `.ai/scripts/` (or `.ai/tools/`) resolution --- ## Local Storage Structure ### Current: `.ai/scripts/` ``` .ai/scripts/ ├── category/ │ └── script_name.py # Single Python file ``` ### Proposed: Keep `.ai/scripts/` but support multi-file ``` .ai/scripts/ ├── category/ │ ├── python_script.py # Legacy single-file (virtual manifest) │ ├── bash_tool/ # Directory for multi-file tools │ │ ├── tool.yaml # Tool manifest │ │ └── script.sh # Entrypoint │ └── api_tool/ │ └── tool.yaml # API tools are manifest-only ``` **Resolution Logic:** 1. If `name.py` exists → Python tool (virtual manifest) 2. If `name/tool.yaml` exists → Load manifest, use specified executor 3. If `name/` directory without manifest → Error **Future Option:** Rename to `.ai/tools/` (Phase 2, breaking change with migration) --- ## Multi-File Tool Publishing When publishing a multi-file tool to registry: ```python async def publish_tool(tool_path: Path, version: str) -> Dict[str, Any]: # 1. Load and validate manifest manifest = ToolManifest.from_yaml(tool_path / "tool.yaml") # 2. Collect all files files = [] for file in tool_path.rglob("*"): if file.is_file(): files.append({ "path": str(file.relative_to(tool_path)), "content": file.read_bytes(), "sha256": hash_file(file) }) # 3. Create bundle (optional but recommended) bundle = create_zip_bundle(tool_path) bundle_key = f"tools/{manifest.tool_id}/{version}/bundle.zip" await storage.upload(bundle_key, bundle) # 4. Insert tool + version + files tool_id = await upsert_tool(manifest) version_id = await insert_version(tool_id, version, manifest) for file in files: if len(file["content"]) < 65536: # < 64KB await insert_file(version_id, file["path"], content_text=file["content"]) else: storage_key = f"tools/{manifest.tool_id}/{version}/{file['path']}" await storage.upload(storage_key, file["content"]) await insert_file(version_id, file["path"], storage_key=storage_key) return {"tool_id": manifest.tool_id, "version": version} ``` --- ## RAG/Embeddings Considerations ### What to Embed For tools, embed a composite of: - Description - Parameter names and descriptions - Tags and category - Example usage (from manifest) ```python def generate_tool_embedding_content(tool: Tool, manifest: ToolManifest) -> str: parts = [ f"Tool: {tool.name}", f"Type: {manifest.tool_type}", f"Description: {tool.description or ''}", f"Category: {tool.category or ''}", f"Tags: {', '.join(tool.tags or [])}", ] # Add parameter info for param in manifest.parameters or []: parts.append(f"Parameter {param['name']}: {param.get('description', '')}") return "\n".join(parts) ``` ### Embedding Strategy **Embed latest version only** (initially): - Avoids uniqueness constraint issues in `item_embeddings` - `item_type = 'tool'`, `item_id = tool_id` or `tool:<namespace>/<tool_id>` - Re-embed when new version published **Future: Version-level embeddings** (if needed): - Change `item_embeddings.item_id` to non-unique or composite - `item_id = tool:<namespace>/<tool_id>@<version>` --- ## Migration Checklist ### Database Changes - [ ] Create `tools` table with `tool_type` column - [ ] Create `tool_versions` table with manifest JSONB - [ ] Create `tool_version_files` table for multi-file support - [ ] Create Supabase Storage bucket `tool-artifacts` - [ ] Migrate existing scripts → tools (tool_type='python') - [ ] Create compatibility views for `scripts` / `script_versions` - [ ] Update `item_embeddings` check constraint ### Code Changes - [ ] Create `ToolRegistry` class - [ ] Update `ScriptRegistry` to use compat views or delegate - [ ] Update `ToolHandler._publish_tool` to use new schema - [ ] Update `RegistryVectorStore` to handle `item_type='tool'` - [ ] Add multi-file bundle support to publish flow ### Local Storage - [ ] Update `ToolResolver` to support directory-based tools - [ ] Support `tool.yaml` manifest loading - [ ] Virtual manifest generation for legacy `.py` files --- ## Open Questions 1. **Namespace Strategy**: Should we implement namespaces now (e.g., `user/tool_id`) or defer? - Recommendation: Add column now, default to `'public'`, implement later 2. **Directory Rename**: Should `.ai/scripts/` become `.ai/tools/`? - Recommendation: Keep `scripts` for now, support both in resolver 3. **Bundle Format**: ZIP vs tarball for multi-file tools? - Recommendation: ZIP (universal support, Supabase Storage friendly) 4. **Content Storage Threshold**: At what size should files go to Storage vs DB? - Recommendation: 64KB (matches typical script sizes, keeps small tools fast) --- ## RAG & Vector Search Schema (Supabase Best Practices) This section extends the schema to support semantic search following Supabase's recommended patterns for pgvector. ### Key Design Decisions 1. **HNSW over IVFFlat** - Better performance for our scale, no training required 2. **Hybrid Search (RRF)** - Combine semantic + full-text for best results 3. **Generated FTS columns** - Auto-maintained tsvector for keyword search 4. **Per-type tables** - Separate embedding tables for directives, tools, knowledge (better indexing) ### Current vs Proposed | Aspect | Current (`item_embeddings`) | Proposed | | ------------ | --------------------------- | ------------------------------- | | Single table | All item types mixed | Per-type tables (better perf) | | Index type | IVFFlat (lists=100) | HNSW (m=16, ef=64) | | Full-text | None | Generated tsvector + GIN | | Search | Vector-only | Hybrid RRF (semantic + keyword) | | Dimension | 384 (MiniLM) | 384 or 512 (configurable) | ### New Embedding Tables #### `directive_embeddings` ```sql -- Drop old generic table (after migration) -- DROP TABLE item_embeddings; CREATE TABLE directive_embeddings ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), directive_id UUID NOT NULL REFERENCES directives(id) ON DELETE CASCADE, version TEXT, -- NULL = latest, or specific version -- Vector embedding embedding vector(384) NOT NULL, -- Full-text search (generated column) content TEXT NOT NULL, -- Combined searchable text fts tsvector GENERATED ALWAYS AS ( setweight(to_tsvector('english', coalesce(name, '')), 'A') || setweight(to_tsvector('english', coalesce(description, '')), 'B') || setweight(to_tsvector('english', coalesce(content, '')), 'C') ) STORED, -- Metadata for filtering name TEXT, description TEXT, category TEXT, tags TEXT[], -- Audit fields signature TEXT, validated_at TIMESTAMPTZ DEFAULT NOW(), created_at TIMESTAMPTZ DEFAULT NOW(), updated_at TIMESTAMPTZ DEFAULT NOW(), UNIQUE (directive_id, version) ); -- HNSW index for semantic search (inner product for normalized embeddings) CREATE INDEX directive_embeddings_hnsw ON directive_embeddings USING hnsw (embedding vector_ip_ops) WITH (m = 16, ef_construction = 64); -- GIN index for full-text search CREATE INDEX directive_embeddings_fts ON directive_embeddings USING gin(fts); -- Category filter index CREATE INDEX directive_embeddings_category ON directive_embeddings (category); ``` #### `tool_embeddings` ```sql CREATE TABLE tool_embeddings ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), tool_id UUID NOT NULL REFERENCES tools(id) ON DELETE CASCADE, version TEXT, -- NULL = latest -- Vector embedding embedding vector(384) NOT NULL, -- Full-text search content TEXT NOT NULL, fts tsvector GENERATED ALWAYS AS ( setweight(to_tsvector('english', coalesce(name, '')), 'A') || setweight(to_tsvector('english', coalesce(description, '')), 'B') || setweight(to_tsvector('english', coalesce(content, '')), 'C') ) STORED, -- Metadata name TEXT, description TEXT, tool_type TEXT, -- python, bash, api, docker, mcp category TEXT, tags TEXT[], parameters JSONB, -- For parameter-based search -- Audit signature TEXT, validated_at TIMESTAMPTZ DEFAULT NOW(), created_at TIMESTAMPTZ DEFAULT NOW(), updated_at TIMESTAMPTZ DEFAULT NOW(), UNIQUE (tool_id, version) ); -- Indexes CREATE INDEX tool_embeddings_hnsw ON tool_embeddings USING hnsw (embedding vector_ip_ops) WITH (m = 16, ef_construction = 64); CREATE INDEX tool_embeddings_fts ON tool_embeddings USING gin(fts); CREATE INDEX tool_embeddings_category ON tool_embeddings (category); CREATE INDEX tool_embeddings_tool_type ON tool_embeddings (tool_type); ``` #### `knowledge_embeddings` ```sql CREATE TABLE knowledge_embeddings ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), knowledge_id UUID NOT NULL REFERENCES knowledge_entries(id) ON DELETE CASCADE, -- Vector embedding embedding vector(384) NOT NULL, -- Full-text search content TEXT NOT NULL, fts tsvector GENERATED ALWAYS AS ( setweight(to_tsvector('english', coalesce(title, '')), 'A') || setweight(to_tsvector('english', coalesce(content, '')), 'B') ) STORED, -- Metadata title TEXT, entry_type TEXT, -- concept, pattern, learning, etc. tags TEXT[], -- Audit validated_at TIMESTAMPTZ DEFAULT NOW(), created_at TIMESTAMPTZ DEFAULT NOW(), updated_at TIMESTAMPTZ DEFAULT NOW(), UNIQUE (knowledge_id) ); -- Indexes CREATE INDEX knowledge_embeddings_hnsw ON knowledge_embeddings USING hnsw (embedding vector_ip_ops) WITH (m = 16, ef_construction = 64); CREATE INDEX knowledge_embeddings_fts ON knowledge_embeddings USING gin(fts); CREATE INDEX knowledge_embeddings_entry_type ON knowledge_embeddings (entry_type); ``` ### Hybrid Search Functions (RRF) #### Search Directives ```sql CREATE OR REPLACE FUNCTION search_directives_hybrid( query_text TEXT, query_embedding vector(384), match_count INT DEFAULT 20, full_text_weight FLOAT DEFAULT 1.0, semantic_weight FLOAT DEFAULT 1.0, rrf_k INT DEFAULT 60, filter_category TEXT DEFAULT NULL ) RETURNS TABLE ( directive_id UUID, name TEXT, description TEXT, category TEXT, score FLOAT, match_type TEXT -- 'semantic', 'keyword', or 'both' ) LANGUAGE sql AS $$ WITH semantic AS ( SELECT de.directive_id, de.name, de.description, de.category, ROW_NUMBER() OVER (ORDER BY de.embedding <#> query_embedding) AS rank_ix FROM directive_embeddings de WHERE (filter_category IS NULL OR de.category = filter_category) ORDER BY de.embedding <#> query_embedding LIMIT match_count * 2 ), full_text AS ( SELECT de.directive_id, de.name, de.description, de.category, ROW_NUMBER() OVER (ORDER BY ts_rank_cd(de.fts, websearch_to_tsquery('english', query_text)) DESC) AS rank_ix FROM directive_embeddings de WHERE de.fts @@ websearch_to_tsquery('english', query_text) AND (filter_category IS NULL OR de.category = filter_category) ORDER BY rank_ix LIMIT match_count * 2 ) SELECT COALESCE(s.directive_id, f.directive_id) AS directive_id, COALESCE(s.name, f.name) AS name, COALESCE(s.description, f.description) AS description, COALESCE(s.category, f.category) AS category, ( COALESCE(1.0 / (rrf_k + f.rank_ix), 0.0) * full_text_weight + COALESCE(1.0 / (rrf_k + s.rank_ix), 0.0) * semantic_weight ) AS score, CASE WHEN s.directive_id IS NOT NULL AND f.directive_id IS NOT NULL THEN 'both' WHEN f.directive_id IS NOT NULL THEN 'keyword' ELSE 'semantic' END AS match_type FROM semantic s FULL OUTER JOIN full_text f ON s.directive_id = f.directive_id ORDER BY score DESC LIMIT match_count; $$; ``` #### Search Tools ```sql CREATE OR REPLACE FUNCTION search_tools_hybrid( query_text TEXT, query_embedding vector(384), match_count INT DEFAULT 20, full_text_weight FLOAT DEFAULT 1.0, semantic_weight FLOAT DEFAULT 1.0, rrf_k INT DEFAULT 60, filter_category TEXT DEFAULT NULL, filter_tool_type TEXT DEFAULT NULL ) RETURNS TABLE ( tool_id UUID, name TEXT, description TEXT, tool_type TEXT, category TEXT, score FLOAT, match_type TEXT ) LANGUAGE sql AS $$ WITH semantic AS ( SELECT te.tool_id, te.name, te.description, te.tool_type, te.category, ROW_NUMBER() OVER (ORDER BY te.embedding <#> query_embedding) AS rank_ix FROM tool_embeddings te WHERE (filter_category IS NULL OR te.category = filter_category) AND (filter_tool_type IS NULL OR te.tool_type = filter_tool_type) ORDER BY te.embedding <#> query_embedding LIMIT match_count * 2 ), full_text AS ( SELECT te.tool_id, te.name, te.description, te.tool_type, te.category, ROW_NUMBER() OVER (ORDER BY ts_rank_cd(te.fts, websearch_to_tsquery('english', query_text)) DESC) AS rank_ix FROM tool_embeddings te WHERE te.fts @@ websearch_to_tsquery('english', query_text) AND (filter_category IS NULL OR te.category = filter_category) AND (filter_tool_type IS NULL OR te.tool_type = filter_tool_type) ORDER BY rank_ix LIMIT match_count * 2 ) SELECT COALESCE(s.tool_id, f.tool_id) AS tool_id, COALESCE(s.name, f.name) AS name, COALESCE(s.description, f.description) AS description, COALESCE(s.tool_type, f.tool_type) AS tool_type, COALESCE(s.category, f.category) AS category, ( COALESCE(1.0 / (rrf_k + f.rank_ix), 0.0) * full_text_weight + COALESCE(1.0 / (rrf_k + s.rank_ix), 0.0) * semantic_weight ) AS score, CASE WHEN s.tool_id IS NOT NULL AND f.tool_id IS NOT NULL THEN 'both' WHEN f.tool_id IS NOT NULL THEN 'keyword' ELSE 'semantic' END AS match_type FROM semantic s FULL OUTER JOIN full_text f ON s.tool_id = f.tool_id ORDER BY score DESC LIMIT match_count; $$; ``` #### Search Knowledge ```sql CREATE OR REPLACE FUNCTION search_knowledge_hybrid( query_text TEXT, query_embedding vector(384), match_count INT DEFAULT 20, full_text_weight FLOAT DEFAULT 1.0, semantic_weight FLOAT DEFAULT 1.0, rrf_k INT DEFAULT 60, filter_entry_type TEXT DEFAULT NULL ) RETURNS TABLE ( knowledge_id UUID, title TEXT, entry_type TEXT, score FLOAT, match_type TEXT ) LANGUAGE sql AS $$ WITH semantic AS ( SELECT ke.knowledge_id, ke.title, ke.entry_type, ROW_NUMBER() OVER (ORDER BY ke.embedding <#> query_embedding) AS rank_ix FROM knowledge_embeddings ke WHERE (filter_entry_type IS NULL OR ke.entry_type = filter_entry_type) ORDER BY ke.embedding <#> query_embedding LIMIT match_count * 2 ), full_text AS ( SELECT ke.knowledge_id, ke.title, ke.entry_type, ROW_NUMBER() OVER (ORDER BY ts_rank_cd(ke.fts, websearch_to_tsquery('english', query_text)) DESC) AS rank_ix FROM knowledge_embeddings ke WHERE ke.fts @@ websearch_to_tsquery('english', query_text) AND (filter_entry_type IS NULL OR ke.entry_type = filter_entry_type) ORDER BY rank_ix LIMIT match_count * 2 ) SELECT COALESCE(s.knowledge_id, f.knowledge_id) AS knowledge_id, COALESCE(s.title, f.title) AS title, COALESCE(s.entry_type, f.entry_type) AS entry_type, ( COALESCE(1.0 / (rrf_k + f.rank_ix), 0.0) * full_text_weight + COALESCE(1.0 / (rrf_k + s.rank_ix), 0.0) * semantic_weight ) AS score, CASE WHEN s.knowledge_id IS NOT NULL AND f.knowledge_id IS NOT NULL THEN 'both' WHEN f.knowledge_id IS NOT NULL THEN 'keyword' ELSE 'semantic' END AS match_type FROM semantic s FULL OUTER JOIN full_text f ON s.knowledge_id = f.knowledge_id ORDER BY score DESC LIMIT match_count; $$; ``` ### Unified Search Function For convenience, a single function that searches across all types: ```sql CREATE OR REPLACE FUNCTION search_all_hybrid( query_text TEXT, query_embedding vector(384), match_count INT DEFAULT 20, filter_item_type TEXT DEFAULT NULL -- 'directive', 'tool', 'knowledge', or NULL for all ) RETURNS TABLE ( item_id UUID, item_type TEXT, name TEXT, description TEXT, category TEXT, score FLOAT ) LANGUAGE plpgsql AS $$ BEGIN RETURN QUERY WITH directive_results AS ( SELECT directive_id AS item_id, 'directive'::TEXT AS item_type, name, description, category, score FROM search_directives_hybrid(query_text, query_embedding, match_count / 3) WHERE filter_item_type IS NULL OR filter_item_type = 'directive' ), tool_results AS ( SELECT tool_id AS item_id, 'tool'::TEXT AS item_type, name, description, category, score FROM search_tools_hybrid(query_text, query_embedding, match_count / 3) WHERE filter_item_type IS NULL OR filter_item_type = 'tool' ), knowledge_results AS ( SELECT knowledge_id AS item_id, 'knowledge'::TEXT AS item_type, title AS name, NULL::TEXT AS description, entry_type AS category, score FROM search_knowledge_hybrid(query_text, query_embedding, match_count / 3) WHERE filter_item_type IS NULL OR filter_item_type = 'knowledge' ) SELECT * FROM directive_results UNION ALL SELECT * FROM tool_results UNION ALL SELECT * FROM knowledge_results ORDER BY score DESC LIMIT match_count; END; $$; ``` ### Index Tuning Parameters | Parameter | Value | Rationale | | ----------------- | ----------- | ------------------------------------ | | `m` (HNSW) | 16 | Good balance for < 1M vectors | | `ef_construction` | 64 | Higher = better recall, slower build | | `rrf_k` | 60 | Standard RRF smoothing constant | | FTS weight A | Name | Exact name matches ranked highest | | FTS weight B | Description | Good description matches next | | FTS weight C | Content | Full content matches last | For larger scale (> 1M items), consider: - Increase `m` to 32 - Increase `ef_construction` to 128 - Add partitioning by category ### RLS Policies ```sql -- Enable RLS ALTER TABLE directive_embeddings ENABLE ROW LEVEL SECURITY; ALTER TABLE tool_embeddings ENABLE ROW LEVEL SECURITY; ALTER TABLE knowledge_embeddings ENABLE ROW LEVEL SECURITY; -- Public read access (adjust based on your auth model) CREATE POLICY "Public read access" ON directive_embeddings FOR SELECT USING (true); CREATE POLICY "Public read access" ON tool_embeddings FOR SELECT USING (true); CREATE POLICY "Public read access" ON knowledge_embeddings FOR SELECT USING (true); -- Authenticated write access CREATE POLICY "Authenticated users can insert" ON directive_embeddings FOR INSERT WITH CHECK (auth.role() = 'authenticated'); CREATE POLICY "Authenticated users can insert" ON tool_embeddings FOR INSERT WITH CHECK (auth.role() = 'authenticated'); CREATE POLICY "Authenticated users can insert" ON knowledge_embeddings FOR INSERT WITH CHECK (auth.role() = 'authenticated'); -- Grant permissions GRANT SELECT ON directive_embeddings TO anon, authenticated; GRANT SELECT ON tool_embeddings TO anon, authenticated; GRANT SELECT ON knowledge_embeddings TO anon, authenticated; GRANT INSERT, UPDATE, DELETE ON directive_embeddings TO authenticated; GRANT INSERT, UPDATE, DELETE ON tool_embeddings TO authenticated; GRANT INSERT, UPDATE, DELETE ON knowledge_embeddings TO authenticated; -- Grant execute on search functions GRANT EXECUTE ON FUNCTION search_directives_hybrid TO anon, authenticated; GRANT EXECUTE ON FUNCTION search_tools_hybrid TO anon, authenticated; GRANT EXECUTE ON FUNCTION search_knowledge_hybrid TO anon, authenticated; GRANT EXECUTE ON FUNCTION search_all_hybrid TO anon, authenticated; ``` ### Migration from `item_embeddings` ```sql -- 1. Create new tables (above) -- 2. Migrate existing embeddings INSERT INTO directive_embeddings (directive_id, embedding, content, name, description, category, validated_at) SELECT d.id AS directive_id, ie.embedding, ie.content, d.name, d.description, d.category, ie.validated_at FROM item_embeddings ie JOIN directives d ON ie.item_id = d.id::text OR ie.item_id = d.name WHERE ie.item_type = 'directive'; INSERT INTO tool_embeddings (tool_id, embedding, content, name, description, tool_type, category, validated_at) SELECT t.id AS tool_id, ie.embedding, ie.content, t.name, t.description, t.tool_type, t.category, ie.validated_at FROM item_embeddings ie JOIN tools t ON ie.item_id = t.id::text OR ie.item_id = t.tool_id WHERE ie.item_type IN ('script', 'tool'); INSERT INTO knowledge_embeddings (knowledge_id, embedding, content, title, entry_type, validated_at) SELECT k.id AS knowledge_id, ie.embedding, ie.content, k.title, k.entry_type, ie.validated_at FROM item_embeddings ie JOIN knowledge_entries k ON ie.item_id = k.id::text OR ie.item_id = k.zettel_id WHERE ie.item_type = 'knowledge'; -- 3. Verify migration -- SELECT COUNT(*) FROM directive_embeddings; -- SELECT COUNT(*) FROM tool_embeddings; -- SELECT COUNT(*) FROM knowledge_embeddings; -- 4. Drop old table (after verification) -- DROP TABLE item_embeddings; ``` --- ## Summary | Aspect | Current | Proposed | | ---------------- | -------------------------- | ------------------------------------- | | Canonical table | `scripts` | `tools` | | Tool types | Python only | python, bash, api, docker, mcp | | File storage | Single `content` column | `tool_version_files` + Storage | | Manifest | None | `tool_versions.manifest` (JSONB) | | Multi-file | Not supported | Supported via `tool_version_files` | | Embeddings table | `item_embeddings` (single) | Per-type tables | | Vector index | IVFFlat | HNSW (better perf) | | Full-text search | None | Generated tsvector + GIN | | Search method | Vector-only | Hybrid RRF (semantic + keyword) | | Backward compat | N/A | Views for `scripts`/`script_versions` | This evolution enables the full tool system vision with best-practice RAG/vector search while maintaining backward compatibility.

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/leolilley/kiwi-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

DATABASE_EVOLUTION_DESIGN.md•35.6 KiB