PocketMCP

PocketMCP
docs
prompts

mcp-inst.md•6.36 KiB

## **PocketMCP** (Local MCP Search Server) ### 1) Goal Create a **lightweight, local-first MCP server** that: * **Watches a folder** (default `./kb`) and auto-ingests files. * **Chunks & embeds** content locally with MiniLM (Transformers.js). * Stores vectors in **SQLite + sqlite-vec**. * Exposes **semantic search** via **MCP tools** (+ resource URIs). * Works with **VS Code** and **Cursor** MCP clients. * Designed for **Intel N100 / 16GB RAM** class machines. ### 2) Tech/Runtime Constraints * **Language:** TypeScript (Node 20+). * **Embeddings:** `Xenova/all-MiniLM-L6-v2` (384-dim), mean-pooled, L2-normalized. * **Vector store:** **SQLite + sqlite-vec** (`vec0` virtual table). * **DB access:** `better-sqlite3` (WAL enabled). * **FS watcher:** `chokidar` with debounce + limited concurrency. * **MCP:** `@modelcontextprotocol/sdk` over stdio (no network port). * **No cloud calls**; runs fully local after first model download/cache. ### 3) Project Layout (files to create; describe responsibilities only) * `src/server.ts` – MCP wiring: register tools/resources, read env, start watcher (if enabled). * `src/db.ts` – DB connection, load `sqlite-vec`, create schema, indexes. * `src/embeddings.ts` – Transformers.js MiniLM loader + `embed(texts[]) -> Float32Array[]`. * `src/chunker.ts` – Sentence-aware chunker (\~1000 chars, 120 overlap). * `src/ingest.ts` – Generic “raw text → chunks → vectors → DB upsert” helpers. * `src/file-ingest.ts` – File specific ingest: path→text, hash check, upsert, delete. * `src/watcher.ts` – Start chokidar; debounce; queue add/change/unlink to ingest/delete. * Top-level config/metadata files: `package.json`, `tsconfig.json`, `.env.example`, `.gitignore`, `README.md`. ### 4) Environment Variables (with defaults) * `SQLITE_PATH=./data/index.db` * `WATCH_DIR=./kb` (if set, enable watcher) * `MODEL_ID=Xenova/all-MiniLM-L6-v2` * `CHUNK_SIZE=1000` * `CHUNK_OVERLAP=120` ### 5) Data Model (SQLite) * **documents** * `doc_id TEXT PK` * `external_id TEXT UNIQUE` (absolute file path or caller-supplied id) * `source TEXT` (`file|url|raw`) * `uri TEXT` (e.g., `file://...`) * `title TEXT` * `content_sha256 TEXT` * `created_at TEXT`, `updated_at TEXT` * **vec\_chunks** (sqlite-vec `vec0` virtual table) * Metadata cols: `chunk_id TEXT PK`, `doc_id TEXT`, `idx INTEGER`, `start_off INTEGER`, `end_off INTEGER` * Vector col: `embedding FLOAT[384]` (must match model dim) * Aux col: `+text TEXT` (full chunk body; selectable) ### 6) Chunking Rules * Target chunk length ≈ **1000 characters**, **120 char overlap**. * Split by sentence boundaries first; fall back to sliding window. * Record offsets for provenance (`start_off`, `end_off`). ### 7) Embedding Rules * Use **Transformers.js** feature-extraction pipeline with `MODEL_ID`. * **Pooling:** mean; **normalize:** true (unit vector). * Return `Float32Array` per chunk. * Batch embedding within a file; keep memory modest. ### 8) Search Flow (KNN) 1. Embed query → `qVec[384]`. 2. KNN against `vec_chunks.embedding` using sqlite-vec `MATCH` with `k = top_k`. 3. Return **sorted** results (ascending distance), include: * `chunk_id`, `doc_id`, `idx`, `score` (1 - distance), `preview` (first \~240 chars), * `resource` = `mcp+doc://<doc_id>#<chunk_id>`. ### 9) Ingestion & Watcher Behavior * On **add/change**: * Read file → SHA-256 → if unchanged vs `documents.content_sha256`: **skip**. * Else: load text → chunk → embed → **transaction**: * Replace all `vec_chunks` rows for that `doc_id` (simple overwrite semantics). * Upsert `documents` row with new `content_sha256`. * On **unlink**: * Delete `vec_chunks` for `doc_id`; delete `documents` row. * **Debounce** writes (e.g., 600ms) and cap concurrency (e.g., 3). * Default glob patterns: `**/*.md`, `**/*.txt` (PDF loaders are out of scope for v0). ### 10) MCP Surface (Tools & Resource) * **Tool: `search`** * **Input:** `{ query: string, top_k?: number (default 8), filter?: { doc_ids?: string[] } }` * **Output:** `{ matches: [{ chunk_id, doc_id, idx, score, preview, resource }] }` * **Tool: `upsert_documents`** * **Input:** `{ docs: [{ text: string, external_id?: string, title?: string, metadata?: object }] }` * Upserts each as a new or existing document (by `external_id`); replaces its chunks. * **Output:** `{ results: [{ doc_id, chunks, status: "inserted"|"updated" }] }` * **Tool: `delete_documents`** * **Input:** `{ doc_ids?: string[], external_ids?: string[] }` * **Output:** `{ deleted_doc_ids: string[], deleted_chunks: number }` * **Tool: `list_documents`** * **Input:** `{ page?: { limit?: number, cursor?: string } }` (cursor optional for future) * **Output:** `{ documents: [{ doc_id, external_id, title, source, updated_at }], next_cursor?: string }` * **Resource:** `mcp+doc://<doc_id>#<chunk_id>` * **Returns:** `{ doc_id, chunk_id, text, start_off, end_off, title? }` ### 11) Non-Functional Requirements * **Local-only:** after first model download/cache, no network calls. * **Latency:** search p95 < **100 ms** for `top_k<=10` and ≤ **50k** chunks. * **Durability:** WAL enabled; all ingest writes in transactions. * **Safety:** path normalization; ignore temp files (`~$*`, `*.tmp`, etc.). * **Logs:** JSON logs per tool invocation with timings and counts. ### 12) Acceptance Tests (manual) 1. **Bootstrap** * Run in dev mode; server prints MCP ready. No watcher errors. 2. **Initial ingest** * Place `a.md`, `b.md` in `WATCH_DIR`; observe “inserted” logs. 3. **Search** * Invoke `search` with a term present in `a.md` → top hit previews `a.md` content. 4. **Update** * Edit `a.md` (meaningful change) → “updated” log; search reflects new text. * Save `a.md` without changes → “skipped”. 5. **Delete** * Remove `b.md` → “deleted” log; search no longer returns its chunks. 6. **Direct upsert** * Call `upsert_documents` with raw text → new `doc_id`; searchable. 7. **Resource fetch** * Use returned `resource` URI; body matches stored chunk text. ### 13) README Expectations Include: * One-paragraph description. * Architecture diagram (mermaid ok). * Quickstart (install, env, `pnpm dev`). * MCP client config examples for **Cursor** and **VS Code**. * Notes about `WATCH_DIR` (choose your own; `kb` is just a convention). ### 14) Out of Scope (v0) * PDF/Docx loaders, auth/multi-tenant, reranking, advanced filters, HTTP server.

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Kailash-Sankar/PocketMCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

mcp-inst.md•6.36 KiB