ickyMCP

CLAUDE.md•3.08 KiB

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

ickyMCP is a RAG (Retrieval-Augmented Generation) MCP server for semantic document search. It indexes documents into a SQLite vector database and provides semantic search via the MCP protocol.

## Common Commands

```bash
# Install dependencies
pip install -r requirements.txt

# Or as editable package
pip install -e .

# Run the MCP server
python run.py

# Fast parallel indexing (for bulk document ingestion)
python fast_index.py <target_dir> --workers 8 --batch 64 --patterns "*.pdf" "*.docx"

# Run tests
pytest
pytest tests/test_specific.py -k "test_name"
```

## Architecture

### Data Flow

1. **Parsing** (`src/parsers.py`): Documents (PDF, DOCX, PPTX, XLSX, TXT, MD) are parsed to extract text
2. **Chunking** (`src/chunker.py`): Text is split into 4K token chunks with 500 token overlap using tiktoken
3. **Embedding** (`src/embedder.py`): Chunks are embedded using either local model (nomic-embed-text-v1.5) or Voyage AI API
4. **Storage** (`src/database.py`): Embeddings stored in SQLite with sqlite-vec extension for vector similarity search
5. **Search**: Query is embedded and compared using cosine similarity

### Key Components

- **`src/server.py`**: MCP server entry point. Exposes 7 tools: `index`, `search`, `similar`, `refresh`, `list`, `delete`, `status`
- **`src/embedder.py`**: Dual embedding support - `LocalEmbedder` (sentence-transformers) and `VoyageEmbedder` (API). Selected via `EMBEDDING_PROVIDER` env var
- **`src/database.py`**: `VectorDatabase` class wraps SQLite + sqlite-vec. Three tables: `documents`, `chunks`, `chunk_embeddings` (virtual table)
- **`src/chunker.py`**: `TextChunker` uses tiktoken for accurate token counting. Finds smart break points (paragraph > sentence > word)
- **`fast_index.py`**: Standalone CLI for bulk indexing with thread-pool parsing and batched embeddings

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `ICKY_EMBEDDING_PROVIDER` | `voyage` | `local` or `voyage` |
| `ICKY_CHUNK_SIZE` | `4000` | Tokens per chunk |
| `ICKY_CHUNK_OVERLAP` | `500` | Overlap between chunks |
| `ICKY_DB_PATH` | `./icky_voyage.db` or `./icky.db` | SQLite database path |
| `ICKY_VOYAGE_API_KEY` | (set in config) | Voyage AI API key |
| `ICKY_VOYAGE_MODEL` | `voyage-3.5-lite` | Voyage model name |
| `ICKY_VOYAGE_DIMENSIONS` | `1024` | Voyage embedding dimensions |
| `ICKY_LOCAL_MODEL` | `nomic-ai/nomic-embed-text-v1.5` | Local model name |

### Database Schema

- `documents`: path (unique), file_type, file_size, modified_time, indexed_at, chunk_count, page_count
- `chunks`: document_id (FK), chunk_index, chunk_text, token_count, page_number, start_char, end_char
- `chunk_embeddings`: Virtual table using sqlite-vec for vector similarity (768 or 1024 dimensions)

### Embedding Prefixes (Local Model Only)

The nomic model requires prefixes:

- Query: `search_query: <text>`
- Document: `search_document: <text>`

These are applied automatically in `LocalEmbedder.embed_query()` and `embed_documents()`.
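
As a rough illustration of the prefixing described above, the sketch below shows how the two prefixes could be attached before text is passed to the local model. The helper names `prefix_query` and `prefix_documents` are hypothetical and are not taken from `src/embedder.py`; the actual `LocalEmbedder` may structure this differently.

```python
# Minimal sketch of the nomic prefix convention (assumed helper names,
# not the repository's actual implementation).
QUERY_PREFIX = "search_query: "
DOCUMENT_PREFIX = "search_document: "


def prefix_query(text: str) -> str:
    """Return the query text with the query prefix prepended."""
    return f"{QUERY_PREFIX}{text}"


def prefix_documents(texts: list[str]) -> list[str]:
    """Return each chunk with the document prefix prepended."""
    return [f"{DOCUMENT_PREFIX}{t}" for t in texts]


if __name__ == "__main__":
    print(prefix_query("termination clause"))
    print(prefix_documents(["First chunk.", "Second chunk."]))
```

Keeping the prefix logic in small pure functions like these makes it easy to check that queries and documents are never embedded with the wrong prefix, which matters because the two sides of the search are expected to use different ones.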

