# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

ickyMCP is a RAG (Retrieval-Augmented Generation) MCP server for semantic document search. It indexes documents into a SQLite vector database and provides semantic search via the MCP protocol.

## Common Commands

```bash
# Install dependencies
pip install -r requirements.txt

# Or as editable package
pip install -e .

# Run the MCP server
python run.py

# Fast parallel indexing (for bulk document ingestion)
python fast_index.py <target_dir> --workers 8 --batch 64 --patterns "*.pdf" "*.docx"

# Run tests
pytest
pytest tests/test_specific.py -k "test_name"
```

## Architecture

### Data Flow

1. **Parsing** (`src/parsers.py`): Documents (PDF, DOCX, PPTX, XLSX, TXT, MD) are parsed to extract text
2. **Chunking** (`src/chunker.py`): Text is split into 4,000-token chunks with a 500-token overlap using tiktoken
3. **Embedding** (`src/embedder.py`): Chunks are embedded using either a local model (nomic-embed-text-v1.5) or the Voyage AI API
4. **Storage** (`src/database.py`): Embeddings are stored in SQLite with the sqlite-vec extension for vector similarity search
5. **Search**: The query is embedded and compared against stored embeddings using cosine similarity

### Key Components

- **`src/server.py`**: MCP server entry point. Exposes 7 tools: `index`, `search`, `similar`, `refresh`, `list`, `delete`, `status`
- **`src/embedder.py`**: Dual embedding support: `LocalEmbedder` (sentence-transformers) and `VoyageEmbedder` (API). Selected via the `ICKY_EMBEDDING_PROVIDER` env var
- **`src/database.py`**: `VectorDatabase` class wraps SQLite + sqlite-vec. Three tables: `documents`, `chunks`, `chunk_embeddings` (virtual table)
- **`src/chunker.py`**: `TextChunker` uses tiktoken for accurate token counting. Finds smart break points (paragraph > sentence > word)
- **`fast_index.py`**: Standalone CLI for bulk indexing with thread-pool parsing and batched embeddings

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `ICKY_EMBEDDING_PROVIDER` | `voyage` | `local` or `voyage` |
| `ICKY_CHUNK_SIZE` | `4000` | Tokens per chunk |
| `ICKY_CHUNK_OVERLAP` | `500` | Overlap between chunks, in tokens |
| `ICKY_DB_PATH` | `./icky_voyage.db` or `./icky.db` | SQLite database path |
| `ICKY_VOYAGE_API_KEY` | (set in config) | Voyage AI API key |
| `ICKY_VOYAGE_MODEL` | `voyage-3.5-lite` | Voyage model name |
| `ICKY_VOYAGE_DIMENSIONS` | `1024` | Voyage embedding dimensions |
| `ICKY_LOCAL_MODEL` | `nomic-ai/nomic-embed-text-v1.5` | Local model name |

### Database Schema

- `documents`: path (unique), file_type, file_size, modified_time, indexed_at, chunk_count, page_count
- `chunks`: document_id (FK), chunk_index, chunk_text, token_count, page_number, start_char, end_char
- `chunk_embeddings`: Virtual table using sqlite-vec for vector similarity (768 or 1024 dimensions)

### Embedding Prefixes (Local Model Only)

The nomic model requires prefixes:

- Query: `search_query: <text>`
- Document: `search_document: <text>`

These are applied automatically in `LocalEmbedder.embed_query()` and `embed_documents()`.
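
For orientation, the prefixes listed in the Embedding Prefixes section could be applied with plain string concatenation before the text reaches the model. This is a minimal sketch under that assumption: the helper names `with_query_prefix` and `with_document_prefix` are hypothetical and are not taken from the repository, and the real `LocalEmbedder` methods may be structured differently.

```python
# Prefix strings as documented for the nomic model in this file.
QUERY_PREFIX = "search_query: "
DOCUMENT_PREFIX = "search_document: "


def with_query_prefix(text: str) -> str:
    """Prepend the query prefix (hypothetical helper, for illustration only)."""
    return f"{QUERY_PREFIX}{text}"


def with_document_prefix(texts: list[str]) -> list[str]:
    """Prepend the document prefix to each chunk (hypothetical helper)."""
    return [f"{DOCUMENT_PREFIX}{t}" for t in texts]
```

Keeping the prefixing inside the embedder means callers such as the `search` and `index` tools can pass raw text, and the Voyage path stays unaffected.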
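
The chunking step from the Data Flow section (4,000-token chunks with a 500-token overlap) can be pictured as a sliding window over a token sequence. The sketch below is an assumption-laden illustration, not the code in `src/chunker.py`: the function name `sliding_chunks` is hypothetical, it works on any pre-tokenized sequence rather than calling tiktoken, and it omits the smart break-point search (paragraph > sentence > word).

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def sliding_chunks(
    tokens: Sequence[T], chunk_size: int = 4000, overlap: int = 500
) -> list[list[T]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens.

    Hypothetical helper; defaults mirror ICKY_CHUNK_SIZE and ICKY_CHUNK_OVERLAP.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each iteration
    chunks: list[list[T]] = []
    start = 0
    while start < len(tokens):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
        start += step
    return chunks
```

With the defaults, each window starts 3,500 tokens after the previous one, so a 10,000-token document yields three chunks.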
