How do I use Embeddings Searcher?

1. Click "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it shows a "Started" state.
3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@Embeddings Searcher search for authentication patterns in the API documentation".

The server will respond to your query, and you can continue using it as needed.

Embeddings Searcher for Claude Code Documentation

A focused embeddings-based search system for navigating markdown documentation in code repositories.

Features

Semantic Search: Uses sentence transformers to find relevant documentation based on meaning, not just keywords
Markdown-Focused: Optimized for markdown documentation with intelligent chunking
Repository-Aware: Organizes and searches across multiple repositories
MCP Integration: Provides an MCP server for integration with Cursor/Claude
UV Package Management: Uses UV for fast dependency management

Quick Start for Claude Code

1. Clone and setup

git clone <this-repo>
cd kb
uv sync

2. Add your documentation

Place your documentation repositories in the repos/ directory.

3. Index your documentation

uv run python embeddings_searcher.py --index

4. (Optional) Convert to ONNX for faster inference

uv run python onnx_convert.py --convert --test

5. Add MCP server to Claude Code

claude mcp add documentation-searcher -- uv run --directory /absolute/path/to/kb python mcp_server.py

Replace /absolute/path/to/kb with the absolute path to your project directory.

6. Use in Claude Code

Ask Claude Code questions like:

"Search for authentication patterns"
"Find API documentation"
"Look up configuration options"

The MCP server will automatically search through your indexed documentation and return relevant results.

Quick Start

1. Index Documentation

First, index all markdown documentation in your repositories:

uv run python embeddings_searcher.py --index

This will:

Find all .md, .markdown, and .txt files in the repos/ directory
Chunk them intelligently based on markdown structure
Generate embeddings using sentence transformers
Store everything in a SQLite database
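The discovery and storage steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the `chunks` table layout, the default ignore list, and the float32 blob encoding are all assumptions made for the example.

```python
import sqlite3
import struct
from pathlib import Path

# Extensions taken from the list above.
DOC_EXTENSIONS = {".md", ".markdown", ".txt"}


def find_docs(root, ignore_dirs=(".git", "node_modules")):
    """Yield documentation files under root, skipping ignored directory names.

    The default ignore list is a hypothetical example.
    """
    root = Path(root)
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in DOC_EXTENSIONS:
            continue
        parent_parts = path.relative_to(root).parts[:-1]
        if any(part in ignore_dirs for part in parent_parts):
            continue
        yield path


def pack_vector(vec):
    """Encode a list of floats as a float32 blob (assumed storage format)."""
    return struct.pack(f"{len(vec)}f", *vec)


def unpack_vector(blob):
    """Decode a float32 blob back into a list of floats."""
    return list(struct.unpack(f"{len(blob) // 4}f", blob))


def store_chunk(conn, path, text, vec):
    """Insert one chunk and its embedding into a hypothetical chunks table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks (path TEXT, text TEXT, embedding BLOB)"
    )
    conn.execute(
        "INSERT INTO chunks VALUES (?, ?, ?)", (str(path), text, pack_vector(vec))
    )
```

Storing vectors as blobs keeps everything in a single SQLite file, which matches the single-database layout described here.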

2. Search Documentation

# Basic search
uv run python embeddings_searcher.py --query "API documentation"

# Search within a specific repository
uv run python embeddings_searcher.py --query "authentication" --repo "my-project.git"

# Limit results and set similarity threshold
uv run python embeddings_searcher.py --query "configuration" --max-results 5 --min-similarity 0.2

3. Get Statistics

# Show indexing statistics
uv run python embeddings_searcher.py --stats

# List indexed repositories
uv run python embeddings_searcher.py --list-repos

MCP Server Integration

The project includes an MCP server for integration with Cursor/Claude:

# Start the MCP server
uv run python mcp_server.py

MCP Tools Available

search_docs: Search through documentation using semantic similarity
list_repos: List all indexed repositories
get_stats: Get indexing statistics
get_document: Retrieve full document content by path
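MCP clients invoke these tools with a JSON-RPC `tools/call` request. The sketch below builds such a request for illustration; the tool name comes from the list above, but the argument keys (`query`, `max_results`) are assumptions, since the tools' input schemas are not documented here.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": tool_name, "arguments": arguments},
        }
    )


# Hypothetical arguments: the real parameter names may differ.
request = build_tool_call(1, "search_docs", {"query": "authentication", "max_results": 5})
```

In practice Claude Code or Cursor builds this message for you; the sketch is mainly useful when debugging the server by hand.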

Project Structure

kb/
├── embeddings_searcher.py    # Main searcher implementation
├── mcp_server.py            # MCP server for Claude Code integration
├── onnx_convert.py          # ONNX model conversion utility
├── pyproject.toml           # UV project configuration
├── embeddings_docs.db       # SQLite database with embeddings
├── sentence_model.onnx      # ONNX model (generated)
├── model_config.json        # Model configuration (generated)
├── tokenizer/               # Tokenizer files (generated)
├── repos/                   # Your documentation repositories
│   ├── project1.git/
│   ├── project2.git/
│   └── documentation.git/
└── README.md               # This file

How It Works

Intelligent Chunking

The system chunks markdown documents based on:

Header structure (H1, H2, H3, etc.)
Content length (500 words per chunk)
Semantic boundaries
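A simplified sketch of header-based chunking with a word limit is shown here. It is not the project's implementation: the function name is hypothetical, it treats 500 words as an upper bound, it does not attempt semantic boundary detection, and it would mistake `#` comment lines inside code fences for headers.

```python
import re

# Matches ATX-style markdown headers (H1 through H6).
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown(text, max_words=500):
    """Split markdown into (heading, body) chunks at headers, then by word count."""
    sections = []
    heading, lines = "", []
    for line in text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            # A new header closes the section collected so far.
            if lines:
                sections.append((heading, lines))
            heading, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    if lines:
        sections.append((heading, lines))

    chunks = []
    for heading, lines in sections:
        words = " ".join(lines).split()
        # Long sections are split into consecutive windows of max_words words.
        for start in range(0, len(words), max_words):
            chunks.append((heading, " ".join(words[start:start + max_words])))
    return chunks
```

Keeping the heading with each chunk gives search results some context even when a long section is split into several pieces.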

Embedding Generation

Uses the all-MiniLM-L6-v2 sentence transformer model by default
Supports ONNX models for faster inference
Caches embeddings for efficient updates
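One common way to cache embeddings is to key them by a hash of the chunk text, so unchanged chunks are not re-embedded on the next indexing run. The sketch below illustrates that idea only; how the project actually caches embeddings is not described here, and the `EmbeddingCache` class is hypothetical.

```python
import hashlib


class EmbeddingCache:
    """In-memory cache keyed by the SHA-256 of the chunk text (illustrative only)."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn  # any callable mapping text to a vector
        self._store = {}
        self.misses = 0

    def get(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            # Only compute an embedding the first time this exact text is seen.
            self.misses += 1
            self._store[key] = self._embed_fn(text)
        return self._store[key]
```

With sentence-transformers installed, `embed_fn` could be `SentenceTransformer("all-MiniLM-L6-v2").encode`; any callable that returns a vector works for experimenting.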

Search Algorithm

Generates an embedding for your query
Compares against all document chunks using cosine similarity
Returns ranked results with context and metadata
Supports repository-specific searches
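The ranking step can be sketched in plain Python as follows. This is an illustration under assumptions: chunks are represented as dictionaries with hypothetical `repo` and `embedding` keys, and the default limits simply mirror the values shown under CLI Options. A real implementation would likely use vectorized math.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, chunks, max_results=10, min_similarity=0.1, repo=None):
    """Return (score, chunk) pairs sorted by descending similarity."""
    scored = []
    for chunk in chunks:
        # Optional repository filter, analogous to the --repo option.
        if repo is not None and chunk["repo"] != repo:
            continue
        score = cosine_similarity(query_vec, chunk["embedding"])
        if score >= min_similarity:
            scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:max_results]
```

Because every chunk is compared against the query, the cost grows linearly with the number of chunks, which is usually acceptable for a few thousand chunks.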

CLI Options

embeddings_searcher.py

# Indexing
--index                    # Index all repositories
--force                    # Force reindex of all documents

# Search
--query "search terms"     # Search query
--repo "repo-name"        # Search within specific repository
--max-results 10          # Maximum results to return
--min-similarity 0.1      # Minimum similarity threshold

# Information
--stats                   # Show indexing statistics
--list-repos             # List indexed repositories

# Configuration
--kb-path /path/to/kb    # Path to knowledge base
--db-path embeddings.db  # Path to embeddings database
--model model-name       # Sentence transformer model
--ignore-dirs [DIRS...]  # Directories to ignore during indexing
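To drive the search options above from a script, one approach is to assemble the argument list in Python and pass it to `subprocess.run`. The helper below is a hypothetical convenience wrapper, not part of the project; it only uses flags listed in this section.

```python
def build_search_command(query, repo=None, max_results=None, min_similarity=None):
    """Build the argv list for a search using the documented CLI flags."""
    cmd = ["uv", "run", "python", "embeddings_searcher.py", "--query", query]
    if repo is not None:
        cmd += ["--repo", repo]
    if max_results is not None:
        cmd += ["--max-results", str(max_results)]
    if min_similarity is not None:
        cmd += ["--min-similarity", str(min_similarity)]
    return cmd


# Example invocation, run from the project directory:
#   import subprocess
#   subprocess.run(build_search_command("configuration", max_results=5), check=True)
```

Passing a list rather than a shell string avoids quoting problems when the query contains spaces or quotes.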

mcp_server.py

--kb-path /path/to/kb         # Path to knowledge base
--docs-db-path embeddings.db  # Path to docs embeddings database
--model model-name            # Sentence transformer model

ONNX Model Conversion

For faster inference, you can convert the sentence transformer model to ONNX format:

# Convert model to ONNX
uv run python onnx_convert.py --convert

# Test ONNX model
uv run python onnx_convert.py --test

# Convert and test in one command
uv run python onnx_convert.py --convert --test

Example Usage

# Index documentation
uv run python embeddings_searcher.py --index

# Search for API documentation
uv run python embeddings_searcher.py --query "API endpoints"

# Search for authentication in specific repository
uv run python embeddings_searcher.py --query "user authentication" --repo "my-project.git"

# Get detailed statistics
uv run python embeddings_searcher.py --stats

Performance

Indexing: ~1400 documents in ~1 minute
Search: Sub-second response times
Storage: ~50MB for embeddings database with 6500+ chunks
Memory: ~500MB during indexing, ~200MB during search
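For a rough sense of how storage scales, the raw size of the vectors can be estimated from the chunk count. The calculation below assumes 384-dimensional embeddings (the output size of all-MiniLM-L6-v2) stored as 4-byte floats; the storage format is an assumption, and the database also holds chunk text and metadata.

```python
def raw_embedding_bytes(num_chunks, dim=384, bytes_per_value=4):
    """Bytes needed for the raw vectors alone, excluding text and metadata."""
    return num_chunks * dim * bytes_per_value


# 6500 chunks, as in the figures above.
estimate_mb = raw_embedding_bytes(6500) / 1_000_000
```

Under these assumptions the vectors for 6500 chunks come to about 10 MB, so most of the reported database size would be chunk text, metadata, and SQLite overhead.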

Troubleshooting

Unicode Errors

Some files may have encoding issues. The system automatically falls back to latin-1 encoding for problematic files.
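A fallback of this kind can be sketched as follows, assuming UTF-8 is tried first (the primary encoding is not stated here). The function name is hypothetical.

```python
from pathlib import Path


def read_text_with_fallback(path):
    """Decode a file as UTF-8, falling back to latin-1 if that fails."""
    data = Path(path).read_bytes()
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps every byte to a character, so this decode cannot fail.
        return data.decode("latin-1")
```

The latin-1 result may contain wrong characters for files in other encodings, but it keeps indexing from stopping on a single bad file.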

Large Files

Files larger than 1MB are automatically skipped to prevent memory issues.
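A size check like this is typically a single comparison before a file is read. The sketch below assumes 1MB means 1,048,576 bytes; the exact threshold and the function name are assumptions.

```python
import os

MAX_FILE_BYTES = 1024 * 1024  # assumption: "1MB" interpreted as 1 MiB


def should_index(path, max_bytes=MAX_FILE_BYTES):
    """Return True if the file is small enough to index."""
    return os.path.getsize(path) <= max_bytes
```

If an important document is being skipped, splitting it into smaller files is one way to get it indexed.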

Model Loading

If sentence-transformers is not available, the system will attempt to use ONNX models or fall back to dummy embeddings for testing.
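The fallback idea can be sketched as below. This is a simplified illustration, not the project's loader: it skips the ONNX step, the function names are hypothetical, and the dummy vectors are derived from a hash, so they carry no semantic meaning and are only useful for testing the pipeline.

```python
import hashlib
import importlib.util


def dummy_embedding(text, dim=384):
    """Deterministic pseudo-embedding built from SHA-256 digests (testing only)."""
    values = []
    counter = 0
    while len(values) < dim:
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        # Map each byte (0..255) into the range [-1.0, 1.0].
        values.extend(byte / 127.5 - 1.0 for byte in digest)
        counter += 1
    return values[:dim]


def load_embedder(model_name="all-MiniLM-L6-v2"):
    """Return an embedding callable, preferring sentence-transformers if installed."""
    if importlib.util.find_spec("sentence_transformers") is not None:
        from sentence_transformers import SentenceTransformer

        return SentenceTransformer(model_name).encode
    return dummy_embedding
```

If search results look random, it is worth checking whether the dummy fallback is in use instead of a real model.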
