Claude Context Local

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Claude Embedding Search is an intelligent code search system that uses Google's EmbeddingGemma model and AST-based chunking to provide semantic search over Python codebases (other supported languages are chunked with tree-sitter). It is integrated with Claude Code via MCP (Model Context Protocol).

## Key Commands

### Development Setup

```bash
# Install dependencies
uv sync

# Install in development mode
uv sync --dev
```

### Testing

```bash
# Run all tests
python tests/run_tests.py

# Run specific test categories
python tests/run_tests.py --unit          # Unit tests only
python tests/run_tests.py --integration   # Integration tests only
python tests/run_tests.py --chunking      # Chunking tests only
python tests/run_tests.py --embeddings    # Embedding tests only
python tests/run_tests.py --search        # Search tests only
python tests/run_tests.py --mcp           # MCP server tests only

# Run tests with coverage
python tests/run_tests.py --coverage

# Run tests with verbose output
python tests/run_tests.py --verbose

# Run specific test files or patterns
python tests/run_tests.py unit/test_chunking.py

# Alternative: direct pytest usage
python -m pytest                                  # All tests
python -m pytest -m "unit"                        # Unit tests only
python -m pytest -m "not slow"                    # Skip slow tests
python -m pytest tests/unit/test_chunking.py -v   # Single test file
```

### Indexing & Usage

```bash
# Index a Python codebase
./scripts/index_codebase.py /path/to/project

# Index with custom storage location
./scripts/index_codebase.py /path/to/project --storage-dir /custom/location

# Clear existing index and reindex
./scripts/index_codebase.py /path/to/project --clear

# Enable verbose logging
./scripts/index_codebase.py /path/to/project --verbose
```

### MCP Server

```bash
# Run MCP server directly
uv run python mcp_server/server.py

# Add to Claude Code (global)
claude mcp add code-search --scope user -- uv run --directory /full/path/to/claude_embedding_search python mcp_server/server.py

# Add to Claude Code (project-specific)
claude mcp add code-search -- uv run --directory /full/path/to/claude_embedding_search python mcp_server/server.py
```

## Architecture

The codebase is organized into distinct modules with clear separation of concerns:

### Core Components

- **`chunking/`**: AST-based code parsing and chunking
  - `python_ast_chunker.py`: Breaks Python code into semantically meaningful chunks (functions, classes, modules)
  - `multi_language_chunker.py`: Tree-sitter-based chunking for JavaScript, TypeScript, Go, Java, Rust, and Svelte
  - Preserves context and relationships between code elements
- **`embeddings/`**: Embedding generation using EmbeddingGemma
  - `embedder.py`: Handles model loading, caching, and batch embedding generation
  - Uses the `google/embeddinggemma-300m` model with 768-dimensional embeddings
- **`search/`**: FAISS-based search and indexing
  - `indexer.py`: Manages FAISS indices, metadata storage (SQLite), and index persistence
  - `searcher.py`: Intelligent search with filtering, context-aware results, and similarity search
- **`mcp_server/`**: Claude Code integration via MCP
  - `server.py`: FastMCP server exposing search tools to Claude Code
  - Provides `search_code`, `index_directory`, `find_similar_code`, etc.
- **`merkle/`**: Incremental indexing support
  - `merkle_dag.py`: Merkle tree implementation for efficient change detection
  - `change_detector.py`: Detects file additions, modifications, and deletions
  - `snapshot_manager.py`: Manages snapshots for incremental indexing
- **`search/incremental_indexer.py`**: Orchestrates incremental indexing using Merkle tree change detection

### Storage Structure

Data is stored in `~/.claude_code_search/` (configurable via `CODE_SEARCH_STORAGE`):

```
~/.claude_code_search/
├── models/                          # Downloaded EmbeddingGemma models
├── projects/                        # Project-specific data
│   └── {project_name}_{hash}/
│       ├── project_info.json        # Project metadata
│       ├── index/                   # FAISS indices and metadata
│       │   ├── code.index           # Vector index
│       │   ├── metadata.db          # Chunk metadata (SQLite)
│       │   └── stats.json           # Index statistics
│       └── snapshots/               # Merkle tree snapshots for incremental indexing
```

### Chunking Strategy

The system uses AST parsing to create semantically meaningful chunks:

- Complete functions with docstrings and decorators
- Full classes with methods as separate chunks
- Module-level code blocks and constants
- Rich metadata: file paths, semantic tags, complexity scores, relationships

## Testing Strategy

Tests are organized by component with pytest markers:

- `unit`: Fast, isolated unit tests
- `integration`: End-to-end workflow tests
- `chunking`: AST chunking functionality
- `embeddings`: Model loading and embedding generation
- `search`: Indexing and search functionality
- `mcp`: MCP server integration
- `slow`: Time-intensive tests (excluded by default)

## Development Notes

### Key Dependencies

- `sentence-transformers`: EmbeddingGemma model loading and inference
- `faiss-cpu`: Efficient vector similarity search
- `fastmcp`: MCP server implementation for Claude Code integration
- `sqlitedict`: Persistent metadata storage
- `tree-sitter` & `tree-sitter-languages`: Multi-language parsing support
- `click`: Command-line interface utilities
- `pytest`: Testing framework with async support

### Performance Considerations

- Model size: ~300M parameters (EmbeddingGemma-300m)
- Embedding dimension: 768 (FAISS Flat index for small datasets, IVF for large ones)
- Batch processing: Configurable batch sizes for memory management
- Local processing: All embeddings are computed locally, with no API calls
- Incremental indexing: Only changed files are reprocessed, using Merkle tree snapshots

### Environment Variables

- `CODE_SEARCH_STORAGE`: Custom storage directory (default: `~/.claude_code_search`)

## Common Tasks

### Adding New Chunk Types

1. Extend `python_ast_chunker.py` to handle new AST node types
2. Update metadata extraction in chunk creation
3. Add corresponding tests in `tests/unit/test_chunking.py`

### Modifying Search Behavior

1. Update `searcher.py` for new filtering/ranking logic
2. Modify MCP server tools in `server.py` if new parameters are needed
3. Add integration tests in `tests/integration/test_full_flow.py`

### Testing Changes

Always run the full test suite before committing:

```bash
python tests/run_tests.py --coverage
```

For quick iteration during development:

```bash
python tests/run_tests.py --unit --verbose -x
```

### Multi-Language Support

The system supports chunking and indexing for multiple languages:

- Python (AST-based chunking)
- JavaScript/TypeScript (tree-sitter)
- JSX/TSX (React components)
- Go, Java, Rust (tree-sitter)
- Svelte components
