
MCP Indexer

by gkatechis
# Configuration Guide

## Environment Variables

### MCP_INDEXER_DB_PATH

Controls where mcpIndexer stores its database and embeddings.

**Default**: `~/.mcpindexer/db`

**Usage**:

```bash
export MCP_INDEXER_DB_PATH=/custom/path/to/db
```

**What it affects**:

- ChromaDB database location
- Embedding storage
- Persistent index data

**Example configurations**:

```bash
# Use default location (recommended)
export MCP_INDEXER_DB_PATH=~/.mcpindexer/db

# Use project-specific location
export MCP_INDEXER_DB_PATH=./project_indexes

# Use shared team location
export MCP_INDEXER_DB_PATH=/shared/team/code_indexes
```

### PYTHONPATH

Required for running mcpIndexer from source.

**Usage**:

```bash
export PYTHONPATH=/absolute/path/to/mcpIndexer/src
```

**Why it's needed**:

- Allows Python to find the `mcpindexer` module
- Required for running scripts and the MCP server
- Must point to the `src/` directory

## MCP Configuration

### .mcp.json

Used by MCP clients (like Claude Code) to connect to the mcpIndexer server.

**Location**: Project root or Claude Code configuration directory

**Example**:

```json
{
  "mcpServers": {
    "mcpindexer": {
      "command": "python3",
      "args": [
        "/absolute/path/to/mcpIndexer/src/mcpindexer/server.py"
      ],
      "env": {
        "PYTHONPATH": "/absolute/path/to/mcpIndexer/src",
        "MCP_INDEXER_DB_PATH": "~/.mcpindexer/db"
      }
    }
  }
}
```

**Configuration fields**:

- `command`: Python interpreter to use
- `args`: Path to the MCP server script
- `env.PYTHONPATH`: Path to mcpIndexer src directory
- `env.MCP_INDEXER_DB_PATH`: Database location

## Stack Configuration

### ~/.mcpindexer/stack.json

Automatically created and maintained by mcpIndexer. Tracks indexed repositories.

**Example**:

```json
{
  "version": "1.0",
  "repos": {
    "my-repo": {
      "name": "my-repo",
      "path": "/absolute/path/to/repo",
      "status": "indexed",
      "last_indexed": "2025-10-17T12:34:56.789Z",
      "last_commit": "abc123def456...",
      "files_indexed": 162,
      "chunks_indexed": 302,
      "auto_reindex": true
    }
  }
}
```

**Fields**:

- `name`: Repository identifier
- `path`: Absolute path to repository
- `status`: `indexed`, `indexing`, `error`, or `pending`
- `last_indexed`: ISO timestamp of last indexing
- `last_commit`: Git commit hash when last indexed
- `files_indexed`: Number of files processed
- `chunks_indexed`: Number of code chunks created
- `auto_reindex`: Whether to auto-reindex on git pull

**Manual editing**: Generally not recommended, but safe if you follow the schema.

## Dependency Storage

### ~/.mcpindexer/dependencies.json

Stores cross-repository dependency information.

**Example**:

```json
{
  "version": "1.0",
  "repos": {
    "my-repo": {
      "internal_count": 45,
      "external_packages": ["express", "react"],
      "cross_repo_deps": [
        {
          "source_repo": "my-repo",
          "target_repo": "shared-lib",
          "package": "@myorg/shared-lib"
        }
      ]
    }
  }
}
```

**Automatically maintained**: Updated each time a repository is indexed.

## Organization-Specific Configuration

### Filtering Packages by Organization

You can configure mcpIndexer to only track packages from your organization:

```python
from mcpindexer.dependency_storage import DependencyStorage

# Configure organization prefixes
storage = DependencyStorage(
    org_prefixes=['@myorg/', 'myorg-', 'myorg_']
)

# Now only packages matching these prefixes will be tracked
```

This is useful for:

- Focusing on internal dependencies
- Reducing noise from external packages
- Tracking monorepo dependencies

**Default behavior**: If no prefixes are configured, all packages are tracked.

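The prefix matching described above can be pictured as a plain string-prefix test. The sketch below is illustrative only: `filter_by_org` is a hypothetical helper, not part of mcpIndexer, and how `DependencyStorage` applies `org_prefixes` internally is not shown in this guide.

```python
from typing import Iterable, List, Sequence


def filter_by_org(packages: Iterable[str], org_prefixes: Sequence[str] = ()) -> List[str]:
    """Hypothetical helper: keep packages whose name starts with any prefix.

    Mirrors the documented default: with no prefixes, every package is kept.
    """
    if not org_prefixes:
        return list(packages)
    prefixes = tuple(org_prefixes)  # str.startswith accepts a tuple of prefixes
    return [name for name in packages if name.startswith(prefixes)]


packages = ["@myorg/shared-lib", "express", "react", "myorg-utils"]

# With prefixes configured, only organization packages remain.
print(filter_by_org(packages, ["@myorg/", "myorg-", "myorg_"]))

# With no prefixes, nothing is filtered out.
print(filter_by_org(packages))
```

Listing both the scoped form (`@myorg/`) and the dash and underscore forms covers the common npm and Python naming styles for the same organization.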
## Advanced Configuration

### Custom Database Location Per Repository

```python
from mcpindexer.embeddings import EmbeddingStore

# Different databases for different projects
frontend_store = EmbeddingStore(
    db_path="~/.mcpindexer/frontend_db",
    collection_name="frontend_index"
)

backend_store = EmbeddingStore(
    db_path="~/.mcpindexer/backend_db",
    collection_name="backend_index"
)
```

### Batch Size Tuning

For large repositories, adjust batch size:

```python
result = indexer.index(batch_size=2000)  # Default is 1000
```

**Higher values**: Better performance, more memory usage

**Lower values**: More incremental progress, lower memory

### File Filtering

Exclude specific files or directories:

```python
def my_filter(file_path):
    # Skip test files
    if 'test' in str(file_path):
        return False
    # Skip generated files
    if 'generated' in str(file_path):
        return False
    return True

result = indexer.index(file_filter=my_filter)
```

## Troubleshooting

### Database Location Issues

**Problem**: Can't find database or indices

**Solutions**:

1. Check `MCP_INDEXER_DB_PATH` is set correctly
2. Ensure path has write permissions
3. Use absolute paths, not relative
4. Expand `~` to full home directory path

### PYTHONPATH Issues

**Problem**: `ModuleNotFoundError: No module named 'mcpindexer'`

**Solutions**:

1. Verify PYTHONPATH includes the `src/` directory
2. Use absolute paths
3. Check spelling and capitalization
4. Restart your terminal/IDE after setting

### Permission Errors

**Problem**: Can't write to database directory

**Solutions**:

```bash
# Check permissions
ls -la ~/.mcpindexer/

# Fix permissions
chmod 755 ~/.mcpindexer/
chmod 644 ~/.mcpindexer/*.json
```

## Best Practices

1. **Use the default database location** unless you have a specific reason not to
2. **Set environment variables in your shell profile** (`.bashrc`, `.zshrc`) for persistence
3. **Use absolute paths** in all configuration files
4. **Back up `~/.mcpindexer/`** directory if you have large indices
5. **One database per development environment** (work, personal, etc.)
6. **Configure organization prefixes** to reduce noise from external dependencies
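
Several of the troubleshooting steps and best practices above (absolute paths, expanding `~`, write permissions) can be checked with a few lines of standard-library Python. This is a sketch, not mcpIndexer code: `resolve_db_path` and `is_writable` are hypothetical helpers, and whether mcpIndexer expands `~` itself is not stated in this guide. Values in the `env` block of `.mcp.json` are typically passed to the server as-is rather than through a shell, so a literal `~` may arrive unexpanded.

```python
import os
from pathlib import Path

DEFAULT_DB_PATH = "~/.mcpindexer/db"  # default documented in this guide


def resolve_db_path(env=None) -> Path:
    """Hypothetical helper: return MCP_INDEXER_DB_PATH as an absolute path."""
    env = os.environ if env is None else env
    raw = env.get("MCP_INDEXER_DB_PATH", DEFAULT_DB_PATH)
    # expanduser turns a leading "~" into the home directory;
    # resolve makes a relative path absolute against the current directory.
    return Path(raw).expanduser().resolve()


def is_writable(path: Path) -> bool:
    """Check write access on the path, or on its nearest existing parent."""
    probe = path
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    return os.access(probe, os.W_OK)


db_path = resolve_db_path()
print(f"database path: {db_path}")
print(f"writable: {is_writable(db_path)}")
```

Pasting the printed absolute path into `.mcp.json` avoids relying on any `~` expansion by the client or the server.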
