Code-Index-MCP

INDEXING_STATUS.md•2.16 KiB

# Repository Indexing Status

## Summary

We've successfully cleaned up duplicate scripts and created comprehensive indexing capabilities. However, indexing entire repositories with semantic embeddings is extremely time and resource intensive.

## Current Status

### Successfully Indexed (Full Repository)

- **phoenix (C)**: 30 files → 302 embeddings (21s)
- **redis (C)**: 766 files → 6,111 embeddings (415s / 7 minutes)

### Partial/In Progress

- **grpc (C++)**: 6,189 files → 28,798 chunks (timed out during embedding creation)

### Challenges

1. **Scale**: Some repositories are massive (grpc has 6,189 code files)
2. **Time**: Full indexing of all repos would take many hours
3. **API Costs**: Voyage AI charges per token - full indexing would be expensive
4. **Storage**: Full embeddings for all repos would require significant storage

## Scripts Available

### Primary Indexing Script

- `scripts/index_repositories.py` - Unified entry point with modes:
  - `--mode full`: SQL + Semantic indexing using MCP
  - `--mode sql`: BM25/FTS indexing only (fast, free)
  - `--mode semantic`: Semantic embeddings only

### Specialized Scripts

- `scripts/index_all_repos_with_mcp.py` - Uses full MCP stack
- `scripts/index_all_repos_semantic_full.py` - Creates embeddings for ALL files
- `scripts/index_all_repos_semantic_simple.py` - Limited to 2000 lines per file
- `scripts/index_test_repos_semantic_only.py` - Limited to 50 files per repo

## Recommendations

1. **For Testing**: Use the limited scripts (50-100 files per repo)
2. **For Production**:
   - Use SQL-only indexing for full repositories
   - Add semantic indexing selectively for important files
3. **For Cost Control**:
   - Limit embedding creation to key files
   - Use chunking strategies to reduce token usage

## Code Cleanup Completed

### Dispatcher

- Migrated from `Dispatcher` to `EnhancedDispatcher`
- Updated all imports and test files
- Archived old implementations

### Scripts

- Consolidated 40+ duplicate scripts
- Created unified entry points
- Archived old versions in `/archive/`

### Benefits

- Cleaner codebase structure
- No more confusion about which script to use
- Consistent implementation across all components
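
The timings listed under Current Status can be turned into a rough embedding-time estimate for a repository that has not finished indexing. The sketch below is only back-of-the-envelope arithmetic on the two completed runs: it assumes one embedding per chunk and a constant throughput, and it ignores API rate limits, retries, and batching effects. The function names are hypothetical and are not part of the repository's scripts.

```python
# Completed runs from "Successfully Indexed": repo -> (embeddings, seconds).
MEASURED_RUNS = {
    "phoenix": (302, 21),
    "redis": (6111, 415),
}

# Chunk count reported for the grpc run that timed out.
GRPC_CHUNKS = 28_798


def embeddings_per_second(runs):
    """Pooled throughput across all measured runs (hypothetical helper)."""
    total_embeddings = sum(count for count, _ in runs.values())
    total_seconds = sum(seconds for _, seconds in runs.values())
    return total_embeddings / total_seconds


def estimate_seconds(chunks, rate):
    """Estimated embedding time, assuming one embedding per chunk."""
    return chunks / rate


rate = embeddings_per_second(MEASURED_RUNS)
grpc_seconds = estimate_seconds(GRPC_CHUNKS, rate)
print(f"throughput: {rate:.1f} embeddings/s")
print(f"grpc estimate: {grpc_seconds / 60:.0f} min")
```

Under these assumptions the estimate for grpc lands at roughly half an hour of embedding time. Treat it as a lower bound rather than a prediction, since the actual grpc run timed out and per-token API costs scale with chunk size, not only chunk count.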


