MCP Codebase Insight

by tosin2013


`mcp-codebase-insight/.github/copilot-instructions.md` (7.49 kB)

# MCP Codebase Insight - AI Agent Instructions

## Architecture Overview

**MCP Server with Vector-Backed Knowledge Base**: A FastAPI-based Model Context Protocol server that provides codebase analysis through a Qdrant vector store, semantic search with `sentence-transformers`, and pattern detection.

### Core Service Components (`src/mcp_codebase_insight/core/`)

- **VectorStore** (`vector_store.py`): Qdrant client wrapper with retry logic and collection initialization
- **EmbeddingProvider** (`embeddings.py`): Sentence Transformers embeddings (`all-MiniLM-L6-v2` by default), lazily initialized
- **CacheManager** (`cache.py`): Dual-layer (memory + disk) caching for embeddings and API results
- **KnowledgeBase** (`knowledge.py`): Semantic search over stored patterns using vector similarity
- **ServerState** (`state.py`): Component lifecycle via DIContainer, with async initialization/cleanup tracking
- **ADRManager** (`adr.py`): Architecture Decision Records stored as Markdown files with frontmatter in `docs/adrs/`

### Service Initialization Pattern

All core services follow an async init/cleanup lifecycle: `await service.initialize()` → use the service → `await service.cleanup()`. ServerState manages the component lifecycle through DIContainer and tracks each component's status with the `ComponentStatus` enum. See `server_lifespan()` in `server.py` for an orchestration example.
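The lifecycle above can be illustrated with a minimal, self-contained sketch. It is not the project's ServerState or DIContainer code. `LifecycleTracker` and `DemoService` are hypothetical names, and every `ComponentStatus` member except `INITIALIZED` is an assumption. Two behaviors are also assumed design choices: cleanup runs in reverse registration order, and a failed `initialize()` is recorded rather than raised, in the spirit of the graceful degradation described for VectorStore.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Protocol


class ComponentStatus(Enum):
    # Only INITIALIZED is named in the docs; the other members are assumptions.
    UNINITIALIZED = "uninitialized"
    INITIALIZED = "initialized"
    FAILED = "failed"
    CLEANED_UP = "cleaned_up"


class AsyncService(Protocol):
    """Shape shared by core services: async initialize() and cleanup()."""

    async def initialize(self) -> None: ...

    async def cleanup(self) -> None: ...


@dataclass
class ComponentState:
    instance: AsyncService
    status: ComponentStatus = ComponentStatus.UNINITIALIZED
    error: Optional[str] = None


class LifecycleTracker:
    """Hypothetical stand-in for ServerState: init in registration order, clean up in reverse."""

    def __init__(self) -> None:
        self.components: Dict[str, ComponentState] = {}
        self._order: List[str] = []

    def register(self, name: str, service: AsyncService) -> None:
        self.components[name] = ComponentState(instance=service)
        self._order.append(name)

    async def initialize_all(self) -> None:
        for name in self._order:
            state = self.components[name]
            try:
                await state.instance.initialize()
                state.status = ComponentStatus.INITIALIZED
            except Exception as exc:
                # Record the failure and keep going so the server can run with reduced functionality.
                state.status = ComponentStatus.FAILED
                state.error = str(exc)

    def get(self, name: str) -> AsyncService:
        state = self.components[name]
        # Never hand out a component that did not finish initializing.
        if state.status != ComponentStatus.INITIALIZED:
            raise RuntimeError(f"{name} is not ready (status: {state.status.value})")
        return state.instance

    async def cleanup_all(self) -> None:
        # Only components that initialized successfully are cleaned up.
        for name in reversed(self._order):
            state = self.components[name]
            if state.status == ComponentStatus.INITIALIZED:
                await state.instance.cleanup()
                state.status = ComponentStatus.CLEANED_UP


class DemoService:
    """Toy service; fail=True simulates, e.g., Qdrant being unavailable."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.events: List[str] = []

    async def initialize(self) -> None:
        if self.fail:
            raise ConnectionError("Qdrant unavailable")
        self.events.append("initialize")

    async def cleanup(self) -> None:
        self.events.append("cleanup")


async def main() -> None:
    tracker = LifecycleTracker()
    tracker.register("cache_manager", DemoService())
    tracker.register("vector_store", DemoService(fail=True))
    await tracker.initialize_all()
    for name, state in tracker.components.items():
        print(name, state.status.value, state.error)
    await tracker.cleanup_all()


if __name__ == "__main__":
    asyncio.run(main())
```

Running the script prints one line per component. `cache_manager` shows as `initialized`, and `vector_store` shows as `failed` with its error message. Calling `tracker.get("vector_store")` would raise instead of returning a half-initialized service.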
### Configuration & Environment

- **ServerConfig** (`core/config.py`): A `@dataclass` loaded from the environment with `ServerConfig.from_env()`
- **Key env vars**: `QDRANT_URL`, `MCP_EMBEDDING_MODEL`, `MCP_COLLECTION_NAME`, `MCP_CACHE_ENABLED`, `MCP_DISK_CACHE_DIR`
- **Directory structure**: On init, `config.create_directories()` creates `docs/`, `docs/adrs/`, `knowledge/`, and `cache/`

## Development Workflows

### Running Tests

**Custom test runner**: Use `./run_tests.py` (NOT plain pytest). It handles asyncio isolation and event loop cleanup.

```bash
# Run all tests with isolation and coverage
./run_tests.py --all --clean --isolated --coverage

# Run specific test categories
./run_tests.py --component --isolated    # Component tests
./run_tests.py --integration --isolated  # Integration tests
./run_tests.py --api --isolated          # API endpoint tests

# Run a specific test file or function
./run_tests.py --file tests/components/test_cache.py
./run_tests.py --test test_vector_store_initialization
```

**Why a custom runner?** Event loops conflict between test modules. The runner provides `--isolated` (PYTHONPATH isolation), `--sequential` (no parallelism), and `--fully-isolated` (a separate process per module).

### Makefile Commands

```bash
make install       # Install dependencies from requirements.txt
make test          # Run ./run_tests.py with recommended flags
make lint          # flake8 + mypy + black --check + isort --check
make format        # black + isort code formatting
make run           # python -m mcp_codebase_insight
make docker-build  # Build container with Qdrant integration
```

### Docker & Qdrant Setup

- **Dockerfile**: Python 3.11-slim base, Rust toolchain (for the pydantic build), multi-stage build for cache optimization
- **Qdrant**: External vector DB (port 6333), not bundled. Start it via `docker-compose` or a local install
- **Container mounts**: Mount `docs/`, `knowledge/`, `cache/`, and `logs/` for persistence

## Code Conventions & Patterns

### Async/Await Discipline

- **All I/O operations are async**: File system access via `aiofiles`, Qdrant via the async client, and cache operations
- **Test isolation**: `conftest.py` manages session-scoped event loops with an `_event_loops` dict and mutex locks (`_loops_lock`, `_tests_lock`)
- **Fixtures**: Use `@pytest_asyncio.fixture` for async fixtures and `@pytest.mark.asyncio` for async tests

### Error Handling & Logging

- **Structured logging**: `from ..utils.logger import get_logger` → `logger = get_logger(__name__)`
- **Component-level error tracking**: ServerState stores errors in ComponentState and tracks retry counts
- **Graceful degradation**: VectorStore initialization can fail (e.g., Qdrant unavailable). The server then continues with reduced functionality

### Testing Patterns

- **Test fixtures in `conftest.py`**: `event_loop`, `test_config`, `vector_store`, `cache_manager` (session- or function-scoped)
- **Isolation via `server_test_isolation.py`**: `get_isolated_server_state()` provides per-test server instances
- **Component tests**: Focus on a single service unit (e.g., `test_vector_store.py` → VectorStore CRUD operations)
- **Integration tests**: Multi-component workflows (e.g., `test_api_endpoints.py` → FastAPI routes with live services)

### Dependency Injection Pattern

DIContainer (`core/di.py`) manages component initialization order:

1. ServerConfig from env
2. Embedding model (SentenceTransformer)
3. VectorStore (needs embedder + Qdrant client)
4. CacheManager, MetricsManager, HealthManager
5. KnowledgeBase (needs VectorStore)
6. TaskManager, ADRManager

**Usage**: Create a DIContainer, call `await container.initialize()`, then access components via `container.get_component("vector_store")`.

### Type Hints & Dataclasses

- **Strict typing**: All functions have type hints (parameters and return types). mypy is enforced by `make lint`
- **@dataclass for config/models**: ServerConfig, ComponentState, ADR, and SearchResult are dataclasses
- **Optional vs None**: Use `Optional[Type]` for values that may be None, with explicit None checks

## Key File Relationships

- **server.py** → imports core services, defines the `server_lifespan` context manager
- **core/state.py** → imports DIContainer, manages the component registry
- **core/di.py** → imports all service classes, orchestrates initialization
- **tests/conftest.py** → imports ServerState and server_test_isolation for fixture setup
- **run_tests.py** → spawns a pytest subprocess with custom args, handles event loop cleanup

## Project-Specific Quirks

1. **Qdrant client version sensitivity**: Comments in `vector_store.py` note a parameter rename (`query_vector` → `query` in v1.13.3+). The code supports both for compatibility.
2. **Cache directory creation**: If `MCP_CACHE_ENABLED=true` but no path is specified, `disk_cache_dir` defaults to `"cache"`. If caching is disabled, it is set to `None` (see `ServerConfig.__post_init__`).
3. **ADR numbering**: On init, ADRManager auto-increments `next_adr_number` by scanning `docs/adrs/` for files matching `NNN-*.md`.
4. **Test runner event loop management**: `conftest.py` maintains a process-specific event loop dict to avoid "different loop" errors across test modules.
5. **Component status tracking**: Don't assume a component is ready just because it was created. Check `component.status == ComponentStatus.INITIALIZED` before use.
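Quirks 2 and 3 can each be sketched in a few lines. The sketch rests on several assumptions:

- `CacheSettings` is a hypothetical cut-down stand-in for `ServerConfig` that keeps only the cache fields.
- `next_number_from_names` and `scan_next_adr_number` are hypothetical helpers, not ADRManager's actual API.
- `MCP_CACHE_ENABLED` is assumed to default to `"true"`.
- "NNN" is read as exactly three digits.

```python
import os
import re
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, Mapping, Optional

# Assumes "NNN" means exactly three digits, e.g. "001-use-qdrant.md".
_ADR_FILENAME = re.compile(r"^(\d{3})-.*\.md$")


@dataclass
class CacheSettings:
    """Hypothetical cut-down ServerConfig holding only the cache fields."""

    cache_enabled: bool = True
    disk_cache_dir: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.cache_enabled:
            # Caching disabled: no disk cache directory at all.
            self.disk_cache_dir = None
        elif self.disk_cache_dir is None:
            # Caching enabled but no path given: fall back to "cache".
            self.disk_cache_dir = "cache"

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "CacheSettings":
        # Assumed default: caching is on unless MCP_CACHE_ENABLED says otherwise.
        enabled = env.get("MCP_CACHE_ENABLED", "true").strip().lower() == "true"
        # Treat an empty MCP_DISK_CACHE_DIR the same as an unset one.
        return cls(cache_enabled=enabled, disk_cache_dir=env.get("MCP_DISK_CACHE_DIR") or None)


def next_number_from_names(names: Iterable[str]) -> int:
    """Return one more than the highest NNN prefix among ADR filenames, or 1 if none match."""
    numbers = [int(m.group(1)) for name in names if (m := _ADR_FILENAME.match(name))]
    return max(numbers, default=0) + 1


def scan_next_adr_number(adr_dir: Path = Path("docs/adrs")) -> int:
    """Scan an ADR directory for NNN-*.md files; a missing directory yields 1."""
    if not adr_dir.is_dir():
        return 1
    return next_number_from_names(p.name for p in adr_dir.iterdir() if p.is_file())
```

This sketch takes the highest existing number rather than counting files, so gaps in the sequence are never reused. Whether ADRManager behaves the same way is not stated in these instructions.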
## Common Debugging Patterns

- **Qdrant connection issues**: Check the `QDRANT_URL` env var and verify that Qdrant is running (`curl http://localhost:6333/collections`)
- **Event loop errors in tests**: Use the `--isolated` and `--sequential` flags with `run_tests.py`, and check that `conftest.py` fixtures are async
- **Missing embeddings**: EmbeddingProvider lazy-loads the model on first use. Check its `initialized` flag
- **Cache not persisting**: Verify that `MCP_DISK_CACHE_DIR` is writable and check `cache_enabled` in the config

## References

- **System architecture diagrams**: `system-architecture.md` (Mermaid diagrams for components and data flow)
- **Detailed setup guides**: `docs/getting-started/` for installation, Qdrant setup, and Docker
- **Testing philosophy**: Follows TDD. See `docs/tdd/workflow.md` and Agans' 9 Rules in `docs/debuggers/`
- **Existing AI context**: `CLAUDE.md` has legacy build/test commands (superseded by the Makefile and `run_tests.py`)
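The `curl` check listed under Common Debugging Patterns can also be scripted, for example in a pre-flight health check. The following is a minimal standard-library sketch. It assumes that a healthy Qdrant answers `GET /collections` with HTTP 200. The names `collections_url`, `qdrant_reachable`, and `DEFAULT_QDRANT_URL` are hypothetical helpers, not part of the project.

```python
import http.client
import os
import urllib.error
import urllib.request
from typing import Optional

DEFAULT_QDRANT_URL = "http://localhost:6333"


def collections_url(base_url: str) -> str:
    """Append Qdrant's /collections endpoint to a base URL such as QDRANT_URL."""
    return base_url.rstrip("/") + "/collections"


def qdrant_reachable(base_url: Optional[str] = None, timeout: float = 2.0) -> bool:
    """Rough equivalent of `curl $QDRANT_URL/collections`: True only on HTTP 200."""
    url = collections_url(base_url or os.environ.get("QDRANT_URL", DEFAULT_QDRANT_URL))
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, DNS failure, timeout, HTTP error status,
        # a non-HTTP reply on the port, or a malformed URL.
        return False


if __name__ == "__main__":
    target = os.environ.get("QDRANT_URL", DEFAULT_QDRANT_URL)
    print(f"Qdrant at {target}: {'reachable' if qdrant_reachable() else 'NOT reachable'}")
```

Returning a boolean instead of raising lets callers decide whether to abort or continue with reduced functionality. Note that `urllib` honors proxy environment variables, which can affect results in some environments.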
