Codebase Context


README.md•4.36 KiB

# Evaluation Fixtures

This directory contains frozen evaluation sets for testing code search quality.

## Files

- `eval-angular-spotify.json` - 20 semantic queries against [angular-spotify](https://github.com/trungk18/angular-spotify) (public, reproducible)
- `eval-controlled.json` - 20 frozen queries for the in-repo controlled fixture codebase

## Running Evaluations

### Prerequisites

1. Clone the test codebase:

   ```bash
   git clone https://github.com/trungk18/angular-spotify /path/to/angular-spotify
   ```

2. Build this project:

   ```bash
   npm install
   npm run build
   ```

### Run Evaluation

```bash
node scripts/run-eval.mjs /path/to/angular-spotify --fixture tests/fixtures/eval-angular-spotify.json

# Controlled fixture example (no network)
node scripts/run-eval.mjs tests/fixtures/codebases/eval-controlled --fixture tests/fixtures/eval-controlled.json
```

### Output Format

The eval script outputs:

- **Top-1 Accuracy**: % of queries where the best result matches expected patterns
- **Top-3 Recall**: % of queries where top-3 results include a match
- **Spec Contamination**: % of queries returning test files
- **Per-category breakdown**: Accuracy by query type (exact-name, conceptual, multi-concept, structural)
- **Failure analysis**: Which queries failed and why

## Evaluation Integrity Rules

⚠️ **CRITICAL**: These eval fixtures are FROZEN. Once committed:

1. **DO NOT** adjust expected results to match system output
2. **DO NOT** add queries during development to "improve" scores
3. **DO NOT** remove "hard" queries that the system fails
4. **DO NOT** tune the system on this eval set then report scores

### Proper Usage

✅ **CORRECT**:

- Commit frozen eval BEFORE making changes
- Use eval to measure improvement honestly
- Report failures transparently
- Create NEW eval sets for iteration

❌ **INCORRECT**:

- Adjusting fixture during development ("fixture fixes")
- Cherry-picking queries that work well
- Overfitting to this specific codebase
- Reporting scores without disclosing methodology

## Query Design Principles

### Semantic Queries (NOT keyword matching)

Queries are designed to test **semantic understanding**, not keyword matching:

- ✅ "skip to next song" → should find `player-api.ts` (no "skip" keyword in file)
- ✅ "persist data across browser sessions" → should find `local-storage.service.ts`
- ✅ "add authorization token to API requests" → should find `auth.interceptor.ts`
- ❌ "PlayerApiService" → keyword match (too easy)
- ❌ "player api" → keyword match (too easy)

### Expected Patterns (NOT specific paths)

Expected results use **patterns** that work across codebases:

```json
{
  "expectedPatterns": ["player", "api"],
  "expectedNotPatterns": [".spec.", ".test."]
}
```

This matches:

- `libs/web/shared/data-access/spotify-api/src/lib/player-api.ts` ✅
- `apps/music/src/services/player-api.service.ts` ✅
- `player-api.spec.ts` ❌ (excluded by expectedNotPatterns)

### Query Categories

1. **conceptual** (7 queries): Natural language descriptions requiring semantic understanding
2. **multi-concept** (7 queries): Combining multiple concepts (hardest)
3. **exact-name** (3 queries): Class/service names (baseline)
4. **structural** (3 queries): Framework-specific patterns (NgRx, interceptors)

## Ground Truth Verification

Ground truth established via manual code review:

1. Read the actual code to understand what it does
2. Verify the expected file implements the described functionality
3. Check for similar files that should also match
4. Document reasoning in query notes

Example:

- Query: "skip to next song"
- Expected: `player-api.ts`
- Reasoning: File contains `next()` method that calls `/me/player/next` API endpoint

## Reproducing Results

To reproduce published results:

1. Clone the exact codebase version:

   ```bash
   git clone https://github.com/trungk18/angular-spotify
   cd angular-spotify
   git checkout <commit-hash-from-published-results>
   ```

2. Use the frozen eval fixture (committed before measurements)
3. Run eval on both baseline and new version
4. Compare metrics transparently

## Adding New Eval Sets

When creating new eval sets:

1. Design queries BEFORE any implementation
2. Establish ground truth via manual review
3. Test on multiple codebases (not just one)
4. Include "hard" queries expected to fail
5. Commit and tag BEFORE running any measurements
6. Document methodology in query notes

See this README for full guidelines.
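
## Appendix: Sketch of Pattern Matching and Metrics

To make the "Expected Patterns" and "Output Format" sections concrete, here is a minimal sketch of how a result path might be checked against a query and how the three headline metrics could be derived. It is not the code in `scripts/run-eval.mjs`. The names `EvalQuery`, `matchesExpected`, and `computeMetrics` are hypothetical, and the matching rule (every expected pattern must appear as a case-insensitive substring, and no excluded pattern may appear) is an assumption that happens to agree with the three example paths above. Counting spec contamination over the top-3 results is likewise an assumption.

```typescript
// Hypothetical fixture entry; only expectedPatterns and expectedNotPatterns
// are shown in the README, the rest is illustrative.
interface EvalQuery {
  query: string;
  expectedPatterns: string[];
  expectedNotPatterns: string[];
}

// Assumed rule: case-insensitive substring test against any of the patterns.
function containsAny(path: string, patterns: string[]): boolean {
  const p = path.toLowerCase();
  return patterns.some((s) => p.includes(s.toLowerCase()));
}

// Assumed rule: a path matches when it contains ALL expected patterns
// and NONE of the excluded patterns.
function matchesExpected(path: string, q: EvalQuery): boolean {
  const p = path.toLowerCase();
  const hasAll = q.expectedPatterns.every((s) => p.includes(s.toLowerCase()));
  return hasAll && !containsAny(path, q.expectedNotPatterns);
}

interface Metrics {
  top1Accuracy: number;      // fraction of queries whose best result matches
  top3Recall: number;        // fraction of queries with a match in the top 3
  specContamination: number; // fraction of queries with an excluded file in the top 3
}

// `results` is the ranked list of file paths returned for each query.
function computeMetrics(runs: { query: EvalQuery; results: string[] }[]): Metrics {
  let top1 = 0;
  let top3 = 0;
  let contaminated = 0;
  for (const { query, results } of runs) {
    const top = results.slice(0, 3);
    const first = top[0];
    if (first !== undefined && matchesExpected(first, query)) top1++;
    if (top.some((r) => matchesExpected(r, query))) top3++;
    if (top.some((r) => containsAny(r, query.expectedNotPatterns))) contaminated++;
  }
  const n = Math.max(runs.length, 1); // avoid dividing by zero on an empty run
  return {
    top1Accuracy: top1 / n,
    top3Recall: top3 / n,
    specContamination: contaminated / n,
  };
}
```

Under these assumptions, `matchesExpected` accepts `player-api.ts` and `player-api.service.ts` paths and rejects `player-api.spec.ts`, as in the example list. The metrics are returned as fractions between 0 and 1; multiply by 100 to report them as percentages.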


