Tea Rags MCP

git-enrichments.md•8.77 KiB

---
title: Git Enrichments
sidebar_position: 5
---

import AiQuery from '@site/src/components/AiQuery';

# Git Enrichments

tea-rags enriches every indexed code chunk with **19 git-derived quality signals** (churn, stability, authorship, bug-fix rates, code age) at **function-level granularity**. These signals power filtering and reranking, so your AI agent finds not just relevant code, but code that is stable, well-owned, and battle-tested.

:::tip
Git enrichment runs concurrently with embedding and does not increase indexing time.
:::

## Enabling Git Enrichment

Set the environment variable when configuring your MCP server:

```bash
claude mcp add tea-rags -s user -- node /path/to/tea-rags-mcp/build/index.js \
  -e CODE_ENABLE_GIT_METADATA=true
```

## What You Get

tea-rags computes metrics at **two levels**:

1. **File-level**: shared by all chunks of a file (commitCount, relativeChurn, bugFixRate, authors, etc.)
2. **Chunk-level**: per-function granularity (chunkCommitCount, chunkChurnRatio, chunkBugFixRate, etc.)

For detailed metric definitions, formulas, and research context, see [Code Churn: Theory & Research](/knowledge-base/code-churn-research).

### Metrics at a Glance

| Metric | Level | What it tells you |
|--------|-------|-------------------|
| `commitCount` | File | How often this file changes |
| `relativeChurn` | File | Churn normalized by file size (stronger defect signal) |
| `recencyWeightedFreq` | File | Recent activity burst (exponential decay) |
| `changeDensity` | File | Commits per month |
| `churnVolatility` | File | Regularity of changes (stddev of commit gaps) |
| `bugFixRate` | File | Percentage of bug-fix commits ([detection details](#bug-fix-detection)) |
| `contributorCount` | File | Number of unique authors |
| `dominantAuthor` | File | Author with most commits |
| `dominantAuthorPct` | File | Ownership concentration (0-100) |
| `ageDays` | File | Days since last modification |
| `taskIds` | File | Extracted ticket IDs (JIRA, GitHub, etc.) |
| `chunkCommitCount` | Chunk | Commits touching this specific function/block |
| `chunkChurnRatio` | Chunk | This chunk's share of file churn (0-1) |
| `chunkContributorCount` | Chunk | Authors who touched this chunk |
| `chunkBugFixRate` | Chunk | Bug-fix rate for this chunk specifically |
| `chunkAgeDays` | Chunk | Days since this chunk was last modified |

### Bug-Fix Commit Detection {#bug-fix-detection}

`bugFixRate` and `chunkBugFixRate` rely on heuristic classification of commits as bug fixes. The detection works as follows:

**Pattern:** Each commit message is tested against the regex:

```text
/\b(fix|bug|hotfix|patch|resolve[sd]?|defect)\b/i
```

This matches whole words only (word boundaries `\b` prevent false positives like "prefix" or "bugle"). The match is case-insensitive and checks the **full commit body**, not just the subject line.

**Merge commit filtering:** Commits whose subject line starts with `Merge` (e.g., `Merge branch 'fix/auth'`, `Merge pull request #42`) are **excluded** from bug-fix detection. The rationale: a merge commit referencing a fix branch is not itself a fix; the actual fix commit within the branch is already counted separately. Without this filter, every merged fix branch would be double-counted.

**What matches:**

| Commit message | Detected? | Why |
|----------------|-----------|-----|
| `fix: resolve crash on login` | Yes | "fix" in subject |
| `hotfix: emergency patch for payments` | Yes | "hotfix" in subject |
| `Resolved issue with timeout` | Yes | "Resolved" matches `resolve[sd]?` |
| `Bug in date parsing` | Yes | "Bug" matches |
| `chore: update deps` | No | No bug-fix keywords |
| `Merge branch 'fix/auth'` | No | Merge commit, skipped |
| `Merge pull request #42 from user/fix-auth` | No | Merge commit, skipped |
| `chore: update auth\nfix: also resolve login bug` | Yes | "fix" found on 2nd line (full body is checked) |

**Formula:**

```text
bugFixRate = round((bugFixCommits / totalCommits) * 100)
```

Where `bugFixCommits` is the count of non-merge commits matching the pattern. The result is an integer percentage (0-100).

**Chunk-level:** `chunkBugFixRate` uses the same detection logic, but only counts commits whose diff hunks overlap the chunk's line range.

:::info
The pattern is intentionally broad: it catches conventional commits (`fix: ...`), free-form messages (`fixed the bug`), and ticket-driven messages (`resolve TD-123 defect`). False positive rate is low due to word boundary matching.
:::

## Use Cases

<AiQuery>Show me files with high churn rate</AiQuery>

<AiQuery>Find code with a single dominant author</AiQuery>

<AiQuery>What code changed in the last week?</AiQuery>

<AiQuery>Find hot functions that change frequently</AiQuery>

<AiQuery>Show me legacy code with high bug-fix rates</AiQuery>

For detailed scenarios (hotspot detection, knowledge silo analysis, tech debt assessment, incident-driven search, security audit, and more), see [Git Enrichment Use Cases](/usage/use-cases#git-enrichment-use-cases).

## Reranking Presets

All presets automatically prefer chunk-level data when available (e.g., `chunkCommitCount` over `commitCount` for churn signals).

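
As a rough illustration of this chunk-over-file preference, the fallback can be pictured as below. The `GitSignals` shape and the helper names `preferChunk` and `effectiveSignals` are hypothetical; only the metric field names come from this page, and the actual tea-rags payload may be structured differently.

```typescript
// Hypothetical payload shape: field names are taken from the metrics table,
// but the real tea-rags payload layout is an assumption.
interface GitSignals {
  commitCount: number;
  bugFixRate: number;
  ageDays: number;
  chunkCommitCount?: number;
  chunkBugFixRate?: number;
  chunkAgeDays?: number;
}

// Use the chunk-level value when it exists, otherwise the file-level one.
function preferChunk(chunkValue: number | undefined, fileValue: number): number {
  return chunkValue !== undefined ? chunkValue : fileValue;
}

// Resolve the raw inputs that a preset would then weight.
function effectiveSignals(s: GitSignals): { churn: number; bugFix: number; age: number } {
  return {
    churn: preferChunk(s.chunkCommitCount, s.commitCount),
    bugFix: preferChunk(s.chunkBugFixRate, s.bugFixRate),
    age: preferChunk(s.chunkAgeDays, s.ageDays),
  };
}
```

An explicit `undefined` check is used rather than `||` so that a legitimate chunk-level value of `0` is not replaced by the file-level number.
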

| Preset | Signals | Use case |
|--------|---------|----------|
| `hotspots` | chunkChurn + chunkRelativeChurn + burstActivity + bugFix + volatility | Bug-prone areas at function granularity |
| `techDebt` | age + churn + bugFix + volatility | Legacy assessment with fix-rate indicator |
| `codeReview` | recency + burstActivity + density + chunkChurn | Recent changes with activity intensity |
| `stable` | low churn | Reliable implementations |
| `ownership` | ownership + knowledgeSilo | Knowledge transfer, bus factor analysis |
| `refactoring` | chunkChurn + relativeChurnNorm + chunkSize + volatility + bugFix + age | Refactor candidates at chunk level |
| `securityAudit` | age + ownership + bugFix + pathRisk + volatility | Old critical code in sensitive paths |
| `impactAnalysis` | similarity + imports | Dependency analysis |
| `onboarding` | documentation + stability | Entry points for new team members |

## Scoring Weights Reference

Available weight keys for custom reranking:

| Key | Signal | Source |
|-----|--------|--------|
| `similarity` | Embedding similarity score | Vector search |
| `recency` | Inverse of ageDays (prefers chunk-level) | git |
| `stability` | Inverse of commitCount (prefers chunk-level) | git |
| `churn` | Direct commitCount (prefers chunk-level) | git |
| `age` | Direct ageDays (prefers chunk-level) | git |
| `ownership` | Author concentration via dominantAuthorPct | git |
| `chunkSize` | Lines of code in chunk | chunk metadata |
| `documentation` | Is documentation file | chunk metadata |
| `imports` | Import/dependency count | file metadata |
| `bugFix` | bugFixRate (prefers chunk-level) | git |
| `volatility` | churnVolatility (stddev of commit gaps) | git |
| `density` | changeDensity (commits/month) | git |
| `chunkChurn` | chunkCommitCount | git chunk-level |
| `relativeChurnNorm` | relativeChurn normalized (churn relative to file size) | git |
| `burstActivity` | recencyWeightedFreq (recent burst of changes) | git |
| `pathRisk` | Security-sensitive path pattern match (0 or 1) | file metadata |
| `knowledgeSilo` | Single-contributor flag (1 / 0.5 / 0) | git |
| `chunkRelativeChurn` | chunkChurnRatio (chunk's share of file churn) | git chunk-level |

## Environment Variables

<details>
<summary>Git enrichment configuration</summary>

| Variable | Default | Description |
|----------|---------|-------------|
| `CODE_ENABLE_GIT_METADATA` | `"false"` | Enable git enrichment during indexing |
| `GIT_LOG_MAX_AGE_MONTHS` | `12` | Time window for file-level git analysis (months). `0` = no age limit (safety depth still applies). |
| `GIT_LOG_TIMEOUT_MS` | `30000` | Timeout for isomorphic-git; falls back to native CLI on expiry |
| `GIT_LOG_SAFETY_DEPTH` | `10000` | Max commits for isomorphic-git `depth` and CLI `--max-count` |
| `GIT_CHUNK_ENABLED` | `"true"` | Enable chunk-level churn analysis |
| `GIT_CHUNK_MAX_AGE_MONTHS` | `6` | Time window for chunk-level churn analysis (months). `0` = no age limit. |
| `GIT_CHUNK_CONCURRENCY` | `10` | Parallel commit processing for chunk churn |
| `GIT_CHUNK_MAX_FILE_LINES` | `10000` | Skip files larger than this for chunk analysis |

</details>

## Next Steps

- [Filters](/usage/filters): filter syntax, git churn filters, filterable fields reference
- [Code Churn: Theory & Research](/knowledge-base/code-churn-research): metric formulas, research basis, and academic references
- [Git Enrichment Pipeline](/architecture/git-enrichment-pipeline): architecture, design decisions, and performance characteristics
- [Search Strategies](/agent-integration/search-strategies): how agents use reranking presets for different tasks
- [Configuration Variables](/config/environment-variables): full list of all configuration options
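
For readers who want to experiment with the rules from the Bug-Fix Commit Detection section (keyword regex, merge-commit filter, rounded percentage), here is a minimal TypeScript sketch. The helper names `isBugFixCommit` and `bugFixRate` are hypothetical and not taken from the tea-rags source. Treating `totalCommits` as every commit in the input, merges included, and returning `0` for an empty history are both assumptions.

```typescript
// Regex copied verbatim from the Bug-Fix Commit Detection section.
const BUG_FIX_PATTERN = /\b(fix|bug|hotfix|patch|resolve[sd]?|defect)\b/i;

// Without the "m" flag, ^ anchors at the start of the message,
// which is the start of the subject line.
const MERGE_SUBJECT = /^Merge/;

// Hypothetical helper: does this commit message count as a bug fix?
function isBugFixCommit(message: string): boolean {
  if (MERGE_SUBJECT.test(message)) return false; // merge commits are skipped
  return BUG_FIX_PATTERN.test(message); // tested against the full body
}

// Hypothetical helper: integer percentage (0-100) per the documented formula.
// Assumption: totalCommits is the full input length, merges included.
function bugFixRate(messages: string[]): number {
  if (messages.length === 0) return 0; // assumption: no history means 0
  const bugFixCommits = messages.filter(isBugFixCommit).length;
  return Math.round((bugFixCommits / messages.length) * 100);
}
```

Because the pattern matches whole words, inflected forms such as `fixes` or `fixed` do not match on their own; a message like `fixed the bug` is caught through the word `bug`.
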
