---
title: "Mental Model"
sidebar_position: 1
---
# How to Think with TeaRAGs
TeaRAGs is not a search engine with extra metadata. It's a system that changes **how coding agents should reason about which code to trust**. This page explains the shift.
## Why This Page Exists
Most developers and agents treat code retrieval as a similarity problem: "find code that looks like my query." TeaRAGs adds a second dimension — **code evolution** — and this requires a different mental model. Without it, you'll use TeaRAGs as a fancy grep and miss the point entirely.
## Traditional RAG Mindset
In standard code RAG, the retrieval loop is:
1. Embed the query
2. Find the most similar chunks
3. Inject them into context
4. Generate code
The optimization target is **relevance**: how closely does the retrieved code match the query? The implicit assumption is that similar code is useful code.
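The whole loop fits in a few lines. A minimal sketch, with `embed` and `vectorSearch` as placeholders for any embedding model and vector store (neither is a TeaRAGs API):

```typescript
// Minimal sketch of the traditional loop. `embed` and `vectorSearch`
// are placeholders for any embedding model and vector store.
interface Chunk { text: string; similarity: number; }

declare function embed(text: string): Promise<number[]>;
declare function vectorSearch(vector: number[], k: number): Promise<Chunk[]>;

async function retrieveContext(query: string): Promise<string[]> {
  const queryVector = await embed(query);           // 1. embed the query
  const hits = await vectorSearch(queryVector, 20); // 2. top-k most similar chunks
  return hits.map((c) => c.text);                   // 3. inject into context, 4. generate
}
```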
This works until it doesn't. The first search hit might be:
- A prototype someone abandoned
- A pattern that was reverted three times
- Code written by an intern, rewritten by a senior, then rewritten again
- A function that technically does what you want but breaks every sprint
Similarity tells you nothing about any of this.
## Trajectory-Aware Mindset
TeaRAGs shifts the optimization target from **"find similar code"** to **"find code that improves the agent's decision quality."**
Every retrieved chunk carries 19 git-derived signals — churn, stability, authorship, bug-fix rates, code age, task references. These signals encode **how code has evolved**, not just what it looks like right now.
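As a rough picture of what a retrieved chunk carries, here is an illustrative subset of those signals as a type. The field names follow the metric tables later on this page; the actual payload has 19 signals and its exact shape may differ:

```typescript
// Illustrative subset of the trajectory signals on each retrieved chunk.
// Field names follow the metric tables below; the real payload carries
// 19 signals and may be shaped differently.
interface TrajectorySignals {
  commitCount: number;        // how often this code changed in the window
  relativeChurn: number;      // (linesAdded + linesDeleted) / currentLines
  bugFixRate: number;         // share of commits with fix/bug/hotfix keywords
  dominantAuthorPct: number;  // how concentrated ownership is
  taskIds: string[];          // ticket references from commit messages
}

interface EnrichedChunk {
  text: string;               // the code, as in any RAG system
  similarity: number;         // where traditional RAG stops
  signals: TrajectorySignals; // where trajectory-aware reasoning starts
}
```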
The thinking patterns change:
| Traditional RAG | Trajectory-Aware RAG |
|----------------|---------------------|
| "Find code that looks like X" | "Find code that looks like X **and has survived production**" |
| "Copy the first match" | "Copy the match with the lowest bug-fix rate" |
| "Any example will do" | "Find the domain owner's implementation" |
| "This code is relevant" | "This code is relevant **and stable** — or relevant **and volatile** (which is a signal too)" |
| "Ignore code history" | "High churn = treat as anti-pattern, not template" |
The key insight: **volatile code is not noise to be filtered out — it's a signal.** A function with 12 commits and a 60% bug-fix rate tells the agent something important: don't copy this, don't extend this without understanding why it keeps breaking.
## How Agents Should Reason
An agent using TeaRAGs effectively follows three steps:
### 1. Explore Context
Search semantically to understand the landscape. Use `rerank: "relevance"` first — find what exists.
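A sketch of that first pass. The `search` call stands in for however your agent invokes TeaRAGs (for example, an MCP tool call); `rerank: "relevance"` is the documented preset:

```typescript
// Step 1: map the landscape with pure semantic ranking.
// `search` is a stand-in for the agent's actual tool invocation.
declare function search(params: { query: string; rerank: string }): Promise<unknown[]>;

const landscape = await search({
  query: "retry logic for outbound HTTP calls", // example task
  rerank: "relevance",                          // find what exists, judge later
});
```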
### 2. Evaluate Signals
Before using any result, check the trajectory signals:
- **Low churn + old age** → stable pattern, safe to copy
- **High churn + high bug-fix rate** → anti-pattern, study but don't replicate
- **Single dominant author** → domain expert's style, match it
- **Multiple task IDs** → code evolved through many requirements, understand them before modifying
- **Recent + high commit count** → active development area, coordinate with the team
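These rules of thumb can be read as a small decision function. A heuristic sketch with illustrative, untuned thresholds (field names follow the metric tables on this page):

```typescript
// Step 2 as a heuristic classifier. Thresholds are illustrative, not tuned.
type Verdict = "safe-to-copy" | "anti-pattern" | "active-area" | "needs-context";

interface Signals {
  relativeChurn: number; // rewritten fraction of the code
  bugFixRate: number;    // 0..1
  ageMonths: number;     // time since the code's first commit in the window
  commitCount: number;
  taskIds: string[];
}

function evaluate(s: Signals): Verdict {
  if (s.bugFixRate > 0.4 && s.relativeChurn > 1.0) return "anti-pattern"; // study, don't replicate
  if (s.relativeChurn < 0.2 && s.ageMonths >= 6) return "safe-to-copy";   // stable, survived production
  if (s.ageMonths < 3 && s.commitCount > 10) return "active-area";        // coordinate with the team
  return "needs-context"; // e.g. many taskIds: understand the requirements first
}
```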
### 3. Select for Decision Quality
Choose context that leads to better decisions, not just more context. Three chunks of battle-tested code are more valuable than twenty chunks of relevant-but-unknown-quality code.
## Decision Hierarchy
When an agent receives search results, signals should be evaluated in this order:
```
1. Similarity — is this code relevant to my task?
2. Trajectory — is this code stable, owned, and low-risk?
3. Impact — if I base my code on this, what's the blast radius?
```
Similarity is the entry filter. Trajectory signals determine trust. Impact determines caution.
A result that scores high on similarity but low on stability is a **warning**, not a template. A result that scores moderately on similarity but high on stability and clear ownership is often the better choice.
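A sketch of the hierarchy as a pipeline, with similarity as a gate rather than the final ranking (field names and thresholds are illustrative):

```typescript
// The hierarchy as a pipeline: similarity gates, trajectory ranks.
// Field names and thresholds are illustrative.
interface Result {
  similarity: number;        // 0..1
  bugFixRate: number;        // 0..1, lower is more trustworthy
  dominantAuthorPct: number; // 0..1, higher means clearer ownership
}

// 2. Trajectory: a simple trust score from stability and ownership.
const trust = (r: Result) => (1 - r.bugFixRate) * 0.6 + r.dominantAuthorPct * 0.4;

function rankForTrust(results: Result[]): Result[] {
  return results
    .filter((r) => r.similarity >= 0.5)   // 1. similarity is only the entry filter
    .sort((a, b) => trust(b) - trust(a)); // 2. trajectory determines the order
  // 3. impact: the caller still decides how boldly to reuse the winner
}
```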
## Git Data Depth
TeaRAGs collects git history at two different granularities, each with its own time window:
| Level | Variable | Default | What It Controls |
|-------|----------|---------|-----------------|
| **File-level** | `GIT_LOG_MAX_AGE_MONTHS` | 12 months | `git log` analysis: commit counts, authors, task IDs per file |
| **Chunk-level** | `GIT_CHUNK_MAX_AGE_MONTHS` | 6 months | `git blame` analysis: per-function churn, volatility, bug-fix rates |
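Mechanically, each window just bounds how far back the git commands look. A sketch of turning a window into a `--since` cutoff (the variable names and defaults are the documented ones; the surrounding code is illustrative):

```typescript
// Turn a window in months into a `git log` / `git blame` --since cutoff.
// Env var names and defaults are documented; the rest is a sketch.
function sinceDate(envVar: string, defaultMonths: number): string {
  const months = Number(process.env[envVar] ?? defaultMonths);
  const cutoff = new Date();
  cutoff.setMonth(cutoff.getMonth() - months);
  return cutoff.toISOString().slice(0, 10); // YYYY-MM-DD
}

const fileSince = sinceDate("GIT_LOG_MAX_AGE_MONTHS", 12);   // file-level: git log
const chunkSince = sinceDate("GIT_CHUNK_MAX_AGE_MONTHS", 6); // chunk-level: git blame
// e.g. `git log --since=${fileSince} --numstat -- path/to/file`
```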
The defaults are deliberately generous. Research on code churn and defect prediction consistently shows that **shorter windows (2–6 months) capture the most actionable signals**, while longer windows add historical context at the cost of noise from resolved issues.
- **Nagappan & Ball (2005)** demonstrated that relative code churn measures — especially when normalized by time — are strong predictors of defect density. Their "weeks of churn / file count" metric highlights the importance of temporal extent in churn analysis. ([IEEE ICSE 2005](https://ieeexplore.ieee.org/document/1553571/))
- **Adam Tornhill** ("Your Code as a Crime Scene", "Software Design X-Rays") recommends **2–3 month windows** as a practical heuristic for hotspot and temporal coupling analysis — old data from resolved issues can interfere with current analysis. ([adamtornhill.com](https://www.adamtornhill.com/articles/crimescene/codeascrimescene.htm), [Pragmatic Bookshelf](https://pragprog.com/titles/atcrime2/your-code-as-a-crime-scene-second-edition/))
- **GitClear (2024–2025)** analyzed 211M changed lines and found that code revised or reverted within two weeks of being committed is a reliable signal of quality problems — supporting the idea that shorter, focused windows detect instability better than full-history analysis. ([GitClear Research](https://www.gitclear.com/ai_assistant_code_quality_2025_research))
### Why File-Level Metrics Matter for Tech Debt
The 12-month file-level window (`GIT_LOG_MAX_AGE_MONTHS`) serves a different purpose than chunk-level churn. While chunk-level signals tell you which *functions* are unstable, file-level signals reveal **structural tech debt** — the kind that accumulates silently and surfaces as friction during development.
TeaRAGs computes these file-level metrics from `git log`:
| Metric | What It Reveals |
|--------|----------------|
| `commitCount` | How often the file changed in the window — high values mark change hotspots |
| `relativeChurn` | (linesAdded + linesDeleted) / currentLines — how much of the file has been rewritten |
| `changeDensity` | commits / months — average change frequency over the analysis window |
| `churnVolatility` | stddev(days between commits) — erratic patterns suggest reactive patching |
| `bugFixRate` | % of commits with fix/bug/hotfix keywords — a keyword-based proxy for defect density |
| `contributorCount` | Number of unique authors — high counts + high churn = coordination cost |
| `dominantAuthorPct` | How concentrated ownership is — low % on a high-churn file = no one owns the debt |
| `taskIds` | Ticket references from commits — traces debt back to business decisions |
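To make the table's formulas concrete, here is a minimal sketch of the arithmetic over a parsed commit history. The `Commit` shape is an assumption about what a `git log --numstat` parser would produce; the formulas are the ones stated in the table:

```typescript
// Minimal sketch of the table's formulas over a parsed `git log`.
// The Commit shape is an assumption; the arithmetic mirrors the table.
// (dominantAuthorPct and taskIds are omitted for brevity.)
interface Commit {
  author: string;
  date: Date;
  linesAdded: number;
  linesDeleted: number;
  message: string;
}

function fileMetrics(commits: Commit[], currentLines: number, windowMonths: number) {
  const commitCount = commits.length;
  const churned = commits.reduce((n, c) => n + c.linesAdded + c.linesDeleted, 0);
  const relativeChurn = churned / currentLines;
  const changeDensity = commitCount / windowMonths;

  // churnVolatility: stddev of gaps (in days) between consecutive commits
  const sorted = [...commits].sort((a, b) => a.date.getTime() - b.date.getTime());
  const gaps = sorted.slice(1).map(
    (c, i) => (c.date.getTime() - sorted[i].date.getTime()) / 86_400_000,
  );
  const mean = gaps.reduce((a, b) => a + b, 0) / (gaps.length || 1);
  const churnVolatility = Math.sqrt(
    gaps.reduce((a, g) => a + (g - mean) ** 2, 0) / (gaps.length || 1),
  );

  const bugFixRate =
    commits.filter((c) => /\b(fix|bug|hotfix)\b/i.test(c.message)).length /
    (commitCount || 1);
  const contributorCount = new Set(commits.map((c) => c.author)).size;

  return { commitCount, relativeChurn, changeDensity, churnVolatility, bugFixRate, contributorCount };
}
```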
The research behind this approach:
- **Tornhill's hotspot model** identifies tech debt as the *product* of complexity and change frequency — a complex file that rarely changes is low priority, but a complex file that changes weekly is the most expensive debt in your system. ([CodeScene: Technical Debt](https://codescene.io/docs/guides/technical/hotspots.html))
- **CodeScene's Code Health** metric (1–10 scale) combines behavioral signals (churn, coupling) with structural ones (complexity) to prioritize which debt to pay first — supporting the use of file-level git metrics for actionable debt ranking. ([CodeScene: Code Health](https://codescene.com/product/code-health))
- **GitClear's tech debt model** tracks "recurrently active" files (modified 2+ times per month across multiple months) as the strongest file-level predictor of future defects — more reliable than raw churn alone. ([GitClear: Measuring Tech Debt](https://www.gitclear.com/measuring_tech_debt_a_guide_for_data_driven_technical_managers))
- **Gartner (2024)** recognized behavioral code analysis (churn + complexity product) as one of five tool categories for measuring and monitoring tech debt. ([Gartner Report via CodeScene](https://codescene.com/resources/gartner-report-measure-and-monitor-technical-debt-with-5-types-of-tools))
An agent using `rerank: "techDebt"` leverages these file-level signals to surface files where debt is highest. Combined with `metaOnly: true`, it can build a tech debt report without reading a single line of code — purely from evolution signals.
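A sketch of such a call. `rerank: "techDebt"` and `metaOnly: true` are the documented parameters; the `search` function and the result fields shown are illustrative stand-ins:

```typescript
// Rank files by evolution signals only; no source code enters the context.
// `search` and the result shape are stand-ins for the agent's actual tooling.
declare function search(params: {
  query: string;
  rerank: string;
  metaOnly?: boolean;
}): Promise<Array<{ path: string; bugFixRate: number; relativeChurn: number }>>;

const debtReport = await search({
  query: "payment processing",
  rerank: "techDebt",
  metaOnly: true, // evolution signals only, no code in context
});
```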
### Adjusting Thresholds
The defaults (6 months chunk / 12 months file) work well for most codebases. But agents can reason about whether to adjust:
- **Young codebase (<1 year)** — defaults cover the entire history, no adjustment needed
- **Mature codebase (5+ years)** — defaults already filter out ancient history; if signals feel noisy, *reduce* `GIT_CHUNK_MAX_AGE_MONTHS` to 3
- **High-velocity team (daily deploys)** — consider reducing chunk window to 3 months for sharper hotspot detection
- **Legacy migration** — *increase* `GIT_LOG_MAX_AGE_MONTHS` to 24+ to capture the full migration arc
The reranker normalizes all signals relative to the analysis window. Changing the window doesn't break presets — it shifts what "old" and "high churn" mean within that window.
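For example, the high-velocity and legacy-migration scenarios above could be expressed as environment settings like these (values illustrative; set them however you configure TeaRAGs):

```
# High-velocity team: sharper chunk-level hotspot detection
GIT_CHUNK_MAX_AGE_MONTHS=3

# Legacy migration: capture the full migration arc at file level
GIT_LOG_MAX_AGE_MONTHS=24
```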
## The Common Mistake
The most common mistake is using TeaRAGs as plain semantic search — ignoring the enrichment signals and treating results as a flat ranked list.
If you're not using `rerank` presets, not reading `bugFixRate` or `commitCount` in results, not distinguishing between stable and volatile code — you're paying the cost of trajectory enrichment without getting the benefit. You'd get the same results from any vector search tool.
TeaRAGs becomes valuable when the agent **reasons about the signals**, not when it merely retrieves more context.
## Key Takeaway
Traditional RAG asks: *"What code looks like what I need?"*
TeaRAGs asks: *"What code looks like what I need, has proven itself in production, is owned by someone who knows the domain, and won't introduce the same bugs that have already been fixed three times?"*
The difference is not in retrieval quality. It's in **decision quality**.