---
title: Custom Reranking
sidebar_position: 4
---

import AiQuery from '@site/src/components/AiQuery';

# Custom Rerank Strategies

Preset reranking covers common scenarios, but **custom weights** unlock precise, task-specific analysis. The key is combining **orthogonal signals** — each weight should add unique information that the others don't have.

## Strategy 1: Multi-signal risk scoring

Combine orthogonal signals to create a composite risk score:

<AiQuery>Find code ranked by a combination of churn, bug fixes, blast radius, and volatility</AiQuery>

<details>
<summary>Tool parameters — "What should we test more?"</summary>

```json
{
  "rerank": {
    "custom": {
      "chunkChurn": 0.25,
      "bugFix": 0.3,
      "imports": 0.25,
      "volatility": 0.2
    }
  }
}
```

</details>

**Why this works:** `chunkChurn` identifies hot functions, `bugFix` adds a quality signal, `imports` adds a blast-radius proxy (it counts fan-out; see Known Limitations), and `volatility` adds unpredictability. Each signal contributes information the others lack.

**Anti-pattern:** Don't combine `churn` + `chunkChurn` + `relativeChurnNorm` — these are all churn variants and will just triple-weight the same underlying signal.
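
To make the weighting and the anti-pattern concrete, here is a minimal TypeScript sketch of a weighted composite score. It assumes each signal has already been normalized to the range 0 to 1 and that the final score is a weighted sum divided by the total weight. The guidelines below say weights are normalized internally, but this exact formula is an assumption. `SignalValues`, `Weights`, and `compositeScore` are hypothetical names, not part of the TeaRAGs API.

```typescript
// Hypothetical types: signal values are assumed to be pre-normalized to [0, 1].
type SignalValues = Record<string, number>;
type Weights = Record<string, number>;

// Weighted sum divided by total weight (assumed normalization scheme).
function compositeScore(signals: SignalValues, weights: Weights): number {
  let weighted = 0;
  let total = 0;
  for (const [name, weight] of Object.entries(weights)) {
    weighted += weight * (signals[name] ?? 0); // a missing signal counts as 0
    total += weight;
  }
  return total > 0 ? weighted / total : 0;
}

// Strategy 1 weights from above.
const riskWeights: Weights = { chunkChurn: 0.25, bugFix: 0.3, imports: 0.25, volatility: 0.2 };

// Anti-pattern: three churn variants plus one real quality signal.
const tripleChurn: Weights = { churn: 0.25, chunkChurn: 0.25, relativeChurnNorm: 0.25, bugFix: 0.25 };

// Illustrative chunk: frequently edited (all churn variants high) but never bug-fixed.
const hotButClean: SignalValues = {
  churn: 0.9, chunkChurn: 0.9, relativeChurnNorm: 0.9,
  bugFix: 0, imports: 0.1, volatility: 0.2,
};

console.log(compositeScore(hotButClean, riskWeights).toFixed(3));
console.log(compositeScore(hotButClean, tripleChurn).toFixed(3));
```

With the Strategy 1 weights this hot but bug-free chunk scores about 0.29. With the three churn variants stacked, the same chunk jumps to about 0.68, because 75% of the weight now rewards what is effectively one underlying signal.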

## Strategy 2: Inverse scoring for safe code

Sometimes you need the **opposite** of a hotspot — stable, well-owned, low-bug code:

<AiQuery>Find stable, battle-tested implementations with distributed ownership to use as reference</AiQuery>

<details>
<summary>Tool parameters — "Safe reference code"</summary>

```json
{
  "rerank": {
    "custom": {
      "stability": 0.3,
      "age": 0.2,
      "similarity": 0.3,
      "ownership": 0.2
    }
  }
}
```

</details>

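**Why this works:** `stability` (inverse of commitCount) boosts low-churn code. `age` (direct ageDays) boosts old code, and old + stable = battle-tested. `ownership` boosts distributed authorship, meaning multiple people have worked on and reviewed this code. `similarity` keeps the results anchored to your query, so you get stable code that is actually relevant rather than any old, untouched file.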
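
The difference between an inverse signal like `stability` and a direct one like `age` can be sketched as two normalizations. This sketch assumes simple max-scaling against the largest value in the result set. The normalization TeaRAGs actually applies may differ, and `normalizeDirect`, `normalizeInverse`, and the sample chunks are hypothetical.

```typescript
// Direct normalization: larger raw value -> score closer to 1 (as for `age`).
function normalizeDirect(value: number, max: number): number {
  if (max <= 0) return 0;
  return Math.min(value / max, 1);
}

// Inverse normalization: larger raw value -> score closer to 0 (as for `stability`).
// If every chunk has zero commits, everything is treated as maximally stable.
function normalizeInverse(value: number, max: number): number {
  return 1 - normalizeDirect(value, max);
}

// Hypothetical chunks with raw git metadata.
const chunks = [
  { id: "legacyParser", commitCount: 2, ageDays: 900 },
  { id: "newFeature", commitCount: 18, ageDays: 30 },
];
const maxCommits = Math.max(...chunks.map((c) => c.commitCount));
const maxAge = Math.max(...chunks.map((c) => c.ageDays));

for (const c of chunks) {
  const stability = normalizeInverse(c.commitCount, maxCommits);
  const age = normalizeDirect(c.ageDays, maxAge);
  console.log(c.id, stability.toFixed(2), age.toFixed(2));
}
```

Here `legacyParser` scores about 0.89 on `stability` and 1.00 on `age`, while `newFeature` scores 0 on `stability` and about 0.03 on `age`. Old, rarely changed code scores high on both, which is the battle-tested combination this strategy looks for.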

## Strategy 3: Activity pulse

Track where active development is happening right now:

<AiQuery>Find code with recent burst activity and high change density</AiQuery>

<details>
<summary>Tool parameters — "Active development zones"</summary>

```json
{
  "rerank": {
    "custom": {
      "burstActivity": 0.4,
      "density": 0.3,
      "recency": 0.3
    }
  }
}
```

</details>

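**Why this works:** `burstActivity` (recencyWeightedFreq) uses exponential decay: a commit from today counts ~10x more than one from 3 weeks ago. Combined with `density` (commits/month) and `recency`, this surfaces the code that is being worked on most actively. Note that `burstActivity` and `density` belong to the same churn-frequency group (see the [Signal Overlap Reference](#signal-overlap-reference)), so this recipe deliberately puts 70% of its weight on change frequency. That is a conscious exception to the "don't overlap signals" guideline, since activity is the only question being asked.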
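
A recency-weighted frequency with exponential decay can be sketched as a sum of `exp(-λ · ageDays)` over a chunk's commits. Choosing λ so that a 21-day-old commit weighs one tenth of a commit made today reproduces the "~10x over 3 weeks" figure above. That calibration (a half-life of roughly 6.3 days) is our assumption rather than a documented TeaRAGs constant, and `commitWeight` and `burstActivity` here are hypothetical functions.

```typescript
// Assumed calibration: weight(21 days) = weight(0 days) / 10.
const DECAY_RATE = Math.log(10) / 21; // per day, roughly 0.11

// Weight of a single commit that is `ageDays` old.
function commitWeight(ageDays: number): number {
  return Math.exp(-DECAY_RATE * ageDays);
}

// Recency-weighted frequency: each commit contributes its decayed weight.
function burstActivity(commitAgesDays: number[]): number {
  return commitAgesDays.reduce((sum, age) => sum + commitWeight(age), 0);
}

// Same number of commits, different timing.
const burst = [0, 1, 2, 3];      // four commits this week
const spread = [20, 40, 60, 80]; // four commits spread over months

console.log(burstActivity(burst).toFixed(2));
console.log(burstActivity(spread).toFixed(2));
```

Both lists contain four commits, but under this calibration the burst scores more than 25 times higher, which is why `burstActivity` highlights code that is hot right now rather than code that was merely busy at some point.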

**Use case:** Sprint planning — see where engineering effort is concentrated. Compare against roadmap priorities.

## Strategy 4: Cross-signal anomaly detection

Find unusual patterns that individual presets miss:

<AiQuery>Find code that is both a knowledge silo and a hotspot in security-sensitive paths</AiQuery>

<details>
<summary>Tool parameters — "Dangerous silos"</summary>

```json
{
  "rerank": {
    "custom": {
      "knowledgeSilo": 0.3,
      "chunkChurn": 0.25,
      "bugFix": 0.25,
      "pathRisk": 0.2
    }
  },
  "pathPattern": "{**/auth/**,**/payment/**,**/crypto/**}"
}
```

</details>

**Why this works:** No single preset combines silo risk, churn, and security-sensitive paths. Custom weights let you search for exactly this intersection, which is where the most dangerous code in a codebase tends to concentrate.
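
When an agent or script builds this request programmatically, a typed object catches misspelled weight keys early. The sketch below is hypothetical: `CustomWeightKey` lists only the keys that appear on this page (it is not the official schema), and `NEWER_KEYS` mirrors the keys that Known Limitations names as not yet exposed by `ScoringWeightsSchema`, so `keysNeedingSchemaUpdate` flags which weights in a request depend on that schema update.

```typescript
// Weight keys mentioned on this page (hypothetical union, not the official schema).
type CustomWeightKey =
  | "similarity" | "recency" | "age" | "stability" | "churn" | "chunkChurn"
  | "density" | "burstActivity" | "relativeChurnNorm" | "chunkRelativeChurn"
  | "bugFix" | "volatility" | "imports" | "ownership" | "knowledgeSilo" | "pathRisk";

interface RerankRequest {
  rerank: { custom: Partial<Record<CustomWeightKey, number>> };
  pathPattern?: string;
}

// Keys listed under Known Limitations as missing from ScoringWeightsSchema.
const NEWER_KEYS: ReadonlySet<CustomWeightKey> = new Set<CustomWeightKey>([
  "relativeChurnNorm", "burstActivity", "pathRisk", "knowledgeSilo", "chunkRelativeChurn",
]);

// Strategy 4: dangerous silos in security-sensitive paths.
const dangerousSilos: RerankRequest = {
  rerank: { custom: { knowledgeSilo: 0.3, chunkChurn: 0.25, bugFix: 0.25, pathRisk: 0.2 } },
  pathPattern: "{**/auth/**,**/payment/**,**/crypto/**}",
};

// Return the weight keys in a request that depend on the schema update.
function keysNeedingSchemaUpdate(req: RerankRequest): CustomWeightKey[] {
  return (Object.keys(req.rerank.custom) as CustomWeightKey[]).filter((k) => NEWER_KEYS.has(k));
}

console.log(JSON.stringify(dangerousSilos));
console.log(keysNeedingSchemaUpdate(dangerousSilos)); // knowledgeSilo, pathRisk
```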

## Guidelines for building custom weights

| Guideline | Explanation |
|-----------|-------------|
| **Keep weights summing to ~1.0** | Weights are normalized internally, but 1.0 makes intent clearer |
| **Use 3-5 signals maximum** | More signals dilute each other — focus on what matters for this specific question |
| **Don't overlap signals** | `churn` + `chunkChurn` + `density` all measure change frequency — pick one |
| **Include `similarity` at 0.2-0.4** | Unless doing pure metadata analysis, some semantic relevance prevents nonsensical matches |
| **Test with `metaOnly: true` first** | See the scoring before downloading full code content |
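
The sum, signal-count, and `similarity` guidelines are mechanical enough to check before sending a request. `lintWeights` below is a hypothetical helper, not part of TeaRAGs. The ranges (3 to 5 signals, `similarity` between 0.2 and 0.4) come from the table, while the 0.05 tolerance on the sum is our own choice.

```typescript
// Hypothetical lint helper for custom rerank weights; returns human-readable warnings.
function lintWeights(weights: Record<string, number>): string[] {
  const warnings: string[] = [];
  const entries = Object.entries(weights).filter(([, w]) => w > 0);
  const sum = entries.reduce((acc, [, w]) => acc + w, 0);

  // Guideline: keep weights summing to ~1.0 (the 0.05 tolerance is an assumption).
  if (Math.abs(sum - 1) > 0.05) {
    warnings.push(`weights sum to ${sum.toFixed(2)}, not ~1.0`);
  }
  // Guideline: use 3-5 signals.
  if (entries.length < 3 || entries.length > 5) {
    warnings.push(`${entries.length} signals used; 3-5 recommended`);
  }
  // Guideline: include similarity at 0.2-0.4 unless this is pure metadata analysis.
  const sim = weights["similarity"] ?? 0;
  if (sim < 0.2 || sim > 0.4) {
    warnings.push(`similarity is ${sim}; 0.2-0.4 recommended for query-driven searches`);
  }
  return warnings;
}

// Strategy 2 passes all three checks.
console.log(lintWeights({ stability: 0.3, age: 0.2, similarity: 0.3, ownership: 0.2 }));
// Strategy 3 omits similarity, so it gets one warning.
console.log(lintWeights({ burstActivity: 0.4, density: 0.3, recency: 0.3 }));
```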

## Signal Overlap Reference {#signal-overlap-reference}

Signals that measure similar things — avoid combining within the same custom rerank:

| Signal group | Members | Pick one |
|-------------|---------|----------|
| **Churn frequency** | `churn`, `chunkChurn`, `density`, `burstActivity` | `chunkChurn` for function-level, `burstActivity` for recency-weighted |
| **Churn magnitude** | `relativeChurnNorm`, `chunkRelativeChurn` | `chunkRelativeChurn` for function-level |
| **Age/freshness** | `age`, `recency` | `recency` if you want recent code, `age` if you want old code |
| **Ownership** | `ownership`, `knowledgeSilo` | `knowledgeSilo` for binary silo detection, `ownership` for gradient |
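
The overlap table translates directly into a lookup. In this sketch the group contents are copied from the table above, and `findOverlaps` is a hypothetical helper that reports any group contributing more than one weighted signal to a custom rerank.

```typescript
// Signal groups from the overlap table; members of one group measure similar things.
const OVERLAP_GROUPS: Record<string, string[]> = {
  "churn frequency": ["churn", "chunkChurn", "density", "burstActivity"],
  "churn magnitude": ["relativeChurnNorm", "chunkRelativeChurn"],
  "age/freshness": ["age", "recency"],
  ownership: ["ownership", "knowledgeSilo"],
};

// Return, per group, the overlapping signals that carry non-zero weight.
function findOverlaps(weights: Record<string, number>): Record<string, string[]> {
  const overlaps: Record<string, string[]> = {};
  for (const [group, members] of Object.entries(OVERLAP_GROUPS)) {
    const used = members.filter((m) => (weights[m] ?? 0) > 0);
    if (used.length > 1) overlaps[group] = used;
  }
  return overlaps;
}

// Strategy 1 and Strategy 3 weights from above.
console.log(findOverlaps({ chunkChurn: 0.25, bugFix: 0.3, imports: 0.25, volatility: 0.2 }));
console.log(findOverlaps({ burstActivity: 0.4, density: 0.3, recency: 0.3 }));
```

Strategy 1 comes back clean. Strategy 3 is flagged for `density` + `burstActivity`, which both sit in the churn-frequency group.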

Signals that are **orthogonal** and combine well:

| Signal A | + Signal B | Combined meaning |
|----------|-----------|-----------------|
| `chunkChurn` | `bugFix` | Frequently changed + mostly bug fixes = quality problem |
| `knowledgeSilo` | `imports` | Single owner + high coupling = dangerous silo (`imports` counts fan-out, not dependents; see Known Limitations) |
| `stability` | `age` | Low churn + old = battle-tested code |
| `burstActivity` | `pathRisk` | Recent activity in security paths = needs review |
| `chunkRelativeChurn` | `volatility` | Function absorbs disproportionate churn + irregular pattern = structural problem |
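
Whether two signals are actually orthogonal in your repository is an empirical question. One rough check (our suggestion, not a TeaRAGs feature) is the Pearson correlation of the two signals across a sample of search results. Values near ±1 mean the pair is largely redundant, and values near 0 mean each signal adds its own information. `pearson` is a plain helper, and the sample values below are made up for illustration.

```typescript
// Pearson correlation between two equally long series of signal values.
function pearson(xs: number[], ys: number[]): number {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("need two series of equal length >= 2");
  }
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let cov = 0, varX = 0, varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  if (varX === 0 || varY === 0) return 0; // a constant signal carries no ranking information
  return cov / Math.sqrt(varX * varY);
}

// Made-up per-result values for five search hits.
const churn = [0.9, 0.7, 0.4, 0.2, 0.1];
const chunkChurn = [0.85, 0.75, 0.35, 0.25, 0.1]; // moves with churn
const bugFix = [0.2, 0.9, 0.1, 0.6, 0.3];        // unrelated pattern

console.log(pearson(churn, chunkChurn).toFixed(2)); // close to 1: redundant pair
console.log(pearson(churn, bugFix).toFixed(2));     // near 0: orthogonal pair
```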

## Known Limitations

1. **Schema gap:** The `ScoringWeightsSchema` in the MCP tool definitions does not yet expose the newer weight keys (`relativeChurnNorm`, `burstActivity`, `pathRisk`, `knowledgeSilo`, `chunkRelativeChurn`). Agents using preset strings are unaffected; agents constructing custom weights with these keys will need the schema updated. This currently affects the `burstActivity` weight in Strategy 3 and the `knowledgeSilo` and `pathRisk` weights in Strategy 4.

2. **No cross-search chaining:** Each search is independent. The agent must manually chain results from one search into filters for the next. There is no built-in "find all files that import results from my previous search."

3. **Git metadata required:** All reranking presets except `relevance` require `CODE_ENABLE_GIT_METADATA=true` during indexing. Without git enrichment, non-relevance presets silently degrade to similarity-only scoring.

4. **Chunk-level data is partial:** Chunk-level metrics (`chunkCommitCount`, `chunkBugFixRate`, etc.) are only available for files that have multiple chunks and commits within the `GIT_CHUNK_MAX_AGE_MONTHS` window. Single-chunk files, and files whose commits all predate that window, fall back to file-level metrics.

5. **No fan-in (importedBy) data yet:** The current `impactAnalysis` preset uses only fan-out (imports count). Fan-in metrics and the `blastRadius` preset are planned.
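
The chunk-level fallback in item 4 can be pictured as a choice between chunk-level and file-level values. This is a hypothetical sketch: `chunkCommitCount` and `chunkBugFixRate` come from the text above, while the `GitMetrics` shape, the file-level `bugFixRate` field, and the `pickMetrics` helper are assumptions for illustration only.

```typescript
// Hypothetical shape of the git metadata attached to a chunk.
interface GitMetrics {
  commitCount: number;       // file-level
  bugFixRate: number;        // file-level (assumed field name)
  chunkCommitCount?: number; // present only when chunk-level data exists
  chunkBugFixRate?: number;
}

interface EffectiveMetrics {
  commitCount: number;
  bugFixRate: number;
  level: "chunk" | "file";
}

// Prefer chunk-level values; fall back to file-level ones when they are missing,
// e.g. for single-chunk files or files with no commits inside GIT_CHUNK_MAX_AGE_MONTHS.
function pickMetrics(m: GitMetrics): EffectiveMetrics {
  if (m.chunkCommitCount !== undefined && m.chunkBugFixRate !== undefined) {
    return { commitCount: m.chunkCommitCount, bugFixRate: m.chunkBugFixRate, level: "chunk" };
  }
  return { commitCount: m.commitCount, bugFixRate: m.bugFixRate, level: "file" };
}

console.log(pickMetrics({ commitCount: 40, bugFixRate: 0.3, chunkCommitCount: 6, chunkBugFixRate: 0.5 }));
console.log(pickMetrics({ commitCount: 40, bugFixRate: 0.3 })); // no chunk-level data
```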