task_fingerprint

Converts natural-language task descriptions into COV (code-operation vocabulary) tokens for architecture-aware analysis.

Instructions

Translate natural-language task text into COV tokens.

Input Schema

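Name	Required	Description	Default
`task`	Yes	Natural-language task description (e.g. "add retry logic to the HTTP client").
`max_tokens`	No	Maximum number of COV tokens to return; the handler clamps it to the range 1-16.	8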
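
The handler shown in the Implementation Reference bounds `max_tokens` before using it. A minimal sketch of that clamp, using a standalone helper name (`clamp_max_tokens`) that is hypothetical and not part of the project:

```python
def clamp_max_tokens(max_tokens: int = 8) -> int:
    """Mirror the handler's bound: never below 1, never above 16."""
    return max(1, min(max_tokens, 16))


print(clamp_max_tokens())     # 8
print(clamp_max_tokens(0))    # 1
print(clamp_max_tokens(100))  # 16
```

Out-of-range values are therefore adjusted silently rather than rejected.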

Implementation Reference

bgi/bgi/mcp/context.py:500-554 (handler)

Core implementation of the task_fingerprint method. It maps a natural-language task description into COV (code-operation vocabulary) tokens by matching terms against _TASK_TOKEN_TERMS, scoring each matched token by its number of matched terms, and returning a fingerprint dict with the selected tokens, a confidence value, and a status ("ok", "ambiguous", or "insufficient_signal").

def task_fingerprint(self, task: str, max_tokens: int = 8) -> dict[str, Any]:
    """Map a natural-language task into a COV token fingerprint."""
    max_tokens = max(1, min(max_tokens, 16))
    key = ("task_fingerprint", task, max_tokens)

    def _build() -> dict[str, Any]:
        text = task.strip()
        low = text.lower()
        scored: list[dict[str, Any]] = []
        evidence_weight = 0

        for token, terms in self._TASK_TOKEN_TERMS.items():
            matched_terms = [term for term in terms if self._match_term(low, term)]
            if not matched_terms:
                continue
            evidence_weight += len(matched_terms)
            score = min(0.98, 0.35 + (0.12 * len(matched_terms)))
            scored.append(
                {
                    "token": token,
                    "score": round(score, 3),
                    "matched_terms": matched_terms,
                }
            )

        scored.sort(key=lambda row: (row["score"], len(row["matched_terms"]), row["token"]), reverse=True)
        selected = scored[:max_tokens]
        task_cov = [row["token"] for row in selected]

        vague_hits = [term for term in self._VAGUE_TASK_TERMS if self._match_term(low, term)]
        confidence = 0.1
        if selected:
            confidence = 0.28 + (0.08 * len(selected)) + (0.05 * (evidence_weight / max(1, len(selected))))
        if vague_hits and len(selected) <= 2:
            confidence -= 0.12
        confidence = round(max(0.05, min(0.95, confidence)), 2)

        if not selected or confidence < 0.35:
            status = "insufficient_signal"
        elif confidence < 0.55:
            status = "ambiguous"
        else:
            status = "ok"

        return {
            "task": task,
            "tokens": task_cov,
            "scored_tokens": selected,
            "confidence": confidence,
            "status": status,
            "vague_terms_detected": vague_hits,
            "interpretation": f"interpreted as COV tokens: {task_cov}" if task_cov else "could not infer clear COV tokens",
        }

    return self._cached(key, _build)
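
The scoring and confidence arithmetic in the handler can be checked by hand. The sketch below copies the formulas into standalone functions; the function names are hypothetical and exist only for this illustration.

```python
def token_score(n_matched: int) -> float:
    """Per-token score: 0.35 base plus 0.12 per matched term, capped at 0.98."""
    return round(min(0.98, 0.35 + 0.12 * n_matched), 3)


def fingerprint_confidence(n_selected: int, evidence_weight: int, vague: bool) -> float:
    """Overall confidence, following the handler's formula and clamps."""
    confidence = 0.1
    if n_selected:
        confidence = 0.28 + 0.08 * n_selected + 0.05 * (evidence_weight / max(1, n_selected))
    if vague and n_selected <= 2:
        confidence -= 0.12  # penalty for vague wording with few tokens
    return round(max(0.05, min(0.95, confidence)), 2)


def status_for(n_selected: int, confidence: float) -> str:
    """Map the confidence value to the handler's three status strings."""
    if not n_selected or confidence < 0.35:
        return "insufficient_signal"
    if confidence < 0.55:
        return "ambiguous"
    return "ok"


print(token_score(1), token_score(2), token_score(6))  # 0.47 0.59 0.98

c = fingerprint_confidence(n_selected=1, evidence_weight=1, vague=False)
print(c, status_for(1, c))  # 0.41 ambiguous

c = fingerprint_confidence(n_selected=1, evidence_weight=1, vague=True)
print(c, status_for(1, c))  # 0.29 insufficient_signal

c = fingerprint_confidence(n_selected=3, evidence_weight=4, vague=False)
print(c, status_for(3, c))  # 0.59 ok
```

A single token with a single matched term therefore never reaches "ok" on its own; with these constants, three tokens matched on one term each are enough (confidence 0.57).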

bgi/bgi/mcp/server.py:184-200 (registration)

MCP tool registration for task_fingerprint using the @mcp.tool() decorator. Defines the tool's name, docstring, parameters (task, max_tokens), and delegates to service.task_fingerprint().

@mcp.tool()
def task_fingerprint(task: str, max_tokens: int = 8):
    """Convert a natural-language task description into COV (code-operation vocabulary) tokens.

    Use this to extract the structured intent of a task before searching for
    behavioral twins. COV tokens represent the canonical operations implied by
    the task (e.g. "VALIDATE", "PERSIST", "EMIT"). Do NOT use this for
    symbol search — use search_symbols instead.

    Args:
        task: Natural-language task description (e.g. "add retry logic to the HTTP client").
        max_tokens: Maximum number of COV tokens to return (default 8, clamped to 1-16).

    Returns:
        A dict with the ranked COV tokens ("tokens", "scored_tokens"), a
        "confidence" value, a "status", and any "vague_terms_detected".
    """
    return service.task_fingerprint(task=task, max_tokens=max_tokens)
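
Because the handler returns a dict with a "status" field, a caller can branch on it before searching for behavioral twins. The sketch below is one possible policy, not something the project prescribes; `next_step` and its return labels are hypothetical.

```python
from typing import Any


def next_step(fingerprint: dict[str, Any]) -> str:
    """Pick a follow-up action from a task_fingerprint result (illustrative policy)."""
    status = fingerprint.get("status")
    if status == "ok":
        return "search"    # tokens are trusted enough to search with
    if status == "ambiguous":
        return "confirm"   # show the tokens and ask for confirmation
    return "rephrase"      # "insufficient_signal" or a malformed result


# Hand-built dict in the shape the handler returns; not captured tool output.
example = {
    "tokens": ["INTAKE", "VALIDATE", "PERSIST"],
    "confidence": 0.59,
    "status": "ok",
    "vague_terms_detected": [],
}
print(next_step(example))  # search
```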

bgi/bgi/mcp/context.py:120-151 (schema)

The _VAGUE_TASK_TERMS tuple and the _TASK_TOKEN_TERMS dictionary, which maps each COV token category (e.g., INTAKE, OUTPUT, VALIDATE, PERSIST) to the keywords that task_fingerprint matches against.

_VAGUE_TASK_TERMS = ("fix", "bug", "issue", "problem", "improve", "optimize", "refactor", "clean up")
_TASK_TOKEN_TERMS: dict[str, tuple[str, ...]] = {
    "INTAKE": ("input", "request", "payload", "parameter", "param", "args", "ingest", "receive"),
    "OUTPUT": ("output", "response", "return", "render", "reply"),
    "TRANSFORM": ("transform", "map", "convert", "normalize", "format"),
    "MUTATE": ("mutate", "update", "modify", "change state"),
    "SANITIZE": ("sanitize", "escape", "clean", "scrub"),
    "CONDITIONAL": ("if", "condition", "branch", "switch"),
    "LOOP": ("loop", "iterate", "for each", "batch"),
    "GUARD": ("guard", "check", "prevent", "reject"),
    "ROUTE": ("route", "endpoint", "handler", "controller", "api"),
    "SCOPE": ("scope", "local", "global", "nested"),
    "FETCH": ("fetch", "read", "load", "query", "get"),
    "PERSIST": ("persist", "save", "store", "write", "commit", "insert"),
    "EMIT": ("emit", "publish", "notify", "send event", "dispatch"),
    "SUBSCRIBE": ("subscribe", "listener", "consumer", "watch"),
    "DELEGATE": ("delegate", "forward", "proxy", "handoff"),
    "CONTRACT": ("interface", "protocol", "contract", "schema"),
    "COMPOSE": ("compose", "assemble", "aggregate", "combine"),
    "INIT": ("init", "initialize", "setup", "boot"),
    "TEARDOWN": ("teardown", "cleanup", "shutdown", "close"),
    "RAISE": ("raise", "throw", "fail", "error"),
    "RECOVER": ("recover", "retry", "fallback", "handle error"),
    "DEFER": ("defer", "finally", "after", "ensure"),
    "AUTHENTICATE": ("authenticate", "login", "sign in", "identity"),
    "AUTHORIZE": ("authorize", "permission", "access control", "policy"),
    "VALIDATE": ("validate", "verify", "check input", "assert"),
    "LOG": ("log", "audit", "trace"),
    "MEASURE": ("measure", "metric", "latency", "timing", "telemetry"),
    "ASYNC": ("async", "await", "goroutine", "concurrent", "background"),
    "TEST": ("test", "assertion", "unit test", "integration test"),
}
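
Some term lists overlap, so one phrase can trigger several tokens and a vague term at once. The sketch below uses a three-token subset of the table and a standalone matcher that assumes whole-word matching for single terms; the names `TERMS`, `VAGUE`, and `analyze` are hypothetical.

```python
import re

# Subset of the table above, kept small for illustration.
TERMS = {
    "GUARD": ("guard", "check", "prevent", "reject"),
    "SANITIZE": ("sanitize", "escape", "clean", "scrub"),
    "VALIDATE": ("validate", "verify", "check input", "assert"),
}
VAGUE = ("fix", "bug", "issue", "problem", "improve", "optimize", "refactor", "clean up")


def match_term(text_lower: str, term: str) -> bool:
    """Phrases match as substrings; single words must stand alone."""
    if " " in term:
        return term in text_lower
    return re.search(rf"(?<!\w){re.escape(term)}(?!\w)", text_lower) is not None


def analyze(task: str) -> tuple[dict[str, list[str]], list[str]]:
    """Return (token -> matched terms, vague terms found) for a task string."""
    low = task.strip().lower()
    hits = {}
    for token, terms in TERMS.items():
        found = [t for t in terms if match_term(low, t)]
        if found:
            hits[token] = found
    vague = [t for t in VAGUE if match_term(low, t)]
    return hits, vague


hits, vague = analyze("Clean up and check input handling")
print(hits)   # {'GUARD': ['check'], 'SANITIZE': ['clean'], 'VALIDATE': ['check input']}
print(vague)  # ['clean up']
```

Here "check input" counts toward both GUARD and VALIDATE, and "clean up" is flagged as vague while "clean" still counts toward SANITIZE.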

bgi/bgi/mcp/context.py:494-498 (helper)

The _match_term helper used by task_fingerprint to match individual terms against the lowercased task text. Multi-word phrases are matched as plain substrings; single words are matched with a regular expression delimited by lookarounds.

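@staticmethod
def _match_term(text_lower: str, term: str) -> bool:
    # Multi-word phrases are matched as plain substrings.
    if " " in term:
        return term in text_lower
    # Single words must not touch another word character on either side.
    return re.search(rf"(?<!\w){re.escape(term)}(?!\w)", text_lower) is not None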
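
The difference between the two matching modes shows up with short terms such as "if" or "log". The sketch below is a standalone copy of the helper, assuming the pattern is meant to use `\w` lookarounds so that single words match only as whole words.

```python
import re


def match_term(text_lower: str, term: str) -> bool:
    """Standalone copy of the helper for experimentation."""
    if " " in term:
        return term in text_lower  # phrase: substring match
    return re.search(rf"(?<!\w){re.escape(term)}(?!\w)", text_lower) is not None


print(match_term("if the user is an admin", "if"))   # True: whole word
print(match_term("show the diff", "if"))             # False: inside "diff"
print(match_term("add a login page", "log"))         # False: inside "login"
print(match_term("let users sign in", "sign in"))    # True: phrase present
print(match_term("design in progress", "sign in"))   # True: substring of "design in"
```

The last case shows the trade-off: phrase terms are not boundary-checked, so they can match inside longer words.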

bgi/bgi/mcp/context.py:293-298 (helper)

The _cached method used by task_fingerprint. It memoizes results in _response_cache under the key tuple ("task_fingerprint", task, max_tokens), so repeated calls with the same arguments skip recomputation.

def _cached(self, key: tuple[Any, ...], builder) -> Any:
    if key in self._response_cache:
        return self._response_cache[key]
    value = builder()
    self._response_cache[key] = value
    return value
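
The caching behavior has a few consequences that are easy to verify in isolation. The sketch below wraps the same `_cached` logic in a hypothetical `CachedService` class with a stub builder and a `build_calls` counter; none of these extra names come from the project.

```python
from typing import Any, Callable


class CachedService:
    """Toy stand-in for the real service, keeping only the caching logic."""

    def __init__(self) -> None:
        self._response_cache: dict[tuple[Any, ...], Any] = {}
        self.build_calls = 0  # counts how often the builder actually runs

    def _cached(self, key: tuple[Any, ...], builder: Callable[[], Any]) -> Any:
        if key in self._response_cache:
            return self._response_cache[key]
        value = builder()
        self._response_cache[key] = value
        return value

    def fingerprint(self, task: str, max_tokens: int = 8) -> dict[str, Any]:
        # As in the handler, the clamp happens before the key is built.
        max_tokens = max(1, min(max_tokens, 16))
        key = ("task_fingerprint", task, max_tokens)

        def _build() -> dict[str, Any]:
            self.build_calls += 1
            return {"task": task, "max_tokens": max_tokens}

        return self._cached(key, _build)


svc = CachedService()
first = svc.fingerprint("add retry logic")
second = svc.fingerprint("add retry logic")
print(first is second, svc.build_calls)  # True 1

svc.fingerprint("add retry logic", max_tokens=16)
svc.fingerprint("add retry logic", max_tokens=100)  # clamped to 16, same key
print(svc.build_calls)  # 2

svc.fingerprint("add retry logic ")  # trailing space: raw text is the key
print(svc.build_calls)  # 3
```

Cache hits return the same dict object, so a caller that mutates the result also changes what later callers receive. The cache as shown has no eviction.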
