task_fingerprint
Converts natural-language task descriptions into COV tokens for architecture-aware analysis.
Instructions
Translate natural-language task text into COV tokens.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
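| task | Yes | Natural-language task description, e.g. "add retry logic to the HTTP client". | |
| max_tokens | No | Maximum number of COV tokens to return. The handler clamps the value to the range 1 to 16. | 8 |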
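
As a minimal sketch of how `max_tokens` is normalised: the handler listed under Implementation Reference applies `max(1, min(max_tokens, 16))` before doing anything else. The helper name `clamp_max_tokens` below is hypothetical and exists only to isolate that one expression.

```python
def clamp_max_tokens(max_tokens: int = 8) -> int:
    """Hypothetical helper: mirrors the clamp the handler applies to max_tokens."""
    # Values below 1 are raised to 1, values above 16 are lowered to 16.
    return max(1, min(max_tokens, 16))


print(clamp_max_tokens())     # 8  (the default passes through unchanged)
print(clamp_max_tokens(0))    # 1
print(clamp_max_tokens(100))  # 16
```

Because the handler builds its cache key from the clamped value, a request with `max_tokens=100` and one with `max_tokens=16` for the same task would share a cache entry.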
Implementation Reference
- bgi/bgi/mcp/context.py:500-554 (handler) Core implementation of the task_fingerprint method. Maps a natural-language task description into COV (code-operation vocabulary) tokens by matching terms against _TASK_TOKEN_TERMS, scoring them, and returning a fingerprint dict with tokens, confidence, and status.
    def task_fingerprint(self, task: str, max_tokens: int = 8) -> dict[str, Any]:
        """Map a natural-language task into a COV token fingerprint."""
        max_tokens = max(1, min(max_tokens, 16))
        key = ("task_fingerprint", task, max_tokens)

        def _build() -> dict[str, Any]:
            text = task.strip()
            low = text.lower()
            scored: list[dict[str, Any]] = []
            evidence_weight = 0
            for token, terms in self._TASK_TOKEN_TERMS.items():
                matched_terms = [term for term in terms if self._match_term(low, term)]
                if not matched_terms:
                    continue
                evidence_weight += len(matched_terms)
                score = min(0.98, 0.35 + (0.12 * len(matched_terms)))
                scored.append(
                    {
                        "token": token,
                        "score": round(score, 3),
                        "matched_terms": matched_terms,
                    }
                )
            scored.sort(key=lambda row: (row["score"], len(row["matched_terms"]), row["token"]), reverse=True)
            selected = scored[:max_tokens]
            task_cov = [row["token"] for row in selected]
            vague_hits = [term for term in self._VAGUE_TASK_TERMS if self._match_term(low, term)]
            confidence = 0.1
            if selected:
                confidence = 0.28 + (0.08 * len(selected)) + (0.05 * (evidence_weight / max(1, len(selected))))
            if vague_hits and len(selected) <= 2:
                confidence -= 0.12
            confidence = round(max(0.05, min(0.95, confidence)), 2)
            if not selected or confidence < 0.35:
                status = "insufficient_signal"
            elif confidence < 0.55:
                status = "ambiguous"
            else:
                status = "ok"
            return {
                "task": task,
                "tokens": task_cov,
                "scored_tokens": selected,
                "confidence": confidence,
                "status": status,
                "vague_terms_detected": vague_hits,
                "interpretation": f"interpreted as COV tokens: {task_cov}" if task_cov else "could not infer clear COV tokens",
            }

        return self._cached(key, _build)

- bgi/bgi/mcp/server.py:184-200 (registration) MCP tool registration for task_fingerprint using the @mcp.tool() decorator. Defines the tool's name, docstring, and parameters (task, max_tokens), and delegates to service.task_fingerprint().
    @mcp.tool()
    def task_fingerprint(task: str, max_tokens: int = 8):
        """Convert a natural-language task description into COV (code-operation vocabulary) tokens.

        Use this to extract the structured intent of a task before searching for
        behavioral twins. COV tokens represent the canonical operations implied by
        the task (e.g. "VALIDATE", "PERSIST", "EMIT"). Do NOT use this for symbol
        search; use search_symbols instead.

        Args:
            task: Natural-language task description (e.g. "add retry logic to the HTTP client").
            max_tokens: Maximum number of COV tokens to return (default 8, clamped to 1-16).

        Returns:
            A dict with the keys "task", "tokens" (COV token strings ranked by score),
            "scored_tokens", "confidence", "status", "vague_terms_detected",
            and "interpretation".
        """
        return service.task_fingerprint(task=task, max_tokens=max_tokens)

- bgi/bgi/mcp/context.py:120-151 (schema) The _VAGUE_TASK_TERMS tuple and the _TASK_TOKEN_TERMS dictionary, which defines all COV token categories and their matching keywords (e.g., INTAKE, OUTPUT, VALIDATE, PERSIST) used by task_fingerprint.
    _VAGUE_TASK_TERMS = ("fix", "bug", "issue", "problem", "improve", "optimize", "refactor", "clean up")

    _TASK_TOKEN_TERMS: dict[str, tuple[str, ...]] = {
        "INTAKE": ("input", "request", "payload", "parameter", "param", "args", "ingest", "receive"),
        "OUTPUT": ("output", "response", "return", "render", "reply"),
        "TRANSFORM": ("transform", "map", "convert", "normalize", "format"),
        "MUTATE": ("mutate", "update", "modify", "change state"),
        "SANITIZE": ("sanitize", "escape", "clean", "scrub"),
        "CONDITIONAL": ("if", "condition", "branch", "switch"),
        "LOOP": ("loop", "iterate", "for each", "batch"),
        "GUARD": ("guard", "check", "prevent", "reject"),
        "ROUTE": ("route", "endpoint", "handler", "controller", "api"),
        "SCOPE": ("scope", "local", "global", "nested"),
        "FETCH": ("fetch", "read", "load", "query", "get"),
        "PERSIST": ("persist", "save", "store", "write", "commit", "insert"),
        "EMIT": ("emit", "publish", "notify", "send event", "dispatch"),
        "SUBSCRIBE": ("subscribe", "listener", "consumer", "watch"),
        "DELEGATE": ("delegate", "forward", "proxy", "handoff"),
        "CONTRACT": ("interface", "protocol", "contract", "schema"),
        "COMPOSE": ("compose", "assemble", "aggregate", "combine"),
        "INIT": ("init", "initialize", "setup", "boot"),
        "TEARDOWN": ("teardown", "cleanup", "shutdown", "close"),
        "RAISE": ("raise", "throw", "fail", "error"),
        "RECOVER": ("recover", "retry", "fallback", "handle error"),
        "DEFER": ("defer", "finally", "after", "ensure"),
        "AUTHENTICATE": ("authenticate", "login", "sign in", "identity"),
        "AUTHORIZE": ("authorize", "permission", "access control", "policy"),
        "VALIDATE": ("validate", "verify", "check input", "assert"),
        "LOG": ("log", "audit", "trace"),
        "MEASURE": ("measure", "metric", "latency", "timing", "telemetry"),
        "ASYNC": ("async", "await", "goroutine", "concurrent", "background"),
        "TEST": ("test", "assertion", "unit test", "integration test"),
    }

- bgi/bgi/mcp/context.py:494-498 (helper) The _match_term helper used by task_fingerprint to match individual terms against the lowercased task text, supporting both single words and multi-word phrases.
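    # Requires `import re` at module level.
    @staticmethod
    def _match_term(text_lower: str, term: str) -> bool:
        # Multi-word phrases are matched as plain substrings.
        if " " in term:
            return term in text_lower
        # Single words must not be adjacent to other word characters,
        # so "if" does not match inside "notify".
        return re.search(rf"(?<![\w]){re.escape(term)}(?![\w])", text_lower) is not None

- bgi/bgi/mcp/context.py:293-298 (helper) The _cached method used by task_fingerprint to cache results under the key tuple ("task_fingerprint", task, max_tokens), avoiding recomputation.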
    def _cached(self, key: tuple[Any, ...], builder) -> Any:
        if key in self._response_cache:
            return self._response_cache[key]
        value = builder()
        self._response_cache[key] = value
        return value
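
To make the confidence and status thresholds concrete, here is a condensed, stand-alone sketch of the matching and confidence logic from the handler. It is not the project's code: the function name `fingerprint` is hypothetical, only three entries of `_TASK_TOKEN_TERMS` are copied, and caching, `max_tokens` truncation, and per-token scores are left out. It also assumes the vague-term penalty is applied whether or not any token matched.

```python
import re

# Three entries copied from _TASK_TOKEN_TERMS; the real table is larger.
TOKEN_TERMS = {
    "RECOVER": ("recover", "retry", "fallback", "handle error"),
    "FETCH": ("fetch", "read", "load", "query", "get"),
    "AUTHENTICATE": ("authenticate", "login", "sign in", "identity"),
}
VAGUE_TERMS = ("fix", "bug", "issue", "problem", "improve", "optimize", "refactor", "clean up")


def match_term(text_lower: str, term: str) -> bool:
    """Phrases match as substrings; single words must stand alone."""
    if " " in term:
        return term in text_lower
    return re.search(rf"(?<!\w){re.escape(term)}(?!\w)", text_lower) is not None


def fingerprint(task: str) -> dict:
    """Hypothetical condensation of the handler (no cache, no max_tokens)."""
    low = task.strip().lower()
    matches = {}
    for token, terms in TOKEN_TERMS.items():
        hits = [term for term in terms if match_term(low, term)]
        if hits:
            matches[token] = hits
    evidence = sum(len(hits) for hits in matches.values())
    vague = [term for term in VAGUE_TERMS if match_term(low, term)]

    confidence = 0.1
    if matches:
        # Base + per-token bonus + average matched terms per token.
        confidence = 0.28 + 0.08 * len(matches) + 0.05 * (evidence / len(matches))
    if vague and len(matches) <= 2:
        confidence -= 0.12  # penalty for vague wording with little evidence
    confidence = round(max(0.05, min(0.95, confidence)), 2)

    if not matches or confidence < 0.35:
        status = "insufficient_signal"
    elif confidence < 0.55:
        status = "ambiguous"
    else:
        status = "ok"
    return {"tokens": sorted(matches), "confidence": confidence, "status": status}


# Two tokens, two matched terms each: 0.28 + 0.16 + 0.10 = 0.54
print(fingerprint("retry with fallback when the fetch query fails"))
# One token, one term, vague words present: 0.28 + 0.08 + 0.05 - 0.12 = 0.29
print(fingerprint("fix the login bug"))
# Three tokens, one term each: 0.28 + 0.24 + 0.05 = 0.57
print(fingerprint("login, then fetch the profile and retry on timeout"))
```

Under these formulas, the first task lands just below the 0.55 cutoff and is reported as "ambiguous", the second is "insufficient_signal", and the third reaches "ok". The number of distinct tokens contributes more to confidence than additional matched terms within a single token.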