search_docs
Search Open Finance Brasil documentation using natural-language queries in Portuguese or English. Returns compact snippets from BM25-based indexing.
Instructions
Search the Open Finance Brasil docs (BM25). Returns compact snippets.
Args:
- `query`: Natural-language query in Portuguese or English.
- `limit`: Max number of hits (default 6, hard-capped at 20).
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | Natural-language query in Portuguese or English. | |
| limit | No | Max number of hits, hard-capped at 20. | 6 |
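The handler clamps `limit` before searching rather than rejecting out-of-range values. A minimal sketch of that clamping logic (the `clamp_limit` name is mine; the expression matches the handler's):

```python
def clamp_limit(limit: int) -> int:
    # Mirror of the handler's clamp: floor at 1, hard cap at 20.
    return max(1, min(int(limit), 20))

print(clamp_limit(0))    # 1  (too small -> floored)
print(clamp_limit(6))    # 6  (default passes through)
print(clamp_limit(100))  # 20 (hard-capped)
```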
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | Compact formatted snippets: heading path, snippet, and `page_id`/`url` citation per hit. | |
Implementation Reference
- src/of_mcp/server.py:61-74 (handler): the `search_docs` tool handler, registered as an MCP tool. It clamps `limit` to 1-20, queries BM25 search via `search_chunks`, and returns compact formatted results.

```python
@mcp.tool()
def search_docs(query: str, limit: int = 6) -> str:
    """Search the Open Finance Brasil docs (BM25). Returns compact snippets.

    Args:
        query: Natural-language query in Portuguese or English.
        limit: Max number of hits (default 6, hard-capped at 20).
    """
    limit = max(1, min(int(limit), 20))
    with closing(_conn()) as conn:
        hits = search.search_chunks(conn, query, limit=limit)
    if not hits:
        return f"No matches for {query!r}. Index status: {_index_status()}"
    return search.format_hits_compact(hits)
```

- src/of_mcp/server.py:61-62 (registration): the `@mcp.tool()` decorator registers `search_docs` as a FastMCP tool on line 61.

```python
@mcp.tool()
def search_docs(query: str, limit: int = 6) -> str:
```
- src/of_mcp/search.py:40-76 (helper): core BM25 search function. It sanitizes the query, runs an FTS5 MATCH against `chunks_fts` with BM25 ranking, and returns `SearchHit` objects.

```python
def search_chunks(conn: sqlite3.Connection, query: str, limit: int = 8) -> list[SearchHit]:
    fts_query = sanitize_query(query)
    if not fts_query:
        return []
    sql = """
        SELECT
            c.id AS chunk_id,
            c.page_id AS page_id,
            p.title AS page_title,
            p.url AS page_url,
            c.heading_path AS heading_path,
            snippet(chunks_fts, 1, '<<', '>>', ' … ', 18) AS snippet,
            c.body_md AS body_md,
            c.token_estimate AS token_estimate,
            bm25(chunks_fts) AS rank
        FROM chunks_fts
        JOIN chunks c ON c.id = chunks_fts.rowid
        JOIN pages p ON p.id = c.page_id
        WHERE chunks_fts MATCH ?
        ORDER BY rank
        LIMIT ?
    """
    rows = conn.execute(sql, (fts_query, limit)).fetchall()
    return [
        SearchHit(
            chunk_id=r["chunk_id"],
            page_id=r["page_id"],
            page_title=r["page_title"],
            page_url=r["page_url"],
            heading_path=r["heading_path"],
            snippet=r["snippet"],
            body_md=r["body_md"],
            token_estimate=r["token_estimate"],
            rank=float(r["rank"]),
        )
        for r in rows
    ]
```
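The MATCH-plus-`bm25()` pattern used by `search_chunks` can be exercised standalone against an in-memory database. The schema below is a simplified stand-in (not the project's real `chunks`/`pages` tables), and it requires an SQLite build with FTS5 enabled:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
# Simplified stand-in schema: a single FTS5 virtual table instead of the
# project's chunks/pages join.
conn.executescript(
    "CREATE VIRTUAL TABLE chunks_fts USING fts5(heading_path, body_md);"
)
conn.executemany(
    "INSERT INTO chunks_fts (heading_path, body_md) VALUES (?, ?)",
    [
        ("APIs > Pagamentos", "Iniciação de pagamento via Pix."),
        ("APIs > Consentimento", "Criação e revogação de consentimento."),
    ],
)

# Same shape as search_chunks: MATCH a sanitized prefix query, rank by bm25().
rows = conn.execute(
    """
    SELECT heading_path,
           snippet(chunks_fts, 1, '<<', '>>', ' … ', 18) AS snip,
           bm25(chunks_fts) AS rank
    FROM chunks_fts
    WHERE chunks_fts MATCH ?
    ORDER BY rank
    LIMIT ?
    """,
    ('"consentimento"*', 5),
).fetchall()
print(rows[0]["heading_path"])  # only the Consentimento chunk matches
```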
- src/of_mcp/search.py:79-90 (helper): formats search results into a token-cheap compact string (snippet plus citation) for the `search_docs` tool output.

```python
def format_hits_compact(hits: list[SearchHit]) -> str:
    """Token-cheap formatting for tool output: snippet + citation per hit."""
    if not hits:
        return "No results."
    lines = []
    for i, h in enumerate(hits, 1):
        lines.append(
            f"[{i}] {h.heading_path}\n"
            f"    {h.snippet}\n"
            f"    page_id={h.page_id} url={h.page_url}"
        )
    return "\n\n".join(lines)
```
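A quick sketch of the output shape this produces, using a hypothetical stub in place of `SearchHit` (the stub carries only the fields the formatter reads, and the sample URL/values are invented):

```python
from dataclasses import dataclass

@dataclass
class Hit:
    # Hypothetical stand-in for SearchHit: just the fields the formatter uses.
    heading_path: str
    snippet: str
    page_id: int
    page_url: str

def format_hits_compact(hits: list[Hit]) -> str:
    # Same layout as the helper above: numbered header, snippet, citation.
    if not hits:
        return "No results."
    lines = []
    for i, h in enumerate(hits, 1):
        lines.append(
            f"[{i}] {h.heading_path}\n"
            f"    {h.snippet}\n"
            f"    page_id={h.page_id} url={h.page_url}"
        )
    return "\n\n".join(lines)

out = format_hits_compact([
    Hit("APIs > Consentimento", "…fluxo de <<consentimento>>…", 12,
        "https://docs.example/consentimento"),
])
print(out)
```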
- src/of_mcp/search.py:9-24 (helper): sanitizes user queries for FTS5 by stripping special operators and OR-ing prefix-wildcarded terms.

```python
# FTS5 reserved characters / operators we don't want users to accidentally trip.
_FTS5_SPECIAL = re.compile(r'[\"\(\)\*\:\^]')


def sanitize_query(q: str) -> str:
    """Make user input safe and useful for FTS5 MATCH.

    Strategy: strip FTS operators, split on whitespace, and OR the terms
    with a prefix wildcard so partial words match (e.g., "consen" -> "consen*").
    """
    cleaned = _FTS5_SPECIAL.sub(" ", q).strip()
    if not cleaned:
        return ""
    tokens = [t for t in cleaned.split() if t]
    # Quote each term as a phrase, append * for prefix match. OR them.
    return " OR ".join(f'"{t}"*' for t in tokens)
```
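A few worked examples of what this sanitization emits, as a self-contained snippet reproducing the same regex and join:

```python
import re

_FTS5_SPECIAL = re.compile(r'[\"\(\)\*\:\^]')

def sanitize_query(q: str) -> str:
    # Same strategy as above: strip FTS5 operators, OR prefix-wildcarded terms.
    cleaned = _FTS5_SPECIAL.sub(" ", q).strip()
    if not cleaned:
        return ""
    tokens = [t for t in cleaned.split() if t]
    return " OR ".join(f'"{t}"*' for t in tokens)

print(sanitize_query("consentimento de pagamento"))
# '"consentimento"* OR "de"* OR "pagamento"*'
print(sanitize_query('"escopo: openid"'))  # operators stripped, not executed
# '"escopo"* OR "openid"*'
print(sanitize_query("^*:"))  # nothing survives -> empty string, caller returns []
# ''
```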