explain_search
Analyze hybrid search results to debug and optimize search parameters by showing FTS-only hits, vector-only hits, and intersection hits for academic literature.
Instructions
Explain search results in detail.
Runs a hybrid search and returns detailed diagnostic information, including FTS-only hits, vector-only hits, and intersection hits. Useful for debugging and tuning search parameters.
Args:
- query: search query string
- k: number of results to return (default 10)
- alpha: vector search weight, 0-1 (default 0.6)
- per_doc_limit: maximum number of chunks returned per document (default 3)
- fts_topn: number of FTS candidates (default 50)
- vec_topn: number of vector candidates (default 50)

Returns: a detailed search explanation containing:
- final_results: the final top-k results
- fts_only_hits: results matched only by FTS
- vec_only_hits: results matched only by vector search
- intersection_hits: results matched by both
- stats: summary statistics
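The scoring behind these categories can be sketched in isolation: FTS ranks are normalized against the best rank in the candidate set, vector distances are flipped into similarities, and `alpha` blends the two linearly. This is a minimal sketch; the function name `fuse` is illustrative and not part of the tool's API.

```python
def fuse(rank, max_rank, distance, alpha=0.6):
    """Blend a normalized FTS rank and a vector similarity.

    fts_score = rank / max_rank, vec_score = 1 - distance; a score
    missing on one side (a one-retriever hit) is treated as 0.0.
    """
    fts_score = (rank / max_rank) if rank is not None else 0.0
    vec_score = (1.0 - distance) if distance is not None else 0.0
    return alpha * vec_score + (1 - alpha) * fts_score

# Intersection hit: evidence from both retrievers
both = fuse(rank=4.0, max_rank=4.0, distance=0.1)
# Vector-only hit: same similarity, no FTS evidence
vec_only = fuse(rank=None, max_rank=4.0, distance=0.1)
```

With the default `alpha=0.6`, a chunk found by both retrievers always outranks an otherwise identical chunk found by only one of them, which is why `intersection_hits` tend to dominate `final_results`.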
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | Search query string | |
| k | No | Number of results to return | 10 |
| alpha | No | Vector search weight (0-1) | 0.6 |
| per_doc_limit | No | Maximum number of chunks returned per document | 3 |
| fts_topn | No | Number of FTS candidates | 50 |
| vec_topn | No | Number of vector candidates | 50 |
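As a hedged illustration of the schema above (the query string is invented), a caller only needs to supply `query` plus any overrides; omitted fields fall back to the server-side defaults in the table:

```python
# Hypothetical argument payload for explain_search.
arguments = {
    "query": "difference-in-differences parallel trends",  # required
    "alpha": 0.8,  # lean harder on vector similarity than the 0.6 default
    "k": 5,
}
```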
Implementation Reference
- src/paperlib_mcp/tools/search.py:428-476 (handler) — MCP tool handler for `explain_search`: executes the detailed hybrid search and returns a structured response with explanations.

```python
@mcp.tool()
async def explain_search(
    query: str,
    k: int = 10,
    alpha: float = 0.6,
    per_doc_limit: int = 3,
    fts_topn: int = 50,
    vec_topn: int = 50,
) -> dict[str, Any]:
    """Explain search results in detail.

    Runs a hybrid search and returns detailed diagnostic information,
    including FTS-only hits, vector-only hits, and intersection hits.
    Useful for debugging and tuning search parameters.

    Args:
        query: search query string
        k: number of results to return (default 10)
        alpha: vector search weight, 0-1 (default 0.6)
        per_doc_limit: maximum number of chunks per document (default 3)
        fts_topn: number of FTS candidates (default 50)
        vec_topn: number of vector candidates (default 50)

    Returns:
        A detailed search explanation containing:
        - final_results: the final top-k results
        - fts_only_hits: results matched only by FTS
        - vec_only_hits: results matched only by vector search
        - intersection_hits: results matched by both
        - stats: summary statistics
    """
    try:
        response = await explain_hybrid_search(
            query, k, alpha, fts_topn, vec_topn,
            per_doc_limit=per_doc_limit if per_doc_limit > 0 else None,
        )
        return response.model_dump()
    except Exception as e:
        return {
            "error": str(e),
            "query": query,
            "k": k,
            "alpha": alpha,
            "final_results": [],
            "fts_only_hits": [],
            "vec_only_hits": [],
            "intersection_hits": [],
            "stats": {},
        }
```
- Pydantic schema defining the output structure for the explain_search tool response.

```python
class ExplainSearchResponse(BaseModel):
    """Detailed search explanation response."""

    query: str
    k: int
    alpha: float
    per_doc_limit: int | None
    fts_topn: int
    vec_topn: int
    final_results: list[SearchResult]
    fts_only_hits: list[SearchResult]
    vec_only_hits: list[SearchResult]
    intersection_hits: list[SearchResult]
    stats: dict[str, Any]
```
- Core helper function implementing the detailed hybrid search logic: runs FTS and vector search, categorizes hits, and computes stats.

```python
async def explain_hybrid_search(
    query: str,
    k: int = 10,
    alpha: float = 0.6,
    fts_topn: int = 50,
    vec_topn: int = 50,
    per_doc_limit: int | None = None,
) -> ExplainSearchResponse:
    """Detailed hybrid search (with explanations), parallelized with asyncio."""
    # Run FTS and query embedding in parallel
    fts_task = asyncio.to_thread(search_fts, query, fts_topn)
    emb_task = aget_embeddings_batch([query])
    fts_results, embeddings = await asyncio.gather(fts_task, emb_task)
    query_embedding = embeddings[0]
    fts_chunk_ids = {r["chunk_id"] for r in fts_results}

    # Run the vector search
    vec_results = await asyncio.to_thread(search_vector, query_embedding, vec_topn)
    vec_chunk_ids = {r["chunk_id"] for r in vec_results}

    # 3. Compute the intersection and the differences
    intersection_ids = fts_chunk_ids & vec_chunk_ids
    fts_only_ids = fts_chunk_ids - vec_chunk_ids
    vec_only_ids = vec_chunk_ids - fts_chunk_ids

    # 4. Merge the results
    all_chunks: dict[int, dict[str, Any]] = {}

    # FTS results
    if fts_results:
        max_rank = max(r["rank"] for r in fts_results) or 1.0
        for r in fts_results:
            chunk_id = r["chunk_id"]
            fts_score = r["rank"] / max_rank
            all_chunks[chunk_id] = {
                "chunk_id": chunk_id,
                "doc_id": r["doc_id"],
                "page_start": r["page_start"],
                "page_end": r["page_end"],
                "text": r["text"],
                "score_fts": fts_score,
                "score_vec": None,
            }

    # Vector results
    if vec_results:
        for r in vec_results:
            chunk_id = r["chunk_id"]
            vec_score = 1.0 - r["distance"]
            if chunk_id in all_chunks:
                all_chunks[chunk_id]["score_vec"] = vec_score
            else:
                all_chunks[chunk_id] = {
                    "chunk_id": chunk_id,
                    "doc_id": r["doc_id"],
                    "page_start": r["page_start"],
                    "page_end": r["page_end"],
                    "text": r["text"],
                    "score_fts": None,
                    "score_vec": vec_score,
                }

    # 5. Build results with combined scores
    def make_result(chunk_data: dict[str, Any]) -> SearchResult:
        fts_score = chunk_data["score_fts"] or 0.0
        vec_score = chunk_data["score_vec"] or 0.0
        total_score = alpha * vec_score + (1 - alpha) * fts_score
        text = chunk_data["text"]
        snippet = text[:200] + "..." if len(text) > 200 else text
        return SearchResult(
            chunk_id=chunk_data["chunk_id"],
            doc_id=chunk_data["doc_id"],
            page_start=chunk_data["page_start"],
            page_end=chunk_data["page_end"],
            snippet=snippet,
            score_total=total_score,
            score_vec=chunk_data["score_vec"],
            score_fts=chunk_data["score_fts"],
        )

    # Build each category of results
    all_results = [make_result(all_chunks[cid]) for cid in all_chunks]
    fts_only_hits = [make_result(all_chunks[cid]) for cid in fts_only_ids]
    vec_only_hits = [make_result(all_chunks[cid]) for cid in vec_only_ids]
    intersection_hits = [make_result(all_chunks[cid]) for cid in intersection_ids]

    # Sort each list
    all_results.sort(key=lambda x: x.score_total, reverse=True)
    fts_only_hits.sort(key=lambda x: x.score_fts or 0, reverse=True)
    vec_only_hits.sort(key=lambda x: x.score_vec or 0, reverse=True)
    intersection_hits.sort(key=lambda x: x.score_total, reverse=True)

    # Apply the per-document limit
    final_results = apply_per_doc_limit(all_results, per_doc_limit)[:k]

    # Statistics
    stats = {
        "total_candidates": len(all_chunks),
        "fts_candidates": len(fts_results),
        "vec_candidates": len(vec_results),
        "fts_only_count": len(fts_only_ids),
        "vec_only_count": len(vec_only_ids),
        "intersection_count": len(intersection_ids),
        "unique_docs_in_final": len(set(r.doc_id for r in final_results)),
    }

    return ExplainSearchResponse(
        query=query,
        k=k,
        alpha=alpha,
        per_doc_limit=per_doc_limit,
        fts_topn=fts_topn,
        vec_topn=vec_topn,
        final_results=final_results,
        fts_only_hits=fts_only_hits[:10],  # return only the top 10
        vec_only_hits=vec_only_hits[:10],
        intersection_hits=intersection_hits[:10],
        stats=stats,
    )
```
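The categorization step reduces to plain set algebra on chunk ids, and the per-document cap to a single ordered pass. A minimal standalone sketch; `apply_per_doc_limit` here is a simplified stand-in for the project helper (whose implementation is not shown above), written to match the handler's convention that `None` disables the cap:

```python
def apply_per_doc_limit(results, per_doc_limit):
    """Keep at most per_doc_limit chunks per doc_id, preserving rank order.

    None disables the limit (the handler maps per_doc_limit <= 0 to None).
    """
    if per_doc_limit is None:
        return results
    counts: dict[str, int] = {}
    kept = []
    for r in results:
        counts[r["doc_id"]] = counts.get(r["doc_id"], 0) + 1
        if counts[r["doc_id"]] <= per_doc_limit:
            kept.append(r)
    return kept

# Hit categorization is exactly set intersection and difference
fts_ids = {1, 2, 3}
vec_ids = {2, 3, 4}
assert fts_ids & vec_ids == {2, 3}  # intersection_hits
assert fts_ids - vec_ids == {1}     # fts_only_hits
assert vec_ids - fts_ids == {4}     # vec_only_hits

# Cap a ranked list at two chunks per document
ranked = [{"doc_id": "a"}, {"doc_id": "a"}, {"doc_id": "a"}, {"doc_id": "b"}]
capped = apply_per_doc_limit(ranked, 2)
```

Because the cap runs before the final `[:k]` slice, lower-scored chunks from under-represented documents can still reach `final_results`, which is the intended diversification effect.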
- src/paperlib_mcp/server.py:35-35 (registration) — Top-level call that registers the explain_search tool (along with the other search tools) on the MCP server instance.

```python
register_search_tools(mcp)
```