explain_search

Analyze hybrid search results to debug and optimize search parameters by showing FTS-only hits, vector-only hits, and intersection hits for academic literature.

Instructions

Explain search results in detail

Executes a hybrid search and returns detailed explanation information, including FTS-only hits, vector-only hits, and intersection hits. Useful for debugging and tuning search parameters.

Args:
- query: search query string
- k: number of results to return (default 10)
- alpha: vector search weight, 0-1 (default 0.6)
- per_doc_limit: maximum number of chunks returned per document (default 3)
- fts_topn: number of FTS candidates (default 50)
- vec_topn: number of vector candidates (default 50)
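The alpha parameter controls how the two retrieval scores are blended. A minimal sketch of the weighted combination (the formula matches the implementation shown below; the function name here is illustrative):

```python
def blend_scores(alpha: float, score_vec: float, score_fts: float) -> float:
    # Weighted sum used for the combined score: alpha weights the vector side,
    # (1 - alpha) weights the FTS side. Both scores are assumed normalized to [0, 1].
    return alpha * score_vec + (1 - alpha) * score_fts

# With the default alpha = 0.6, a pure vector hit slightly outranks
# an equally strong pure FTS hit.
vec_only = blend_scores(0.6, 0.9, 0.0)   # ≈ 0.54
fts_only = blend_scores(0.6, 0.0, 0.9)   # ≈ 0.36
```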

Returns: a detailed search explanation containing:
- final_results: the final top-k results
- fts_only_hits: results matched only by FTS
- vec_only_hits: results matched only by vector search
- intersection_hits: results matched by both
- stats: summary statistics
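The three hit categories are plain set operations over the candidate chunk IDs from each retriever. A toy illustration with made-up IDs:

```python
# Hypothetical chunk IDs returned by each retriever.
fts_ids = {101, 102, 103}   # full-text search candidates
vec_ids = {102, 103, 104}   # vector search candidates

intersection_ids = fts_ids & vec_ids   # hit by both  -> {102, 103}
fts_only_ids = fts_ids - vec_ids       # FTS-only     -> {101}
vec_only_ids = vec_ids - fts_ids       # vector-only  -> {104}
```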

Input Schema

Name           Required  Description                           Default
query          Yes       Search query string                   -
k              No        Number of results to return           10
alpha          No        Vector search weight (0-1)            0.6
per_doc_limit  No        Max chunks returned per document      3
fts_topn       No        Number of FTS candidates              50
vec_topn       No        Number of vector candidates           50

Implementation Reference

  • MCP tool handler for 'explain_search': executes the detailed hybrid search and returns a structured response with explanations.
    @mcp.tool()
    async def explain_search(
        query: str,
        k: int = 10,
        alpha: float = 0.6,
        per_doc_limit: int = 3,
        fts_topn: int = 50,
        vec_topn: int = 50,
    ) -> dict[str, Any]:
        """Explain search results in detail.
        
        Executes a hybrid search and returns detailed explanation
        information, including FTS-only hits, vector-only hits, and
        intersection hits. Useful for debugging and tuning search
        parameters.
        
        Args:
            query: Search query string
            k: Number of results to return (default 10)
            alpha: Vector search weight, 0-1 (default 0.6)
            per_doc_limit: Maximum number of chunks returned per document (default 3)
            fts_topn: Number of FTS candidates (default 50)
            vec_topn: Number of vector candidates (default 50)
            
        Returns:
            A detailed search explanation containing:
            - final_results: the final top-k results
            - fts_only_hits: results matched only by FTS
            - vec_only_hits: results matched only by vector search
            - intersection_hits: results matched by both
            - stats: summary statistics
        """
        try:
            response = await explain_hybrid_search(
                query, k, alpha, fts_topn, vec_topn,
                per_doc_limit=per_doc_limit if per_doc_limit > 0 else None
            )
            return response.model_dump()
        except Exception as e:
            return {
                "error": str(e),
                "query": query,
                "k": k,
                "alpha": alpha,
                "final_results": [],
                "fts_only_hits": [],
                "vec_only_hits": [],
                "intersection_hits": [],
                "stats": {},
            }
  • Pydantic schema defining the output structure for the explain_search tool response.
    class ExplainSearchResponse(BaseModel):
        """Detailed search explanation response."""
        query: str
        k: int
        alpha: float
        per_doc_limit: int | None
        fts_topn: int
        vec_topn: int
        final_results: list[SearchResult]
        fts_only_hits: list[SearchResult]
        vec_only_hits: list[SearchResult]
        intersection_hits: list[SearchResult]
        stats: dict[str, Any]
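A small sketch of how the `stats` field might be used to decide whether `alpha` needs adjusting (the counts below are made up for illustration):

```python
# Made-up stats from one hypothetical explain_search run.
stats = {"fts_only_count": 30, "vec_only_count": 5, "intersection_count": 15}

total = stats["fts_only_count"] + stats["vec_only_count"] + stats["intersection_count"]
fts_share = stats["fts_only_count"] / total   # 0.6

# If most candidates are FTS-only, the vector side is contributing little;
# lowering alpha (more FTS weight) or raising vec_topn may be worth trying.
if fts_share > 0.5:
    suggestion = "consider lowering alpha or raising vec_topn"
```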
  • Core helper function implementing the detailed hybrid search logic with FTS and vector search explanations; categorizes hits and computes stats.
    async def explain_hybrid_search(
        query: str,
        k: int = 10,
        alpha: float = 0.6,
        fts_topn: int = 50,
        vec_topn: int = 50,
        per_doc_limit: int | None = None,
    ) -> ExplainSearchResponse:
        """Detailed hybrid search (with explanations) - async parallel version."""
        
        # Run FTS and the query embedding in parallel
        fts_task = asyncio.to_thread(search_fts, query, fts_topn)
        emb_task = aget_embeddings_batch([query])
        
        fts_results, embeddings = await asyncio.gather(fts_task, emb_task)
        query_embedding = embeddings[0]
        fts_chunk_ids = {r["chunk_id"] for r in fts_results}
        
        # Run the vector search
        vec_results = await asyncio.to_thread(search_vector, query_embedding, vec_topn)
        vec_chunk_ids = {r["chunk_id"] for r in vec_results}
        
        # Compute the intersection and set differences
        intersection_ids = fts_chunk_ids & vec_chunk_ids
        fts_only_ids = fts_chunk_ids - vec_chunk_ids
        vec_only_ids = vec_chunk_ids - fts_chunk_ids
        
        # Merge results
        all_chunks: dict[int, dict[str, Any]] = {}
        
        # Process FTS results
        if fts_results:
            max_rank = max(r["rank"] for r in fts_results) or 1.0
            for r in fts_results:
                chunk_id = r["chunk_id"]
                fts_score = r["rank"] / max_rank
                all_chunks[chunk_id] = {
                    "chunk_id": chunk_id,
                    "doc_id": r["doc_id"],
                    "page_start": r["page_start"],
                    "page_end": r["page_end"],
                    "text": r["text"],
                    "score_fts": fts_score,
                    "score_vec": None,
                }
        
        # Process vector results
        if vec_results:
            for r in vec_results:
                chunk_id = r["chunk_id"]
                vec_score = 1.0 - r["distance"]
                
                if chunk_id in all_chunks:
                    all_chunks[chunk_id]["score_vec"] = vec_score
                else:
                    all_chunks[chunk_id] = {
                        "chunk_id": chunk_id,
                        "doc_id": r["doc_id"],
                        "page_start": r["page_start"],
                        "page_end": r["page_end"],
                        "text": r["text"],
                        "score_fts": None,
                        "score_vec": vec_score,
                    }
        
        # Build results with combined scores
        def make_result(chunk_data: dict[str, Any]) -> SearchResult:
            fts_score = chunk_data["score_fts"] or 0.0
            vec_score = chunk_data["score_vec"] or 0.0
            total_score = alpha * vec_score + (1 - alpha) * fts_score
            text = chunk_data["text"]
            snippet = text[:200] + "..." if len(text) > 200 else text
            
            return SearchResult(
                chunk_id=chunk_data["chunk_id"],
                doc_id=chunk_data["doc_id"],
                page_start=chunk_data["page_start"],
                page_end=chunk_data["page_end"],
                snippet=snippet,
                score_total=total_score,
                score_vec=chunk_data["score_vec"],
                score_fts=chunk_data["score_fts"],
            )
        
        # Build each category of result list
        all_results = [make_result(all_chunks[cid]) for cid in all_chunks]
        fts_only_hits = [make_result(all_chunks[cid]) for cid in fts_only_ids]
        vec_only_hits = [make_result(all_chunks[cid]) for cid in vec_only_ids]
        intersection_hits = [make_result(all_chunks[cid]) for cid in intersection_ids]
        
        # Sort each list
        all_results.sort(key=lambda x: x.score_total, reverse=True)
        fts_only_hits.sort(key=lambda x: x.score_fts or 0, reverse=True)
        vec_only_hits.sort(key=lambda x: x.score_vec or 0, reverse=True)
        intersection_hits.sort(key=lambda x: x.score_total, reverse=True)
        
        # Apply the per-document limit
        final_results = apply_per_doc_limit(all_results, per_doc_limit)[:k]
        
        # Statistics
        stats = {
            "total_candidates": len(all_chunks),
            "fts_candidates": len(fts_results),
            "vec_candidates": len(vec_results),
            "fts_only_count": len(fts_only_ids),
            "vec_only_count": len(vec_only_ids),
            "intersection_count": len(intersection_ids),
            "unique_docs_in_final": len(set(r.doc_id for r in final_results)),
        }
        
        return ExplainSearchResponse(
            query=query,
            k=k,
            alpha=alpha,
            per_doc_limit=per_doc_limit,
            fts_topn=fts_topn,
            vec_topn=vec_topn,
            final_results=final_results,
            fts_only_hits=fts_only_hits[:10],  # return only the top 10 of each category
            vec_only_hits=vec_only_hits[:10],
            intersection_hits=intersection_hits[:10],
            stats=stats,
        )
  • Top-level registration call that invokes the function to register the explain_search tool (and other search tools) to the MCP server instance.
    register_search_tools(mcp)
