
explain_search

Analyze hybrid search results to debug and optimize search parameters by showing FTS-only hits, vector-only hits, and intersection hits for academic literature.

Instructions

Explain search results in detail

Runs a hybrid search and returns detailed diagnostic information, including FTS-only hits, vector-only hits, and intersection hits. Intended for debugging and tuning search parameters.

Args:
- query: search query string
- k: number of results to return (default 10)
- alpha: vector-search weight (0-1, default 0.6)
- per_doc_limit: maximum number of chunks returned per document (default 3)
- fts_topn: number of FTS candidates (default 50)
- vec_topn: number of vector candidates (default 50)

Returns: a detailed search explanation containing:
- final_results: final top-k results
- fts_only_hits: results matched only by FTS
- vec_only_hits: results matched only by vector search
- intersection_hits: results matched by both
- stats: summary statistics
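To make the return structure concrete, here is a hedged sketch of what a response might look like. All field names follow the Returns section above; every value is illustrative only, not real output.

```python
# Hypothetical shape of an explain_search response; values are illustrative.
example_response = {
    "query": "graph neural networks",
    "k": 10,
    "alpha": 0.6,
    "final_results": [
        {"chunk_id": 42, "doc_id": 7, "page_start": 3, "page_end": 4,
         "snippet": "...", "score_total": 0.81, "score_vec": 0.9,
         "score_fts": 0.68},
    ],
    "fts_only_hits": [],
    "vec_only_hits": [],
    "intersection_hits": [],
    "stats": {"total_candidates": 1, "fts_only_count": 0,
              "vec_only_count": 0, "intersection_count": 1},
}

# Every result category from the Returns section is present.
assert {"final_results", "fts_only_hits", "vec_only_hits",
        "intersection_hits", "stats"} <= example_response.keys()
```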

Input Schema

Name           Required  Description                              Default
query          Yes       Search query string                      -
k              No        Number of results to return              10
alpha          No        Vector-search weight (0-1)               0.6
per_doc_limit  No        Max chunks returned per document         3
fts_topn       No        Number of FTS candidates                 50
vec_topn       No        Number of vector candidates              50

Output Schema


No arguments

Implementation Reference

  • MCP tool handler for 'explain_search': executes the detailed hybrid search and returns structured response with explanations.
    @mcp.tool()
    async def explain_search(
        query: str,
        k: int = 10,
        alpha: float = 0.6,
        per_doc_limit: int = 3,
        fts_topn: int = 50,
        vec_topn: int = 50,
    ) -> dict[str, Any]:
        """详细解释搜索结果
        
        执行混合搜索并返回详细的解释信息,包括 FTS-only 命中、向量-only 命中、交集命中等。
        用于调试和优化搜索参数。
        
        Args:
            query: 搜索查询字符串
            k: 返回结果数量,默认 10
            alpha: 向量搜索权重(0-1),默认 0.6
            per_doc_limit: 每篇文档最多返回的 chunk 数量,默认 3
            fts_topn: FTS 候选数量,默认 50
            vec_topn: 向量候选数量,默认 50
            
        Returns:
            详细的搜索解释,包含:
            - final_results: 最终 top-k 结果
            - fts_only_hits: 仅 FTS 命中的结果
            - vec_only_hits: 仅向量命中的结果
            - intersection_hits: 两者都命中的结果
            - stats: 统计信息
        """
        try:
            response = await explain_hybrid_search(
                query, k, alpha, fts_topn, vec_topn,
                per_doc_limit=per_doc_limit if per_doc_limit > 0 else None
            )
            return response.model_dump()
        except Exception as e:
            return {
                "error": str(e),
                "query": query,
                "k": k,
                "alpha": alpha,
                "final_results": [],
                "fts_only_hits": [],
                "vec_only_hits": [],
                "intersection_hits": [],
                "stats": {},
            }
  • Pydantic schema defining the output structure for the explain_search tool response.
    class ExplainSearchResponse(BaseModel):
        """详细搜索解释响应"""
        query: str
        k: int
        alpha: float
        per_doc_limit: int | None
        fts_topn: int
        vec_topn: int
        final_results: list[SearchResult]
        fts_only_hits: list[SearchResult]
        vec_only_hits: list[SearchResult]
        intersection_hits: list[SearchResult]
        stats: dict[str, Any]
  • Core helper function implementing the detailed hybrid-search logic with FTS and vector search explanations: it categorizes hits and computes stats.
    async def explain_hybrid_search(
        query: str,
        k: int = 10,
        alpha: float = 0.6,
        fts_topn: int = 50,
        vec_topn: int = 50,
        per_doc_limit: int | None = None,
    ) -> ExplainSearchResponse:
        """详细的混合搜索(带解释)- 异步并行版"""
        
        # Run FTS and the embedding lookup in parallel
        fts_task = asyncio.to_thread(search_fts, query, fts_topn)
        emb_task = aget_embeddings_batch([query])
        
        fts_results, embeddings = await asyncio.gather(fts_task, emb_task)
        query_embedding = embeddings[0]
        fts_chunk_ids = {r["chunk_id"] for r in fts_results}
        
        # Run the vector search
        vec_results = await asyncio.to_thread(search_vector, query_embedding, vec_topn)
        vec_chunk_ids = {r["chunk_id"] for r in vec_results}
        
        # Compute the intersection and set differences
        intersection_ids = fts_chunk_ids & vec_chunk_ids
        fts_only_ids = fts_chunk_ids - vec_chunk_ids
        vec_only_ids = vec_chunk_ids - fts_chunk_ids
        
        # Merge candidates from both searches
        all_chunks: dict[int, dict[str, Any]] = {}
        
        # Process FTS results (normalize rank to a 0-1 score)
        if fts_results:
            max_rank = max(r["rank"] for r in fts_results) or 1.0
            for r in fts_results:
                chunk_id = r["chunk_id"]
                fts_score = r["rank"] / max_rank
                all_chunks[chunk_id] = {
                    "chunk_id": chunk_id,
                    "doc_id": r["doc_id"],
                    "page_start": r["page_start"],
                    "page_end": r["page_end"],
                    "text": r["text"],
                    "score_fts": fts_score,
                    "score_vec": None,
                }
        
        # Process vector results
        if vec_results:
            for r in vec_results:
                chunk_id = r["chunk_id"]
                vec_score = 1.0 - r["distance"]
                
                if chunk_id in all_chunks:
                    all_chunks[chunk_id]["score_vec"] = vec_score
                else:
                    all_chunks[chunk_id] = {
                        "chunk_id": chunk_id,
                        "doc_id": r["doc_id"],
                        "page_start": r["page_start"],
                        "page_end": r["page_end"],
                        "text": r["text"],
                        "score_fts": None,
                        "score_vec": vec_score,
                    }
        
        # Build all results with a combined score
        def make_result(chunk_data: dict[str, Any]) -> SearchResult:
            fts_score = chunk_data["score_fts"] or 0.0
            vec_score = chunk_data["score_vec"] or 0.0
            total_score = alpha * vec_score + (1 - alpha) * fts_score
            text = chunk_data["text"]
            snippet = text[:200] + "..." if len(text) > 200 else text
            
            return SearchResult(
                chunk_id=chunk_data["chunk_id"],
                doc_id=chunk_data["doc_id"],
                page_start=chunk_data["page_start"],
                page_end=chunk_data["page_end"],
                snippet=snippet,
                score_total=total_score,
                score_vec=chunk_data["score_vec"],
                score_fts=chunk_data["score_fts"],
            )
        
        # Build each category of result list
        all_results = [make_result(all_chunks[cid]) for cid in all_chunks]
        fts_only_hits = [make_result(all_chunks[cid]) for cid in fts_only_ids]
        vec_only_hits = [make_result(all_chunks[cid]) for cid in vec_only_ids]
        intersection_hits = [make_result(all_chunks[cid]) for cid in intersection_ids]
        
        # Sort each list
        all_results.sort(key=lambda x: x.score_total, reverse=True)
        fts_only_hits.sort(key=lambda x: x.score_fts or 0, reverse=True)
        vec_only_hits.sort(key=lambda x: x.score_vec or 0, reverse=True)
        intersection_hits.sort(key=lambda x: x.score_total, reverse=True)
        
        # Apply the per-document limit
        final_results = apply_per_doc_limit(all_results, per_doc_limit)[:k]
        
        # Collect stats
        stats = {
            "total_candidates": len(all_chunks),
            "fts_candidates": len(fts_results),
            "vec_candidates": len(vec_results),
            "fts_only_count": len(fts_only_ids),
            "vec_only_count": len(vec_only_ids),
            "intersection_count": len(intersection_ids),
            "unique_docs_in_final": len(set(r.doc_id for r in final_results)),
        }
        
        return ExplainSearchResponse(
            query=query,
            k=k,
            alpha=alpha,
            per_doc_limit=per_doc_limit,
            fts_topn=fts_topn,
            vec_topn=vec_topn,
            final_results=final_results,
            fts_only_hits=fts_only_hits[:10],  # return at most the top 10 per category
            vec_only_hits=vec_only_hits[:10],
            intersection_hits=intersection_hits[:10],
            stats=stats,
        )
  • Top-level registration call that invokes the function to register the explain_search tool (and other search tools) to the MCP server instance.
    register_search_tools(mcp)
Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It describes what the tool returns (detailed explanation with multiple result categories) but doesn't mention performance characteristics, rate limits, authentication requirements, or potential side effects. The description is accurate about the tool's behavior but lacks operational context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (purpose, usage, Args, Returns). It's appropriately sized for a complex tool with 6 parameters. Every sentence earns its place, though the Chinese-to-English translation creates some redundancy in the opening statement.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, hybrid search functionality) and the presence of an output schema (Returns section), the description is complete enough. It explains the tool's specialized debugging purpose, documents all parameters, and outlines the return structure. The output schema in the Returns section means the description doesn't need to fully document return values.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description must compensate for the schema's lack of parameter documentation. The description provides clear explanations for all 6 parameters in the Args section, including their purposes and default values. This adds significant value beyond the bare schema, though it doesn't explain parameter interactions or constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: '执行混合搜索并返回详细的解释信息' (perform hybrid search and return detailed explanation information). It specifies the exact verb ('执行混合搜索' - perform hybrid search) and resource ('搜索结果' - search results), and distinguishes itself from siblings like search_fts_only and search_hybrid by emphasizing debugging and optimization functions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool: '用于调试和优化搜索参数' (for debugging and optimizing search parameters). This provides clear context that distinguishes it from regular search tools like search_hybrid, which would be for production use rather than debugging.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
