Glama

build_evidence_pack

Search academic literature for topic-related excerpts and save them as reusable evidence packages to maintain consistent references across multiple review iterations.

Instructions

Build Evidence Pack

Searches for literature excerpts relevant to a topic and saves them as a reusable evidence pack. The pack can be reused across multiple iterations of review writing, avoiding the result drift caused by re-running retrieval each time.

Args: query: search topic / research question; k: number of chunks to retrieve, default 40; per_doc_limit: maximum chunks returned per document, default 3; alpha: vector-search weight, default 0.6

Returns: evidence pack information, including the pack_id and the retrieved items
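The alpha parameter presumably weights vector similarity against keyword matching inside the hybrid search. The exact formula used by paperlib-mcp is not shown on this page; a minimal sketch of that style of blending, for intuition only:

```python
def blend_scores(vec_score: float, kw_score: float, alpha: float = 0.6) -> float:
    """Weighted blend of a vector-similarity score and a keyword score.

    alpha=1.0 ranks purely by vector similarity, alpha=0.0 purely by
    keyword match; the default 0.6 leans toward the vector side.
    """
    return alpha * vec_score + (1 - alpha) * kw_score
```

With the default alpha, a chunk scoring 1.0 on vectors and 0.0 on keywords lands at 0.6, so vector hits dominate but strong keyword matches still surface.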

Input Schema

Name           Required  Description                           Default
query          Yes       Search topic / research question
k              No        Number of chunks to retrieve          40
per_doc_limit  No        Maximum chunks returned per document  3
alpha          No        Vector-search weight                  0.6
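For illustration, a hypothetical arguments payload matching this schema (the query string is invented):

```python
import json

# Hypothetical arguments for a build_evidence_pack call; only "query" is required
args = {
    "query": "retrieval-augmented generation for literature review",
    "k": 40,              # optional, default 40
    "per_doc_limit": 3,   # optional, default 3
    "alpha": 0.6,         # optional, default 0.6
}
payload = json.dumps(args)
```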

Output Schema

No output fields are documented.

Implementation Reference

  • The main handler function for the 'build_evidence_pack' tool. It performs hybrid search on literature chunks, saves the results as an evidence pack in the database (evidence_packs and evidence_pack_items tables), fetches document metadata, and returns structured pack information including items with snippets and scores.
    import json
    from typing import Any

    # Project-internal helpers assumed in scope: hybrid_search, get_db, query_one

    async def build_evidence_pack(
        query: str,
        k: int = 40,
        per_doc_limit: int = 3,
        alpha: float = 0.6,
    ) -> dict[str, Any]:
        """Build an evidence pack.
        
        Searches for literature excerpts relevant to a topic and saves them
        as a reusable evidence pack. The pack can be reused across multiple
        iterations of review writing, avoiding the result drift caused by
        re-running retrieval each time.
        
        Args:
            query: Search topic / research question.
            k: Number of chunks to retrieve; defaults to 40.
            per_doc_limit: Maximum chunks returned per document; defaults to 3.
            alpha: Vector-search weight; defaults to 0.6.
            
        Returns:
            Evidence pack information, including pack_id and the retrieved items.
        """
        try:
            # Run the hybrid search
            search_result = await hybrid_search(
                query=query,
                k=k,
                alpha=alpha,
                per_doc_limit=per_doc_limit,
            )
            
            if not search_result.results:
                return {
                    "error": "No relevant literature found",
                    "query": query,
                    "pack_id": None,
                }
            
            # Persist the evidence pack
            params = {
                "k": k,
                "per_doc_limit": per_doc_limit,
                "alpha": alpha,
            }
            
            with get_db() as conn:
                with conn.cursor() as cur:
                    # Create the pack record
                    cur.execute(
                        """
                        INSERT INTO evidence_packs (query, params_json)
                        VALUES (%s, %s)
                        RETURNING pack_id
                        """,
                        (query, json.dumps(params))
                    )
                    pack_result = cur.fetchone()
                    pack_id = pack_result["pack_id"]
                    
                    # Insert one row per retrieved chunk
                    for rank, result in enumerate(search_result.results):
                        cur.execute(
                            """
                            INSERT INTO evidence_pack_items (pack_id, doc_id, chunk_id, rank)
                            VALUES (%s, %s, %s, %s)
                            """,
                            (pack_id, result.doc_id, result.chunk_id, rank)
                        )
            
            # Fetch document metadata
            doc_ids = list(set(r.doc_id for r in search_result.results))
            doc_metadata = {}
            for doc_id in doc_ids:
                doc = query_one(
                    "SELECT title, authors, year FROM documents WHERE doc_id = %s",
                    (doc_id,)
                )
                if doc:
                    doc_metadata[doc_id] = doc
            
            # Assemble the response
            items = []
            for result in search_result.results:
                meta = doc_metadata.get(result.doc_id, {})
                items.append({
                    "doc_id": result.doc_id,
                    "chunk_id": result.chunk_id,
                    "page_start": result.page_start,
                    "page_end": result.page_end,
                    "text": result.snippet,
                    "score": result.score_total,
                    "title": meta.get("title"),
                    "authors": meta.get("authors"),
                    "year": meta.get("year"),
                })
            
            return {
                "pack_id": pack_id,
                "query": query,
                "params": params,
                "items": items,
                "stats": {
                    "total_chunks": len(items),
                    "unique_docs": len(doc_ids),
                },
            }
            
        except Exception as e:
            return {
                "error": str(e),
                "query": query,
                "pack_id": None,
            }
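Since the handler returns either an error dict or a pack dict with the same top-level keys, a caller can branch on the error key. A minimal sketch of client-side handling (summarize_pack is a hypothetical helper, not part of paperlib-mcp):

```python
from typing import Any

def summarize_pack(result: dict[str, Any]) -> str:
    """Render a one-line summary of build_evidence_pack's return value."""
    if result.get("error"):
        # Covers both the "no results" case and unexpected exceptions;
        # pack_id is None in either branch
        return f"failed ({result['error']})"
    stats = result["stats"]
    return (
        f"pack {result['pack_id']}: {stats['total_chunks']} chunks "
        f"from {stats['unique_docs']} docs for query {result['query']!r}"
    )
```

Note that the handler swallows exceptions and reports them in-band rather than raising, so callers must check for the error key explicitly.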
  • The call to register_writing_tools(mcp) which registers the build_evidence_pack tool (and other writing tools) to the FastMCP server instance.
    register_writing_tools(mcp)
  • Pydantic model defining the structure of an EvidencePack, which matches the return type of the build_evidence_pack tool.
    class EvidencePack(BaseModel):
        """证据包"""
        pack_id: int
        query: str
        params: dict[str, Any]
        items: list[EvidencePackItem]
        stats: dict[str, Any]
  • Pydantic model for individual items in the EvidencePack returned by the tool.
    class EvidencePackItem(BaseModel):
        """证据包条目"""
        doc_id: str
        chunk_id: int
        page_start: int
        page_end: int
        text: str
        score: float
  • Helper function to retrieve an evidence pack from the database by pack_id, used in related tools but illustrative of the data structure.
    def get_evidence_pack(pack_id: int) -> EvidencePack | None:
        """获取证据包内容
        
        Args:
            pack_id: 证据包 ID
            
        Returns:
            证据包对象,如果不存在返回 None
        """
        # Fetch the pack metadata
        pack = query_one(
            """
            SELECT pack_id, query, params_json, created_at::text
            FROM evidence_packs
            WHERE pack_id = %s
            """,
            (pack_id,)
        )
        
        if not pack:
            return None
        
        # Fetch the pack items
        items = query_all(
            """
            SELECT 
                epi.doc_id,
                epi.chunk_id,
                epi.rank,
                c.page_start,
                c.page_end,
                c.text
            FROM evidence_pack_items epi
            JOIN chunks c ON epi.chunk_id = c.chunk_id
            WHERE epi.pack_id = %s
            ORDER BY epi.rank
            """,
            (pack_id,)
        )
        
        # Aggregate stats
        unique_docs = len(set(item["doc_id"] for item in items))
        
        return EvidencePack(
            pack_id=pack["pack_id"],
            query=pack["query"],
            params=pack["params_json"] or {},
            items=[
                EvidencePackItem(
                    doc_id=item["doc_id"],
                    chunk_id=item["chunk_id"],
                    page_start=item["page_start"],
                    page_end=item["page_end"],
                    text=item["text"],
                    score=1.0 / (item["rank"] + 1) if item["rank"] is not None else 0.5,  # rank-based pseudo-score
                )
                for item in items
            ],
            stats={
                "total_chunks": len(items),
                "unique_docs": unique_docs,
            }
        )
Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It describes the tool's core behavior (searching and saving evidence packs) and mentions the benefit of avoiding '结果漂移' (result drift). However, it doesn't disclose important behavioral traits like whether this is a read-only or write operation, what permissions might be required, whether the evidence pack creation is reversible, or any rate limits. The description adds some context but leaves significant gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and well-structured. It begins with a clear purpose statement, follows with usage context, then provides parameter explanations, and ends with return value information. Each sentence earns its place, though the parameter explanations could be slightly more concise. The information is front-loaded with the most important details first.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (4 parameters, evidence pack creation), no annotations, and the presence of an output schema (implied by 'Returns' section), the description is reasonably complete. It covers purpose, usage context, parameter semantics, and return values. The main gap is the lack of behavioral transparency details that would be important for a tool that creates persistent evidence packs.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description provides semantic explanations for all 4 parameters in the 'Args' section, adding meaning beyond what the schema (which contains no field descriptions) provides. It explains that 'query' is the search topic/research question, 'k' is the retrieval count, 'per_doc_limit' is the maximum chunks per document, and 'alpha' is the vector search weight. This compensates well for the lack of schema descriptions, though it doesn't explain parameter constraints or ranges.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: to search for literature excerpts related to a topic and save them as reusable evidence packs. It specifies the verb (search and save) and the resource (literature excerpts/evidence packs). However, it doesn't explicitly differentiate itself from sibling tools like 'collect_evidence' or 'build_community_evidence_pack', which appear to offer related functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: across multiple iterations of review writing, to avoid the result drift caused by repeated retrieval. This explains the tool's value proposition. However, it doesn't explicitly state when NOT to use it or mention alternatives like 'collect_evidence' or other evidence-related sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/h-lu/paperlib-mcp'