
build_evidence_pack

Search academic literature for topic-related excerpts and save them as reusable evidence packages to maintain consistent references across multiple review iterations.

Instructions

Build Evidence Pack

Searches the literature for chunks relevant to a topic and saves them as a reusable evidence pack. A pack can back multiple iterations of review writing, avoiding the result drift caused by re-running retrieval each time.

Args:
- query: search topic / research question
- k: number of results to retrieve, default 40
- per_doc_limit: maximum number of chunks returned per document, default 3
- alpha: weight of the vector-search component, default 0.6

Returns: evidence pack information, including pack_id and the retrieved items

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| query | Yes | Search topic / research question | — |
| k | No | Number of results to retrieve | 40 |
| per_doc_limit | No | Maximum number of chunks returned per document | 3 |
| alpha | No | Weight of the vector-search component | 0.6 |
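As a concrete illustration, a tool call might send an arguments object like the following. Only `query` is required; the other values shown are the documented defaults, and the query string itself is a hypothetical example:

```python
# Example arguments for a build_evidence_pack tool call.
# Only "query" is required; the rest fall back to the documented defaults.
arguments = {
    "query": "retrieval-augmented generation for literature review",  # hypothetical topic
    "k": 40,             # number of chunks to retrieve
    "per_doc_limit": 3,  # max chunks returned per document
    "alpha": 0.6,        # weight of the vector-search component
}

# Minimal sanity check mirroring the schema: query is mandatory.
assert "query" in arguments
```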

Implementation Reference

  • The main handler function for the 'build_evidence_pack' tool. It performs hybrid search on literature chunks, saves the results as an evidence pack in the database (evidence_packs and evidence_pack_items tables), fetches document metadata, and returns structured pack information including items with snippets and scores.
    async def build_evidence_pack(
        query: str,
        k: int = 40,
        per_doc_limit: int = 3,
        alpha: float = 0.6,
    ) -> dict[str, Any]:
        """Build an evidence pack.
        
        Searches the literature for chunks relevant to a topic and saves
        them as a reusable evidence pack. A pack can back multiple
        iterations of review writing, avoiding the result drift caused by
        re-running retrieval each time.
        
        Args:
            query: Search topic / research question.
            k: Number of results to retrieve, default 40.
            per_doc_limit: Maximum number of chunks returned per document, default 3.
            alpha: Weight of the vector-search component, default 0.6.
            
        Returns:
            Evidence pack information, including pack_id and the retrieved items.
        """
        try:
            # Run the hybrid search
            search_result = await hybrid_search(
                query=query,
                k=k,
                alpha=alpha,
                per_doc_limit=per_doc_limit,
            )
            
            if not search_result.results:
                return {
                    "error": "No relevant literature found",
                    "query": query,
                    "pack_id": None,
                }
            
            # Save the evidence pack
            params = {
                "k": k,
                "per_doc_limit": per_doc_limit,
                "alpha": alpha,
            }
            
            with get_db() as conn:
                with conn.cursor() as cur:
                    # Create the pack record
                    cur.execute(
                        """
                        INSERT INTO evidence_packs (query, params_json)
                        VALUES (%s, %s)
                        RETURNING pack_id
                        """,
                        (query, json.dumps(params))
                    )
                    pack_result = cur.fetchone()
                    pack_id = pack_result["pack_id"]
                    
                    # Insert the pack items
                    for rank, result in enumerate(search_result.results):
                        cur.execute(
                            """
                            INSERT INTO evidence_pack_items (pack_id, doc_id, chunk_id, rank)
                            VALUES (%s, %s, %s, %s)
                            """,
                            (pack_id, result.doc_id, result.chunk_id, rank)
                        )
            
            # Fetch document metadata
            doc_ids = list(set(r.doc_id for r in search_result.results))
            doc_metadata = {}
            for doc_id in doc_ids:
                doc = query_one(
                    "SELECT title, authors, year FROM documents WHERE doc_id = %s",
                    (doc_id,)
                )
                if doc:
                    doc_metadata[doc_id] = doc
            
            # Build the response payload
            items = []
            for result in search_result.results:
                meta = doc_metadata.get(result.doc_id, {})
                items.append({
                    "doc_id": result.doc_id,
                    "chunk_id": result.chunk_id,
                    "page_start": result.page_start,
                    "page_end": result.page_end,
                    "text": result.snippet,
                    "score": result.score_total,
                    "title": meta.get("title"),
                    "authors": meta.get("authors"),
                    "year": meta.get("year"),
                })
            
            return {
                "pack_id": pack_id,
                "query": query,
                "params": params,
                "items": items,
                "stats": {
                    "total_chunks": len(items),
                    "unique_docs": len(doc_ids),
                },
            }
            
        except Exception as e:
            return {
                "error": str(e),
                "query": query,
                "pack_id": None,
            }
  • The call to register_writing_tools(mcp) which registers the build_evidence_pack tool (and other writing tools) to the FastMCP server instance.
    register_writing_tools(mcp)
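The registration step can be sketched with a dependency-free stand-in. Note that `FakeMCP` below is not the real FastMCP API; it only mimics the pattern of a `register_writing_tools`-style function attaching handlers to a server's tool registry by name:

```python
# Simplified stand-in for the FastMCP registration step (not the real
# fastmcp package): tools are registered under their function names so
# clients can discover and invoke them.
class FakeMCP:
    def __init__(self) -> None:
        self.tools: dict[str, object] = {}

    def tool(self, fn):
        """Decorator that records a handler under its function name."""
        self.tools[fn.__name__] = fn
        return fn

def register_writing_tools(mcp: FakeMCP) -> None:
    @mcp.tool
    async def build_evidence_pack(query: str, k: int = 40) -> dict:
        # The real handler performs hybrid search and DB writes; elided here.
        return {"pack_id": None, "query": query}

mcp = FakeMCP()
register_writing_tools(mcp)
print(sorted(mcp.tools))  # ['build_evidence_pack']
```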
  • Pydantic model defining the structure of an EvidencePack, which matches the return type of the build_evidence_pack tool.
    class EvidencePack(BaseModel):
        """An evidence pack."""
        pack_id: int
        query: str
        params: dict[str, Any]
        items: list[EvidencePackItem]
        stats: dict[str, Any]
  • Pydantic model for individual items in the EvidencePack returned by the tool.
    class EvidencePackItem(BaseModel):
        """A single item in an evidence pack."""
        doc_id: str
        chunk_id: int
        page_start: int
        page_end: int
        text: str
        score: float
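To show how the two models nest without requiring Pydantic, here is a dataclass sketch of the same shapes together with the stats bookkeeping; field names mirror the models above, while the item values are purely illustrative:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class EvidencePackItem:
    doc_id: str
    chunk_id: int
    page_start: int
    page_end: int
    text: str
    score: float

@dataclass
class EvidencePack:
    pack_id: int
    query: str
    params: dict[str, Any]
    items: list[EvidencePackItem]
    stats: dict[str, Any] = field(default_factory=dict)

# Illustrative pack with two chunks drawn from the same document.
items = [
    EvidencePackItem("doc-1", 10, 2, 3, "excerpt one", 1.0),
    EvidencePackItem("doc-1", 11, 5, 5, "excerpt two", 0.5),
]
pack = EvidencePack(
    pack_id=1,
    query="example query",
    params={"k": 40, "per_doc_limit": 3, "alpha": 0.6},
    items=items,
    stats={
        "total_chunks": len(items),
        "unique_docs": len({i.doc_id for i in items}),
    },
)
```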
  • Helper function that retrieves an evidence pack from the database by pack_id; it is used by related tools and illustrates the stored data structure.
    def get_evidence_pack(pack_id: int) -> EvidencePack | None:
        """Fetch the contents of an evidence pack.
        
        Args:
            pack_id: Evidence pack ID.
            
        Returns:
            The evidence pack, or None if it does not exist.
        """
        # Fetch the pack metadata
        pack = query_one(
            """
            SELECT pack_id, query, params_json, created_at::text
            FROM evidence_packs
            WHERE pack_id = %s
            """,
            (pack_id,)
        )
        
        if not pack:
            return None
        
        # Fetch the pack items
        items = query_all(
            """
            SELECT 
                epi.doc_id,
                epi.chunk_id,
                epi.rank,
                c.page_start,
                c.page_end,
                c.text
            FROM evidence_pack_items epi
            JOIN chunks c ON epi.chunk_id = c.chunk_id
            WHERE epi.pack_id = %s
            ORDER BY epi.rank
            """,
            (pack_id,)
        )
        
        # Compute stats
        unique_docs = len(set(item["doc_id"] for item in items))
        
        return EvidencePack(
            pack_id=pack["pack_id"],
            query=pack["query"],
            params=pack["params_json"] or {},
            items=[
                EvidencePackItem(
                    doc_id=item["doc_id"],
                    chunk_id=item["chunk_id"],
                    page_start=item["page_start"],
                    page_end=item["page_end"],
                    text=item["text"],
                    score=1.0 / (item["rank"] + 1) if item["rank"] is not None else 0.5,  # rank-based pseudo-score
                )
                for item in items
            ],
            stats={
                "total_chunks": len(items),
                "unique_docs": unique_docs,
            }
        )
