build_evidence_pack
Search academic literature for topic-related excerpts and save them as a reusable evidence pack, so that references stay consistent across multiple review iterations.
Instructions
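Build an evidence pack
Search for literature excerpts related to the topic and save them as a reusable evidence pack. The pack can be reused across multiple iterations of review writing, which avoids the result drift caused by re-running retrieval each time.
Args:
- query: Search topic or research question
- k: Number of results to retrieve, default 40
- per_doc_limit: Maximum number of chunks returned per document, default 3
- alpha: Vector search weight, default 0.6

Returns: Evidence pack information, including the pack_id and the retrieved items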
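The hybrid search implementation itself is not shown on this page, so the following is only a minimal sketch of how `alpha`, `k`, and `per_doc_limit` could interact. It assumes a linear blend of a vector score and a keyword score, both normalised to [0, 1]; the `Candidate` type and `rank_candidates` function are hypothetical names introduced for illustration and are not part of the project.

```python
from collections import defaultdict
from typing import NamedTuple


class Candidate(NamedTuple):
    # Hypothetical record for illustration; not the project's result type.
    doc_id: str
    chunk_id: int
    vector_score: float   # assumed to be normalised to [0, 1]
    keyword_score: float  # assumed to be normalised to [0, 1]


def rank_candidates(candidates, k=40, per_doc_limit=3, alpha=0.6):
    """Blend the two scores with weight alpha, then cap chunks per document."""
    ranked = sorted(
        candidates,
        # Assumed blend: alpha weights the vector score, the rest goes to keywords.
        key=lambda c: alpha * c.vector_score + (1 - alpha) * c.keyword_score,
        reverse=True,
    )
    kept = []
    taken_per_doc = defaultdict(int)
    for cand in ranked:
        if len(kept) >= k:
            break  # enough results collected
        if taken_per_doc[cand.doc_id] >= per_doc_limit:
            continue  # this document already contributed its share
        taken_per_doc[cand.doc_id] += 1
        kept.append(cand)
    return kept


candidates = [
    Candidate("A", 1, 0.9, 0.1),
    Candidate("A", 2, 0.8, 0.5),
    Candidate("A", 3, 0.7, 0.7),
    Candidate("B", 4, 0.2, 0.9),
]
top = rank_candidates(candidates, k=3, per_doc_limit=2, alpha=0.6)
```

Under these assumptions, a higher `alpha` favours semantically similar chunks over exact keyword matches, and `per_doc_limit` keeps a single long paper from filling the whole pack.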
Input Schema
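| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | Search topic or research question | |
| k | No | Number of results to retrieve | 40 |
| per_doc_limit | No | Maximum number of chunks returned per document | 3 |
| alpha | No | Vector search weight | 0.6 |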
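As an illustration of the schema above, the sketch below builds the JSON-RPC request that an MCP client would send to invoke the tool through the protocol's `tools/call` method. The `build_tool_call` helper and the example query string are assumptions made for this example; the transport and client library are deliberately left out.

```python
import json


def build_tool_call(query, k=40, per_doc_limit=3, alpha=0.6, request_id=1):
    """Build an MCP tools/call request for build_evidence_pack.

    Hypothetical helper: defaults mirror the tool's documented defaults.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "build_evidence_pack",
            "arguments": {
                "query": query,
                "k": k,
                "per_doc_limit": per_doc_limit,
                "alpha": alpha,
            },
        },
    }


# Made-up query, overriding only the number of results.
request = build_tool_call("monetary policy transmission", k=20)
payload = json.dumps(request, ensure_ascii=False)
```

Only `query` is required, so a client may omit the other three arguments and rely on the server-side defaults.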
Implementation Reference
- The main handler function for the `build_evidence_pack` tool. It runs a hybrid search over literature chunks, saves the results as an evidence pack in the database (the `evidence_packs` and `evidence_pack_items` tables), fetches document metadata, and returns structured pack information, including items with snippets and scores.

  ```python
  async def build_evidence_pack(
      query: str,
      k: int = 40,
      per_doc_limit: int = 3,
      alpha: float = 0.6,
  ) -> dict[str, Any]:
      """Build an evidence pack.

      Search for literature excerpts related to the topic and save them as a
      reusable evidence pack. The pack can be reused across multiple iterations
      of review writing, avoiding result drift from re-running retrieval.

      Args:
          query: Search topic or research question
          k: Number of results to retrieve, default 40
          per_doc_limit: Maximum number of chunks returned per document, default 3
          alpha: Vector search weight, default 0.6

      Returns:
          Evidence pack information, including pack_id and the retrieved items
      """
      try:
          # Run the search
          search_result = await hybrid_search(
              query=query,
              k=k,
              alpha=alpha,
              per_doc_limit=per_doc_limit,
          )

          if not search_result.results:
              return {
                  "error": "No relevant literature found",
                  "query": query,
                  "pack_id": None,
              }

          # Save the evidence pack
          params = {
              "k": k,
              "per_doc_limit": per_doc_limit,
              "alpha": alpha,
          }

          with get_db() as conn:
              with conn.cursor() as cur:
                  # Create the evidence pack
                  cur.execute(
                      """
                      INSERT INTO evidence_packs (query, params_json)
                      VALUES (%s, %s)
                      RETURNING pack_id
                      """,
                      (query, json.dumps(params))
                  )
                  pack_result = cur.fetchone()
                  pack_id = pack_result["pack_id"]

                  # Insert the items, preserving search order as rank
                  for rank, result in enumerate(search_result.results):
                      cur.execute(
                          """
                          INSERT INTO evidence_pack_items (pack_id, doc_id, chunk_id, rank)
                          VALUES (%s, %s, %s, %s)
                          """,
                          (pack_id, result.doc_id, result.chunk_id, rank)
                      )

          # Fetch document metadata
          doc_ids = list(set(r.doc_id for r in search_result.results))
          doc_metadata = {}
          for doc_id in doc_ids:
              doc = query_one(
                  "SELECT title, authors, year FROM documents WHERE doc_id = %s",
                  (doc_id,)
              )
              if doc:
                  doc_metadata[doc_id] = doc

          # Build the return value
          items = []
          for result in search_result.results:
              meta = doc_metadata.get(result.doc_id, {})
              items.append({
                  "doc_id": result.doc_id,
                  "chunk_id": result.chunk_id,
                  "page_start": result.page_start,
                  "page_end": result.page_end,
                  "text": result.snippet,
                  "score": result.score_total,
                  "title": meta.get("title"),
                  "authors": meta.get("authors"),
                  "year": meta.get("year"),
              })

          return {
              "pack_id": pack_id,
              "query": query,
              "params": params,
              "items": items,
              "stats": {
                  "total_chunks": len(items),
                  "unique_docs": len(doc_ids),
              },
          }

      except Exception as e:
          return {
              "error": str(e),
              "query": query,
              "pack_id": None,
          }
  ```
- src/paperlib_mcp/server.py:37-37 (registration): The call to `register_writing_tools(mcp)`, which registers the `build_evidence_pack` tool (and other writing tools) with the FastMCP server instance.

  ```python
  register_writing_tools(mcp)
  ```
- Pydantic model defining the structure of an `EvidencePack`. It mirrors the shape of the dictionary returned by the `build_evidence_pack` tool.

  ```python
  class EvidencePack(BaseModel):
      """Evidence pack."""
      pack_id: int
      query: str
      params: dict[str, Any]
      items: list[EvidencePackItem]
      stats: dict[str, Any]
  ```
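- Pydantic model for the individual items in an `EvidencePack`. The items returned by the tool handler carry these fields plus `title`, `authors`, and `year`.

  ```python
  class EvidencePackItem(BaseModel):
      """Evidence pack item."""
      doc_id: str
      chunk_id: int
      page_start: int
      page_end: int
      text: str
      score: float
  ```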
- Helper function that retrieves an evidence pack from the database by `pack_id`. It is used by related tools and is shown here because it illustrates the stored data structure.

  ```python
  def get_evidence_pack(pack_id: int) -> EvidencePack | None:
      """Fetch the contents of an evidence pack.

      Args:
          pack_id: Evidence pack ID

      Returns:
          The evidence pack object, or None if it does not exist
      """
      # Fetch the evidence pack metadata
      pack = query_one(
          """
          SELECT pack_id, query, params_json, created_at::text
          FROM evidence_packs
          WHERE pack_id = %s
          """,
          (pack_id,)
      )

      if not pack:
          return None

      # Fetch the evidence pack items in their stored rank order
      items = query_all(
          """
          SELECT epi.doc_id, epi.chunk_id, epi.rank, c.page_start, c.page_end, c.text
          FROM evidence_pack_items epi
          JOIN chunks c ON epi.chunk_id = c.chunk_id
          WHERE epi.pack_id = %s
          ORDER BY epi.rank
          """,
          (pack_id,)
      )

      # Stats
      unique_docs = len(set(item["doc_id"] for item in items))

      return EvidencePack(
          pack_id=pack["pack_id"],
          query=pack["query"],
          params=pack["params_json"] or {},
          items=[
              EvidencePackItem(
                  doc_id=item["doc_id"],
                  chunk_id=item["chunk_id"],
                  page_start=item["page_start"],
                  page_end=item["page_end"],
                  text=item["text"],
                  # Rank-based pseudo-score; the original search score is not stored
                  score=1.0 / (item["rank"] + 1) if item["rank"] is not None else 0.5,
              )
              for item in items
          ],
          stats={
              "total_chunks": len(items),
              "unique_docs": unique_docs,
          }
      )
  ```
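Two details of the code above matter to callers. The handler reports failure by returning a dictionary with `pack_id` set to `None` and an `error` message, not by raising. And because only the rank is written to `evidence_pack_items`, a pack reloaded with `get_evidence_pack` carries a rank-derived pseudo-score in place of the original search score. The sketch below illustrates both points; `pseudo_score` and `summarize_pack` are hypothetical helpers written for this example, not functions from the project.

```python
from typing import Any, Optional


def pseudo_score(rank: Optional[int]) -> float:
    """Same rank-to-score expression as in get_evidence_pack."""
    return 1.0 / (rank + 1) if rank is not None else 0.5


def summarize_pack(response: dict[str, Any]) -> str:
    """Turn a build_evidence_pack response into a one-line summary."""
    if response.get("pack_id") is None:
        # Failure path: the handler returns an error message, it does not raise.
        return f"no pack created for {response.get('query')!r}: {response.get('error')}"
    stats = response["stats"]
    return (
        f"pack {response['pack_id']}: "
        f"{stats['total_chunks']} chunks from {stats['unique_docs']} documents"
    )


# Hand-written responses in the shape the handler returns (values are made up).
ok = {
    "pack_id": 7,
    "query": "q",
    "params": {"k": 40, "per_doc_limit": 3, "alpha": 0.6},
    "items": [],
    "stats": {"total_chunks": 12, "unique_docs": 5},
}
failed = {"error": "No relevant literature found", "query": "q", "pack_id": None}

ok_summary = summarize_pack(ok)
failed_summary = summarize_pack(failed)
reloaded_scores = [pseudo_score(rank) for rank in range(4)]
```

Since the pseudo-score depends only on rank, it preserves the ordering of the items but should not be compared numerically with the `score` values in the original build response.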