get_document
Retrieve metadata and statistics for academic papers in Paperlib MCP. Get document details including title, authors, publication info, chunk counts, and storage locations by providing the document ID.
Instructions
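Retrieve metadata and statistics for the specified document.
Fetches the complete metadata for a document by its `doc_id`, including title, authors, chunk count, and more.
Args:
- `doc_id`: unique identifier of the document (SHA256 hash)
Returns: the document details, including:
- Metadata: `title`, `authors`, `year`, `venue`, `doi`, `url`
- Storage info: `pdf_bucket`, `pdf_key`
- Statistics: `chunk_count`, `embedded_chunk_count`, `total_tokens`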
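The handler listed under Implementation Reference returns all of these fields in one flat dictionary. On failure it returns an `{"error": ..., "doc_id": ...}` dictionary instead of raising. Below is a minimal client-side sketch that checks for that error shape and then splits a successful result into the three groups named under Returns. `group_document_detail` and `DocumentLookupError` are hypothetical names, and the example values are made up.

```python
from typing import Any

# Field groups as listed under "Returns" in the tool description.
METADATA_FIELDS = ("title", "authors", "year", "venue", "doi", "url")
STORAGE_FIELDS = ("pdf_bucket", "pdf_key")
STATS_FIELDS = ("chunk_count", "embedded_chunk_count", "total_tokens")


class DocumentLookupError(Exception):
    """Hypothetical client-side exception for get_document error responses."""


def group_document_detail(result: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Hypothetical helper: split a get_document result into three groups.

    The handler reports failures as {"error": ..., "doc_id": ...} rather than
    raising, so the "error" key is checked first. Missing keys map to None.
    """
    if "error" in result:
        raise DocumentLookupError(result["error"])
    return {
        "metadata": {k: result.get(k) for k in METADATA_FIELDS},
        "storage": {k: result.get(k) for k in STORAGE_FIELDS},
        "stats": {k: result.get(k) for k in STATS_FIELDS},
    }


# Made-up response values, for illustration only.
example = {
    "doc_id": "ab" * 32,
    "title": "An Example Paper",
    "year": 2021,
    "pdf_bucket": "papers",
    "pdf_key": "ab.pdf",
    "chunk_count": 12,
    "embedded_chunk_count": 10,
    "total_tokens": 5400,
}
print(group_document_detail(example)["stats"])
```

Checking `"error"` is safe because the success schema (`DocumentDetail`) has no field with that name.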
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| doc_id | Yes | Unique identifier of the document (SHA256 hash) | |
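Because `doc_id` is described as a SHA256 hash, a client can cheaply reject malformed IDs before calling the tool. The following sketch assumes the ID is the 64-character lowercase hex form that `hashlib`'s `hexdigest()` produces. The source does not state the encoding, and `looks_like_doc_id` is a hypothetical helper.

```python
import hashlib
import re

# Assumption: doc_id is the 64-character lowercase hex encoding of a SHA256
# digest. The tool description only says "SHA256 hash", so adjust the
# pattern if the real encoding differs.
_SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def looks_like_doc_id(value: str) -> bool:
    """Hypothetical client-side pre-check before calling get_document."""
    return _SHA256_HEX.fullmatch(value) is not None


print(looks_like_doc_id(hashlib.sha256(b"example").hexdigest()))  # True
print(looks_like_doc_id("not-a-hash"))  # False
```

This check only filters out obviously malformed input. A well-formed but unknown ID still reaches the server, which reports it as `Document not found`.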
Implementation Reference
- `src/paperlib_mcp/tools/fetch.py:117-188` (handler): The primary handler function for the `get_document` tool. It queries the database for the document's metadata and chunk statistics, validates the result with the `DocumentDetail` model, and returns the serialized detail. If the document is missing or a query fails, it returns an error dictionary instead.

  ```python
  from typing import Any

  # `mcp` is the server instance passed to register_fetch_tools(mcp) (see the
  # registration entry below). `query_one` (a database helper) and
  # `DocumentDetail` (defined below) come from elsewhere in the package;
  # those imports are not shown in this excerpt.


  @mcp.tool()
  def get_document(doc_id: str) -> dict[str, Any]:
      """Retrieve metadata and statistics for the specified document.

      Fetches the complete metadata for a document by doc_id, including
      title, authors, chunk count, and more.

      Args:
          doc_id: Unique identifier of the document (SHA256 hash)

      Returns:
          Document details, including:
          - Metadata: title, authors, year, venue, doi, url
          - Storage info: pdf_bucket, pdf_key
          - Statistics: chunk_count, embedded_chunk_count, total_tokens
      """
      try:
          # Fetch the document's basic information
          doc = query_one(
              """
              SELECT doc_id, title, authors, year, venue, doi, url,
                     pdf_bucket, pdf_key, pdf_sha256,
                     created_at::text, updated_at::text
              FROM documents
              WHERE doc_id = %s
              """,
              (doc_id,),
          )

          if not doc:
              return {
                  "error": f"Document not found: {doc_id}",
                  "doc_id": doc_id,
              }

          # Fetch chunk statistics (total chunks, embedded chunks, token sum)
          stats = query_one(
              """
              SELECT
                  COUNT(c.chunk_id) as chunk_count,
                  COUNT(ce.chunk_id) as embedded_chunk_count,
                  COALESCE(SUM(c.token_count), 0) as total_tokens
              FROM chunks c
              LEFT JOIN chunk_embeddings ce ON c.chunk_id = ce.chunk_id
              WHERE c.doc_id = %s
              """,
              (doc_id,),
          )

          # Validate against the output schema, then serialize to a dict
          return DocumentDetail(
              doc_id=doc["doc_id"],
              title=doc["title"],
              authors=doc["authors"],
              year=doc["year"],
              venue=doc["venue"],
              doi=doc["doi"],
              url=doc["url"],
              pdf_bucket=doc["pdf_bucket"],
              pdf_key=doc["pdf_key"],
              pdf_sha256=doc["pdf_sha256"],
              created_at=doc["created_at"],
              updated_at=doc["updated_at"],
              chunk_count=stats["chunk_count"] if stats else 0,
              embedded_chunk_count=stats["embedded_chunk_count"] if stats else 0,
              total_tokens=stats["total_tokens"] if stats else 0,
          ).model_dump()

      except Exception as e:
          # Errors are returned as data rather than raised to the MCP client
          return {
              "error": str(e),
              "doc_id": doc_id,
          }
  ```
- Pydantic `BaseModel` defining the output schema for the `get_document` tool response:

  ```python
  from pydantic import BaseModel


  class DocumentDetail(BaseModel):
      """Detailed document information."""

      doc_id: str
      title: str | None
      authors: str | None
      year: int | None
      venue: str | None
      doi: str | None
      url: str | None
      pdf_bucket: str
      pdf_key: str
      pdf_sha256: str | None
      created_at: str | None
      updated_at: str | None
      # Statistics
      chunk_count: int
      embedded_chunk_count: int
      total_tokens: int
  ```
- `src/paperlib_mcp/server.py:36-36` (registration): Registers the fetch tools module, which includes the `get_document` tool, by calling `register_fetch_tools` on the MCP instance: `register_fetch_tools(mcp)`.
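The statistics query in the handler can be tried out in isolation. This sketch reruns the same aggregation against an in-memory SQLite database. The table layouts and sample rows are made up, only the columns the query touches are modeled, and SQLite's `?` placeholder stands in for the `%s` placeholder used with PostgreSQL. `chunk_stats` is a hypothetical helper.

```python
import sqlite3

# In-memory stand-in for the chunks / chunk_embeddings tables. Only the
# columns the stats query touches are modeled; the real schema is not shown.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE chunks (chunk_id INTEGER PRIMARY KEY, doc_id TEXT, token_count INTEGER);
    CREATE TABLE chunk_embeddings (chunk_id INTEGER PRIMARY KEY);
    INSERT INTO chunks VALUES (1, 'doc-a', 100), (2, 'doc-a', 200), (3, 'doc-a', 50);
    INSERT INTO chunk_embeddings VALUES (1), (2);
    """
)

# Same aggregation as the handler. The LEFT JOIN keeps un-embedded chunks,
# whose ce.chunk_id is NULL, so COUNT(ce.chunk_id) counts only embedded ones.
STATS_SQL = """
    SELECT
        COUNT(c.chunk_id) AS chunk_count,
        COUNT(ce.chunk_id) AS embedded_chunk_count,
        COALESCE(SUM(c.token_count), 0) AS total_tokens
    FROM chunks c
    LEFT JOIN chunk_embeddings ce ON c.chunk_id = ce.chunk_id
    WHERE c.doc_id = ?
"""


def chunk_stats(doc_id: str) -> tuple[int, int, int]:
    # An aggregate query without GROUP BY always yields exactly one row.
    return conn.execute(STATS_SQL, (doc_id,)).fetchone()


print(chunk_stats("doc-a"))    # (3, 2, 350)
print(chunk_stats("missing"))  # (0, 0, 0)
```

Because the aggregate always returns one row, a document with no chunks reports zeros rather than no row, so the `if stats else 0` fallbacks in the handler are defensive. The counts also assume at most one `chunk_embeddings` row per chunk. Several embedding rows for the same chunk would inflate both `chunk_count` and `total_tokens`.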