
get_document

Retrieve metadata and statistics for academic papers in Paperlib MCP. Get document details including title, authors, publication info, chunk counts, and storage locations by providing the document ID.

Instructions

Retrieve metadata and statistics for the specified document.

Fetches the document's full metadata by doc_id, including title, authors, chunk count, and more.

Args: doc_id: The document's unique identifier (SHA256 hash)

Returns: Detailed document information, containing:
  - Metadata: title, authors, year, venue, doi, url
  - Storage info: pdf_bucket, pdf_key
  - Statistics: chunk_count, embedded_chunk_count, total_tokens
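The successful response is a flat dict covering all three groups of fields. A sketch of its shape (all field values here are hypothetical; only the key names come from the documented return schema):

```python
# Illustrative shape of a successful get_document response.
# Values are made up for illustration; key names follow the
# documented return schema above.
sample_response = {
    # Metadata
    "doc_id": "e3b0c44298fc1c149afbf4c8996fb92427ae41e464"
              "9b934ca495991b7852b855",
    "title": "An Example Paper",
    "authors": "A. Author; B. Author",
    "year": 2024,
    "venue": "Example Conf",
    "doi": None,
    "url": None,
    # Storage info
    "pdf_bucket": "papers",
    "pdf_key": "e3b0c442.pdf",
    "pdf_sha256": None,
    "created_at": "2024-01-01 00:00:00",
    "updated_at": "2024-01-01 00:00:00",
    # Statistics
    "chunk_count": 42,
    "embedded_chunk_count": 42,
    "total_tokens": 18000,
}

# Error responses instead carry only "error" and "doc_id" keys.
assert {"doc_id", "chunk_count", "embedded_chunk_count",
        "total_tokens"} <= sample_response.keys()
```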

Input Schema

Name     Required   Description                                      Default
doc_id   Yes        The document's unique identifier (SHA256 hash)

Implementation Reference

  • The primary handler function for the 'get_document' tool. It queries the database for the document's metadata and chunk statistics, validates with DocumentDetail model, and returns the result or error.
    @mcp.tool()
    def get_document(doc_id: str) -> dict[str, Any]:
        """Retrieve metadata and statistics for the specified document.

        Fetches the document's full metadata by doc_id, including title,
        authors, chunk count, and more.

        Args:
            doc_id: The document's unique identifier (SHA256 hash).

        Returns:
            Detailed document information, containing:
            - Metadata: title, authors, year, venue, doi, url
            - Storage info: pdf_bucket, pdf_key
            - Statistics: chunk_count, embedded_chunk_count, total_tokens
        """
        try:
            # Query basic document information
            doc = query_one(
                """
                SELECT doc_id, title, authors, year, venue, doi, url,
                       pdf_bucket, pdf_key, pdf_sha256,
                       created_at::text, updated_at::text
                FROM documents
                WHERE doc_id = %s
                """,
                (doc_id,),
            )
            if not doc:
                return {
                    "error": f"Document not found: {doc_id}",
                    "doc_id": doc_id,
                }

            # Query chunk statistics
            stats = query_one(
                """
                SELECT COUNT(c.chunk_id) AS chunk_count,
                       COUNT(ce.chunk_id) AS embedded_chunk_count,
                       COALESCE(SUM(c.token_count), 0) AS total_tokens
                FROM chunks c
                LEFT JOIN chunk_embeddings ce ON c.chunk_id = ce.chunk_id
                WHERE c.doc_id = %s
                """,
                (doc_id,),
            )

            return DocumentDetail(
                doc_id=doc["doc_id"],
                title=doc["title"],
                authors=doc["authors"],
                year=doc["year"],
                venue=doc["venue"],
                doi=doc["doi"],
                url=doc["url"],
                pdf_bucket=doc["pdf_bucket"],
                pdf_key=doc["pdf_key"],
                pdf_sha256=doc["pdf_sha256"],
                created_at=doc["created_at"],
                updated_at=doc["updated_at"],
                chunk_count=stats["chunk_count"] if stats else 0,
                embedded_chunk_count=stats["embedded_chunk_count"] if stats else 0,
                total_tokens=stats["total_tokens"] if stats else 0,
            ).model_dump()
        except Exception as e:
            return {
                "error": str(e),
                "doc_id": doc_id,
            }
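The statistics query leans on a LEFT JOIN detail: COUNT(ce.chunk_id) skips the NULLs produced for chunks that have no matching embedding row, so one query yields both the total and the embedded chunk counts. A minimal sketch of that behaviour against an in-memory SQLite database (the server itself targets PostgreSQL; placeholder style here is `?` rather than psycopg's `%s`):

```python
import sqlite3

# Sketch of the chunk-statistics query using SQLite stand-ins for the
# chunks and chunk_embeddings tables named in the handler above.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE chunks (chunk_id INTEGER PRIMARY KEY,
                         doc_id TEXT, token_count INTEGER);
    CREATE TABLE chunk_embeddings (chunk_id INTEGER PRIMARY KEY);
    INSERT INTO chunks VALUES (1, 'd1', 100), (2, 'd1', 200), (3, 'd1', 300);
    -- Only two of the three chunks have embeddings.
    INSERT INTO chunk_embeddings VALUES (1), (2);
""")

row = con.execute("""
    SELECT COUNT(c.chunk_id)               AS chunk_count,
           COUNT(ce.chunk_id)              AS embedded_chunk_count,
           COALESCE(SUM(c.token_count), 0) AS total_tokens
    FROM chunks c
    LEFT JOIN chunk_embeddings ce ON c.chunk_id = ce.chunk_id
    WHERE c.doc_id = ?
""", ("d1",)).fetchone()

print(row)  # (3, 2, 600): COUNT(ce.chunk_id) ignores the NULL for chunk 3
```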
  • Pydantic BaseModel defining the output schema for the get_document tool response.
    class DocumentDetail(BaseModel):
        """Detailed document information."""

        doc_id: str
        title: str | None
        authors: str | None
        year: int | None
        venue: str | None
        doi: str | None
        url: str | None
        pdf_bucket: str
        pdf_key: str
        pdf_sha256: str | None
        created_at: str | None
        updated_at: str | None
        # Statistics
        chunk_count: int
        embedded_chunk_count: int
        total_tokens: int
  • Registration of the fetch tools module, which includes the get_document tool, by calling register_fetch_tools on the MCP instance.
    register_fetch_tools(mcp)

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/h-lu/paperlib-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server