get_document

Retrieve metadata and statistics for academic papers in Paperlib MCP. Get document details including title, authors, publication info, chunk counts, and storage locations by providing the document ID.

Instructions

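Get the metadata and statistics for a specified document.

Retrieves the document's full metadata by doc_id, including title, authors, chunk count, and more.

Args: doc_id: Unique identifier of the document (SHA256 hash)

Returns: Detailed document information, including:
- Metadata: title, authors, year, venue, doi, url
- Storage info: pdf_bucket, pdf_key
- Statistics: chunk_count, embedded_chunk_count, total_tokens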
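
Because the tool reports failures in-band (a dict with an `error` key rather than a raised exception, per the handler in the Implementation Reference), a caller will usually branch on that key before reading any metadata. The following is a minimal sketch of that pattern; `summarize_document` is a hypothetical helper, not part of Paperlib MCP, and the sample payloads are made up for illustration.

```python
from typing import Any


def summarize_document(result: dict[str, Any]) -> str:
    """Turn a get_document result into a one-line summary (hypothetical helper)."""
    # Failures come back as {"error": ..., "doc_id": ...} instead of an exception.
    if "error" in result:
        return f"failed: {result['error']}"

    # title may be None in the response, so fall back to a placeholder.
    title = result.get("title") or "(untitled)"
    chunks = result.get("chunk_count", 0)
    embedded = result.get("embedded_chunk_count", 0)
    return f"{title}: {embedded}/{chunks} chunks embedded"


# Illustrative payloads only; real values come from the server.
ok = {"doc_id": "abc", "title": "Example Paper", "chunk_count": 10, "embedded_chunk_count": 8}
missing = {"error": "Document not found: abc", "doc_id": "abc"}
```

Comparing `embedded_chunk_count` with `chunk_count` in this way gives a quick check of whether embedding has finished for a document.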

Input Schema

Name      Required    Description    Default
doc_id    Yes
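
Since `doc_id` is described as a SHA256 hash, a client may want to reject obviously malformed IDs before calling the tool. The sketch below assumes the ID is the lowercase hexadecimal digest (64 characters); the source does not state the encoding or what bytes are hashed, so both `looks_like_doc_id` and the sample input are illustrative assumptions.

```python
import hashlib
import re

# Assumption: doc_id is a lowercase hex SHA-256 digest, i.e. 64 hex characters.
_SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def looks_like_doc_id(value: str) -> bool:
    """Return True if value has the shape of a lowercase hex SHA-256 digest."""
    return _SHA256_HEX.fullmatch(value) is not None


# Illustrative only: hashing arbitrary bytes to get a well-formed ID.
example_id = hashlib.sha256(b"example pdf bytes").hexdigest()
```

A check like this only validates the shape of the ID; whether the document exists is still decided by the server.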

Implementation Reference

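  • The primary handler function for the 'get_document' tool. It queries the database for the document's metadata and chunk statistics, validates the combined result with the DocumentDetail model, and returns it as a dict. If the document is missing or an exception occurs, it returns a dict with an "error" key instead.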
    @mcp.tool()
    def get_document(doc_id: str) -> dict[str, Any]:
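        """Get the metadata and statistics for a specified document.

        Retrieves the document's full metadata by doc_id, including title,
        authors, chunk count, and more.

        Args:
            doc_id: Unique identifier of the document (SHA256 hash)

        Returns:
            Detailed document information, including:
            - Metadata: title, authors, year, venue, doi, url
            - Storage info: pdf_bucket, pdf_key
            - Statistics: chunk_count, embedded_chunk_count, total_tokens
        """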
        try:
            # Query the document's basic information
            doc = query_one(
                """
                SELECT 
                    doc_id, title, authors, year, venue, doi, url,
                    pdf_bucket, pdf_key, pdf_sha256,
                    created_at::text, updated_at::text
                FROM documents
                WHERE doc_id = %s
                """,
                (doc_id,)
            )
            
            if not doc:
                return {
                    "error": f"Document not found: {doc_id}",
                    "doc_id": doc_id,
                }
            
            # Query chunk statistics
            stats = query_one(
                """
                SELECT 
                    COUNT(c.chunk_id) as chunk_count,
                    COUNT(ce.chunk_id) as embedded_chunk_count,
                    COALESCE(SUM(c.token_count), 0) as total_tokens
                FROM chunks c
                LEFT JOIN chunk_embeddings ce ON c.chunk_id = ce.chunk_id
                WHERE c.doc_id = %s
                """,
                (doc_id,)
            )
            
            return DocumentDetail(
                doc_id=doc["doc_id"],
                title=doc["title"],
                authors=doc["authors"],
                year=doc["year"],
                venue=doc["venue"],
                doi=doc["doi"],
                url=doc["url"],
                pdf_bucket=doc["pdf_bucket"],
                pdf_key=doc["pdf_key"],
                pdf_sha256=doc["pdf_sha256"],
                created_at=doc["created_at"],
                updated_at=doc["updated_at"],
                chunk_count=stats["chunk_count"] if stats else 0,
                embedded_chunk_count=stats["embedded_chunk_count"] if stats else 0,
                total_tokens=stats["total_tokens"] if stats else 0,
            ).model_dump()
            
        except Exception as e:
            return {
                "error": str(e),
                "doc_id": doc_id,
            }
  • Pydantic BaseModel defining the output schema for the get_document tool response.
    class DocumentDetail(BaseModel):
        """Document details"""
        doc_id: str
        title: str | None
        authors: str | None
        year: int | None
        venue: str | None
        doi: str | None
        url: str | None
        pdf_bucket: str
        pdf_key: str
        pdf_sha256: str | None
        created_at: str | None
        updated_at: str | None
        # Statistics
        chunk_count: int
        embedded_chunk_count: int
        total_tokens: int
  • Registration of the fetch tools module, which includes the get_document tool, by calling register_fetch_tools on the MCP instance.
    register_fetch_tools(mcp)
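
The statistics query in the handler relies on `COUNT` skipping NULLs: the `LEFT JOIN` keeps every chunk, so `COUNT(c.chunk_id)` counts all chunks while `COUNT(ce.chunk_id)` counts only those with a matching embedding row. The sketch below reproduces that behaviour with an in-memory SQLite database rather than the project's actual database; the two-table schema, the sample rows, and the `doc_stats` helper are simplified assumptions, including the assumption that each chunk has at most one embedding row.

```python
import sqlite3

# Assumed, simplified schema; the real tables likely have more columns.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE chunks (chunk_id INTEGER PRIMARY KEY, doc_id TEXT, token_count INTEGER);
    CREATE TABLE chunk_embeddings (chunk_id INTEGER PRIMARY KEY);
    INSERT INTO chunks VALUES (1, 'doc-a', 100), (2, 'doc-a', 150), (3, 'doc-a', 50);
    INSERT INTO chunk_embeddings VALUES (1), (2);
    """
)

# Same shape as the handler's query, with SQLite's "?" placeholder.
STATS_SQL = """
    SELECT
        COUNT(c.chunk_id) AS chunk_count,
        COUNT(ce.chunk_id) AS embedded_chunk_count,
        COALESCE(SUM(c.token_count), 0) AS total_tokens
    FROM chunks c
    LEFT JOIN chunk_embeddings ce ON c.chunk_id = ce.chunk_id
    WHERE c.doc_id = ?
"""


def doc_stats(doc_id: str) -> tuple[int, int, int]:
    """Return (chunk_count, embedded_chunk_count, total_tokens) for one document."""
    return conn.execute(STATS_SQL, (doc_id,)).fetchone()


print(doc_stats("doc-a"))    # (3, 2, 300)
print(doc_stats("missing"))  # (0, 0, 0)
```

An aggregate query without `GROUP BY` returns exactly one row even when no chunks match, and `COALESCE` turns the NULL sum into 0, which is why the unknown document yields zeros rather than no row. If a chunk could have several embedding rows, the join would duplicate that chunk and inflate all three figures, so the one-row-per-chunk assumption matters.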

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/h-lu/paperlib-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.