Glama

get_document_chunks

Retrieve all text chunks of a document to analyze its content structure. For each chunk it returns the chunk ID, page range, token count, a short text snippet, and whether an embedding exists.

Instructions

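Get the list of all chunks for the specified document.

Retrieves summary information for every text chunk of the document identified by doc_id.

Args: doc_id: Unique identifier of the document.

Returns: A dict with doc_id, chunk_count, and a chunks list; each chunk includes its chunk_id, page numbers, and a text summary.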
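A client might post-process the returned chunk list, for example to find chunks that still lack embeddings or to see which pages a document spans. The sketch below assumes the result shape built by the handler in the Implementation Reference (`doc_id`, `chunk_count`, and `chunks` entries with `page_start`, `page_end`, `token_count`, `has_embedding`); the helper name `summarize_chunks` and the sample data are hypothetical.

```python
from typing import Any


def summarize_chunks(result: dict[str, Any]) -> dict[str, Any]:
    """Summarize a get_document_chunks result (hypothetical helper)."""
    # The handler reports failures as {"error": ..., "doc_id": ..., "chunks": []}
    if "error" in result:
        raise RuntimeError(f"get_document_chunks failed: {result['error']}")

    chunks = result["chunks"]
    # IDs of chunks that have no row in chunk_embeddings
    missing = [c["chunk_id"] for c in chunks if not c["has_embedding"]]
    # Overall page span; skip chunks whose page numbers are None
    starts = [c["page_start"] for c in chunks if c["page_start"] is not None]
    ends = [c["page_end"] for c in chunks if c["page_end"] is not None]
    return {
        "chunk_count": len(chunks),
        "total_tokens": sum(c["token_count"] or 0 for c in chunks),
        "missing_embeddings": missing,
        "page_span": (min(starts), max(ends)) if starts and ends else None,
    }


# Hypothetical sample result, shaped like the handler's return value
sample = {
    "doc_id": "doc-123",
    "chunk_count": 2,
    "chunks": [
        {"chunk_id": "c1", "chunk_index": 0, "page_start": 1, "page_end": 2,
         "token_count": 480, "snippet": "Introduction ...", "has_embedding": True},
        {"chunk_id": "c2", "chunk_index": 1, "page_start": 2, "page_end": 3,
         "token_count": 512, "snippet": "Methods ...", "has_embedding": False},
    ],
}
print(summarize_chunks(sample))
```

Raising on the `error` key keeps a failed lookup from being mistaken for a document with zero chunks, since the handler returns an empty `chunks` list in both cases.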

Input Schema

Name    Required  Description  Default
doc_id  Yes
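The table describes a single required string argument. If `mcp` is a FastMCP server from the MCP Python SDK (the source does not say), the input schema is generated from the `doc_id: str` annotation and would look roughly like the dict below, possibly with extra keys such as `title`. The exact generated schema is an assumption, and `check_arguments` is a hypothetical helper that mimics only the `required` and string-type checks, not full JSON Schema validation.

```python
from typing import Any

# Assumed JSON Schema for the tool input: one required string property, no default
INPUT_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {"doc_id": {"type": "string"}},
    "required": ["doc_id"],
}


def check_arguments(args: dict[str, Any], schema: dict[str, Any] = INPUT_SCHEMA) -> list[str]:
    """Return a list of problems; an empty list means the arguments pass.

    Hypothetical helper: checks only `required` keys and string-typed
    properties, not the full JSON Schema specification.
    """
    problems = []
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required argument: {name}")
    for name, spec in schema.get("properties", {}).items():
        if name in args and spec.get("type") == "string" and not isinstance(args[name], str):
            problems.append(f"{name} must be a string")
    return problems


print(check_arguments({"doc_id": "doc-123"}))  # []
print(check_arguments({}))
```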

Implementation Reference

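  • The handler function for the 'get_document_chunks' tool. It queries the database for every chunk of the given document, ordered by chunk index, and returns for each chunk its chunk_id, chunk index, page range, token count, a snippet of up to 100 characters, and whether an embedding exists. Any exception is caught and returned as an "error" field alongside an empty chunk list.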
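    from typing import Any  # needed for the dict[str, Any] return annotation
    
    # `mcp` (the MCP server instance) and `query_all` (a database query helper)
    # are defined elsewhere in the project; their modules are not shown here.
    @mcp.tool()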
    def get_document_chunks(doc_id: str) -> dict[str, Any]:
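        """Get the list of all chunks for the specified document.
        
        Retrieves summary information for every text chunk of the document identified by doc_id.
        
        Args:
            doc_id: Unique identifier of the document.
            
        Returns:
            A dict with doc_id, chunk_count, and a chunks list; each chunk
            includes its chunk_id, page numbers, and a text summary.
        """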
        try:
            chunks = query_all(
                """
                SELECT 
                    c.chunk_id,
                    c.chunk_index,
                    c.page_start,
                    c.page_end,
                    c.token_count,
                    LEFT(c.text, 100) as snippet,
                    CASE WHEN ce.chunk_id IS NOT NULL THEN true ELSE false END as has_embedding
                FROM chunks c
                LEFT JOIN chunk_embeddings ce ON c.chunk_id = ce.chunk_id
                WHERE c.doc_id = %s
                ORDER BY c.chunk_index
                """,
                (doc_id,)
            )
            
            return {
                "doc_id": doc_id,
                "chunk_count": len(chunks),
                "chunks": [
                    {
                        "chunk_id": c["chunk_id"],
                        "chunk_index": c["chunk_index"],
                        "page_start": c["page_start"],
                        "page_end": c["page_end"],
                        "token_count": c["token_count"],
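                        # LEFT(c.text, 100) caps the snippet at 100 characters; append "..."
                        # when the cap was reached. `or ""` guards against a NULL text column.
                        "snippet": (c["snippet"] or "") + ("..." if len(c["snippet"] or "") >= 100 else ""),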
                        "has_embedding": c["has_embedding"],
                    }
                    for c in chunks
                ],
            }
            
        except Exception as e:
            return {
                "error": str(e),
                "doc_id": doc_id,
                "chunks": [],
            }
  • Registers the fetch tools module, which includes the get_document_chunks tool, on the MCP server instance.
    register_fetch_tools(mcp)
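To see what the handler's query produces without a PostgreSQL instance, the same LEFT JOIN can be replayed against an in-memory SQLite database. This is an approximation under stated assumptions: the table and column names come from the handler's SQL, but the column types, the sample rows, and the helper name `fetch_chunk_summaries` are hypothetical. SQLite's `substr` stands in for `LEFT`, `?` replaces the `%s` placeholder, and SQLite returns booleans as 0/1, so they are converted with `bool()`.

```python
import sqlite3
from typing import Any

SNIPPET_LEN = 100  # matches LEFT(c.text, 100) in the handler


def fetch_chunk_summaries(conn: sqlite3.Connection, doc_id: str) -> dict[str, Any]:
    """Hypothetical SQLite re-creation of the get_document_chunks query."""
    conn.row_factory = sqlite3.Row  # access columns by name
    rows = conn.execute(
        """
        SELECT c.chunk_id, c.chunk_index, c.page_start, c.page_end, c.token_count,
               substr(c.text, 1, ?) AS snippet,
               length(c.text) > ? AS truncated,
               ce.chunk_id IS NOT NULL AS has_embedding
        FROM chunks c
        LEFT JOIN chunk_embeddings ce ON c.chunk_id = ce.chunk_id
        WHERE c.doc_id = ?
        ORDER BY c.chunk_index
        """,
        (SNIPPET_LEN, SNIPPET_LEN, doc_id),
    ).fetchall()
    return {
        "doc_id": doc_id,
        "chunk_count": len(rows),
        "chunks": [
            {
                "chunk_id": r["chunk_id"],
                "chunk_index": r["chunk_index"],
                "page_start": r["page_start"],
                "page_end": r["page_end"],
                "token_count": r["token_count"],
                # Ellipsis only when the text was actually cut (longer than 100 chars)
                "snippet": (r["snippet"] or "") + ("..." if r["truncated"] else ""),
                "has_embedding": bool(r["has_embedding"]),
            }
            for r in rows
        ],
    }


# Hypothetical schema and sample rows for the demo
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE chunks (chunk_id TEXT PRIMARY KEY, doc_id TEXT, chunk_index INTEGER,
                         page_start INTEGER, page_end INTEGER, token_count INTEGER, text TEXT);
    CREATE TABLE chunk_embeddings (chunk_id TEXT PRIMARY KEY, embedding BLOB);
    """
)
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("c2", "doc-1", 1, 2, 3, 40, "short text"),
        ("c1", "doc-1", 0, 1, 2, 300, "x" * 150),
    ],
)
conn.execute("INSERT INTO chunk_embeddings (chunk_id) VALUES ('c1')")
print(fetch_chunk_summaries(conn, "doc-1"))
```

Computing `truncated` in SQL avoids appending an ellipsis to a text that is exactly 100 characters long, a case the `>= 100` check on the snippet alone cannot distinguish; whether that edge case matters for the real server is an open question.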


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/h-lu/paperlib-mcp'
