Glama

get_chunk

Retrieve complete text content and metadata for a specific text chunk by providing its unique identifier. Get full text, page ranges, and document information to access segmented academic content.

Instructions

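Get the full content of a specified chunk.

Looks up a text chunk by chunk_id and returns its complete information, including the full text, page range, and the document it belongs to.

Args: chunk_id: unique identifier of the chunk

Returns: details of the chunk, including:
- chunk_id: chunk ID
- doc_id: ID of the document the chunk belongs to
- text: full text
- page_start/page_end: page range
- has_embedding: whether the chunk has an embedding
- chunk_index, section, token_count: additional metadata also returned by the implementation
If the chunk is not found or the lookup fails, a dictionary with "error" and "chunk_id" keys is returned instead.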
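The response contract above can be illustrated with a small client-side helper. This is a hypothetical sketch (summarize_chunk_result is not part of the server); it assumes the success fields listed above and the error shape {"error": ..., "chunk_id": ...} that the handler in the Implementation Reference returns instead of raising.

```python
from typing import Any


def summarize_chunk_result(result: dict[str, Any]) -> str:
    """One-line summary of a get_chunk response (hypothetical client helper)."""
    # Failures are returned as {"error": ..., "chunk_id": ...} rather than raised.
    if "error" in result:
        return f"chunk {result['chunk_id']}: error: {result['error']}"

    start, end = result["page_start"], result["page_end"]
    # A chunk may sit on one page or span several.
    pages = f"p. {start}" if start == end else f"pp. {start}-{end}"
    embedded = "embedded" if result["has_embedding"] else "not embedded"
    return f"chunk {result['chunk_id']} of {result['doc_id']}, {pages}, {embedded}"
```

For instance, a response with chunk_id 7, doc_id "doc-1", pages 3 to 4 and has_embedding true (made-up values) is summarized as "chunk 7 of doc-1, pp. 3-4, embedded". Checking for the "error" key first matters because error responses carry none of the page fields.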

Input Schema

Name      Required  Description  Default
chunk_id  Yes       (none)       (none)
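The schema lists a single required parameter. Because the handler annotates chunk_id as int, the generated JSON Schema should type it as an integer. The dictionary below is a hand-written approximation (the exact schema FastMCP emits may carry extra keys such as titles), paired with a hypothetical strict client-side check, check_get_chunk_args. Server-side validation goes through Pydantic, which may be more lenient, for example by coercing numeric strings.

```python
from typing import Any

# Hand-written approximation of the get_chunk input schema (assumption).
GET_CHUNK_INPUT_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {"chunk_id": {"type": "integer"}},
    "required": ["chunk_id"],
}


def check_get_chunk_args(args: dict[str, Any]) -> list[str]:
    """Return the problems found in a get_chunk argument dict (empty if valid)."""
    problems = []
    for name in GET_CHUNK_INPUT_SCHEMA["required"]:
        if name not in args:
            problems.append(f"missing required argument: {name}")
    if "chunk_id" in args:
        value = args["chunk_id"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            problems.append("chunk_id must be an integer")
    return problems
```

A strict check like this catches malformed calls before they reach the server, for example {"chunk_id": "42"} or a missing argument.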

Implementation Reference

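  • The handler function for the 'get_chunk' tool, decorated with @mcp.tool() for automatic registration. It runs one SQL query that left-joins chunks with chunk_embeddings and returns the row as a dictionary produced by the ChunkDetail Pydantic model's model_dump(). If the chunk does not exist or any exception is raised, it returns a dictionary with "error" and "chunk_id" keys instead.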
    @mcp.tool()
    def get_chunk(chunk_id: int) -> dict[str, Any]:
        """获取指定 chunk 的完整内容
        
        根据 chunk_id 获取文本块的完整信息,包括全文、页码、所属文档等。
        
        Args:
            chunk_id: chunk 的唯一标识符
            
        Returns:
            chunk 的详细信息,包含:
            - chunk_id: chunk ID
            - doc_id: 所属文档 ID
            - text: 完整文本
            - page_start/page_end: 页码范围
            - has_embedding: 是否有 embedding
        """
        try:
            # 查询 chunk 信息
            chunk = query_one(
                """
                SELECT 
                    c.chunk_id,
                    c.doc_id,
                    c.chunk_index,
                    c.section,
                    c.page_start,
                    c.page_end,
                    c.text,
                    c.token_count,
                    CASE WHEN ce.chunk_id IS NOT NULL THEN true ELSE false END as has_embedding
                FROM chunks c
                LEFT JOIN chunk_embeddings ce ON c.chunk_id = ce.chunk_id
                WHERE c.chunk_id = %s
                """,
                (chunk_id,)
            )
            
            if not chunk:
                return {
                    "error": f"Chunk not found: {chunk_id}",
                    "chunk_id": chunk_id,
                }
            
            return ChunkDetail(
                chunk_id=chunk["chunk_id"],
                doc_id=chunk["doc_id"],
                chunk_index=chunk["chunk_index"],
                section=chunk["section"],
                page_start=chunk["page_start"],
                page_end=chunk["page_end"],
                text=chunk["text"],
                token_count=chunk["token_count"],
                has_embedding=chunk["has_embedding"],
            ).model_dump()
            
        except Exception as e:
            return {
                "error": str(e),
                "chunk_id": chunk_id,
            }
  • Pydantic BaseModel defining the output schema/structure for the get_chunk tool response.
    class ChunkDetail(BaseModel):
        """Chunk 详细信息"""
        chunk_id: int
        doc_id: str
        chunk_index: int
        section: str | None
        page_start: int
        page_end: int
        text: str
        token_count: int | None
        has_embedding: bool
  • Invocation of register_fetch_tools(mcp) in the main MCP server setup, which registers the get_chunk tool (and other fetch tools).
    register_fetch_tools(mcp)
  • The registration function that defines the get_chunk tool and registers it with the @mcp.tool() decorator.
    def register_fetch_tools(mcp: FastMCP) -> None:
        ...  # body elided in this excerpt; it defines get_chunk and the other fetch tools
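To see the handler's query behave end to end without the original database, the sketch below replays the same SELECT against an in-memory SQLite database. Several parts are assumptions: the table definitions and sample rows are made up, get_chunk_sqlite and demo_connection are hypothetical names standing in for the handler and the unshown query_one helper, and SQLite's "?" placeholder replaces the "%s" placeholders the original driver expects. Explicit column aliases pin down the result keys, and the CASE expression yields 1/0, which is converted to a bool so the result has the same keys, in the same order, as ChunkDetail.

```python
import sqlite3
from typing import Any

# Same SELECT as the handler, with "?" instead of "%s" and 1/0 instead of true/false.
CHUNK_SQL = """
    SELECT
        c.chunk_id AS chunk_id,
        c.doc_id AS doc_id,
        c.chunk_index AS chunk_index,
        c.section AS section,
        c.page_start AS page_start,
        c.page_end AS page_end,
        c.text AS text,
        c.token_count AS token_count,
        CASE WHEN ce.chunk_id IS NOT NULL THEN 1 ELSE 0 END AS has_embedding
    FROM chunks c
    LEFT JOIN chunk_embeddings ce ON c.chunk_id = ce.chunk_id
    WHERE c.chunk_id = ?
"""


def demo_connection() -> sqlite3.Connection:
    """Hypothetical schema and rows: chunk 1 has an embedding, chunk 2 does not."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # lets rows be converted with dict(row)
    conn.executescript(
        """
        CREATE TABLE chunks (
            chunk_id INTEGER PRIMARY KEY, doc_id TEXT, chunk_index INTEGER,
            section TEXT, page_start INTEGER, page_end INTEGER,
            text TEXT, token_count INTEGER
        );
        CREATE TABLE chunk_embeddings (chunk_id INTEGER, embedding BLOB);
        INSERT INTO chunks VALUES (1, 'doc-a', 0, 'Introduction', 1, 2, 'First chunk.', 3);
        INSERT INTO chunks VALUES (2, 'doc-a', 1, NULL, 2, 2, 'Second chunk.', NULL);
        INSERT INTO chunk_embeddings VALUES (1, x'00');
        """
    )
    return conn


def get_chunk_sqlite(conn: sqlite3.Connection, chunk_id: int) -> dict[str, Any]:
    """Hypothetical stand-in for get_chunk: same lookup and error contract."""
    try:
        row = conn.execute(CHUNK_SQL, (chunk_id,)).fetchone()
        if row is None:
            return {"error": f"Chunk not found: {chunk_id}", "chunk_id": chunk_id}
        result = dict(row)  # keys follow the SELECT order, matching ChunkDetail
        result["has_embedding"] = bool(result["has_embedding"])
        return result
    except sqlite3.Error as e:
        return {"error": str(e), "chunk_id": chunk_id}
```

One design point: if chunk_embeddings can hold several rows for the same chunk (say, one per embedding model), the LEFT JOIN returns duplicate rows. Taking the first row still yields the right has_embedding value, but an EXISTS (SELECT 1 FROM chunk_embeddings ...) subquery would avoid the duplicates altogether. The sketch also narrows the except clause to sqlite3.Error, whereas the original catches every Exception.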
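The registration bullets describe a common pattern: tools are defined inside a register_* function and attached to the server object only when the server setup calls it. The toy registry below imitates that decorator-factory shape so the pattern can run without the mcp package. ToyToolRegistry is hypothetical and does not reflect FastMCP's internals, although keying tools by function name matches FastMCP's default tool naming.

```python
from typing import Any, Callable


class ToyToolRegistry:
    """Hypothetical stand-in for a FastMCP server object (not its real internals)."""

    def __init__(self) -> None:
        self.tools: dict[str, Callable[..., Any]] = {}

    def tool(self) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        # Like @mcp.tool(), this is a decorator factory: call it, then apply it.
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            self.tools[fn.__name__] = fn  # register under the function's name
            return fn

        return decorator


def register_fetch_tools(mcp: ToyToolRegistry) -> None:
    # Defining tools inside the function lets each decorator use this `mcp`
    # and defers registration until setup calls register_fetch_tools(mcp).
    @mcp.tool()
    def get_chunk(chunk_id: int) -> dict[str, Any]:
        return {"chunk_id": chunk_id}  # placeholder body for the sketch


mcp = ToyToolRegistry()
register_fetch_tools(mcp)
```

Passing the server in as a parameter keeps each group of tools in its own module without a global server import.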


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/h-lu/paperlib-mcp'
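The same request can be issued from Python with only the standard library. This is a sketch: it assumes the endpoint returns a JSON object, whose fields are not documented on this page, and build_request and fetch_server_info are hypothetical helper names.

```python
import json
import urllib.request

SERVER_URL = "https://glama.ai/api/mcp/v1/servers/h-lu/paperlib-mcp"


def build_request(url: str = SERVER_URL) -> urllib.request.Request:
    """Equivalent of `curl -X GET <url>`."""
    return urllib.request.Request(url, method="GET")


def fetch_server_info(url: str = SERVER_URL) -> dict:
    # Assumes a JSON object response; raises on HTTP errors or invalid JSON.
    with urllib.request.urlopen(build_request(url), timeout=10) as resp:
        return json.load(resp)
```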

If you have feedback or need assistance with the MCP directory API, please join our Discord server