get_chunk

Retrieve complete text content and metadata for a specific text chunk by providing its unique identifier. Get full text, page ranges, and document information to access segmented academic content.

Instructions

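Get the full content of a specified chunk

Given a chunk_id, returns the complete information for that text chunk, including its full text, page numbers, and the document it belongs to.

Args:
    chunk_id: unique identifier of the chunk

Returns: chunk details, containing:
    - chunk_id: chunk ID
    - doc_id: ID of the document the chunk belongs to
    - text: full text
    - page_start/page_end: page range
    - has_embedding: whether the chunk has an embedding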
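The handler listed under Implementation Reference returns a plain dict, and on failure that dict carries an `error` key next to `chunk_id` instead of the chunk fields. A caller therefore has to check for the error shape first. The helper below is a minimal sketch of that check; the function name `describe_chunk_result` and the sample values are hypothetical and not part of the server.

```python
from typing import Any


def describe_chunk_result(result: dict[str, Any]) -> str:
    """Summarize a get_chunk result, checking the error shape first."""
    # Failure shape: {"error": "...", "chunk_id": ...}
    if "error" in result:
        return f"chunk {result.get('chunk_id')}: failed ({result['error']})"
    # Success shape: the fields listed under Returns.
    pages = f"pp. {result['page_start']}-{result['page_end']}"
    embedded = "embedded" if result["has_embedding"] else "no embedding"
    return f"chunk {result['chunk_id']} of {result['doc_id']}, {pages}, {embedded}"


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    ok = {"chunk_id": 7, "doc_id": "doc-a", "text": "...",
          "page_start": 3, "page_end": 4, "has_embedding": True}
    missing = {"error": "Chunk not found: 99", "chunk_id": 99}
    print(describe_chunk_result(ok))
    print(describe_chunk_result(missing))
```

Checking for `error` before reading any other key avoids a `KeyError` on the failure path, since the two shapes share only `chunk_id`.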

Input Schema

Name | Required | Description | Default
chunk_id | Yes | - | -
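Since `chunk_id` is the only argument and it is required, a call to this tool needs little more than that one value. As a rough sketch, an MCP client sends a JSON-RPC 2.0 `tools/call` request with the tool name and an `arguments` object. The builder below is hypothetical (`build_get_chunk_request` is not part of the server) and treats `chunk_id` as an integer, which matches the handler signature shown under Implementation Reference.

```python
import json
from typing import Any


def build_get_chunk_request(chunk_id: int, request_id: int = 1) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` request for the get_chunk tool."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(chunk_id, bool) or not isinstance(chunk_id, int):
        raise TypeError("chunk_id must be an integer")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "get_chunk", "arguments": {"chunk_id": chunk_id}},
    }


if __name__ == "__main__":
    # 42 is an arbitrary example ID.
    print(json.dumps(build_get_chunk_request(42), indent=2))
```

In practice an MCP client library builds this envelope for you; the sketch only shows where the single required argument ends up.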

Implementation Reference

  • The handler function for the 'get_chunk' tool, decorated with @mcp.tool() for automatic registration. It retrieves chunk details from the database using a SQL query and returns a structured dictionary via the ChunkDetail Pydantic model.
    @mcp.tool()
    def get_chunk(chunk_id: int) -> dict[str, Any]:
        """Get the full content of a specified chunk

        Given a chunk_id, returns the complete information for that text chunk,
        including its full text, page numbers, and the document it belongs to.

        Args:
            chunk_id: unique identifier of the chunk

        Returns:
            chunk details, containing:
            - chunk_id: chunk ID
            - doc_id: ID of the document the chunk belongs to
            - text: full text
            - page_start/page_end: page range
            - has_embedding: whether the chunk has an embedding
        """
        try:
            # Query the chunk record
            chunk = query_one(
                """
                SELECT
                    c.chunk_id,
                    c.doc_id,
                    c.chunk_index,
                    c.section,
                    c.page_start,
                    c.page_end,
                    c.text,
                    c.token_count,
                    CASE WHEN ce.chunk_id IS NOT NULL THEN true ELSE false END as has_embedding
                FROM chunks c
                LEFT JOIN chunk_embeddings ce ON c.chunk_id = ce.chunk_id
                WHERE c.chunk_id = %s
                """,
                (chunk_id,)
            )

            if not chunk:
                return {
                    "error": f"Chunk not found: {chunk_id}",
                    "chunk_id": chunk_id,
                }

            return ChunkDetail(
                chunk_id=chunk["chunk_id"],
                doc_id=chunk["doc_id"],
                chunk_index=chunk["chunk_index"],
                section=chunk["section"],
                page_start=chunk["page_start"],
                page_end=chunk["page_end"],
                text=chunk["text"],
                token_count=chunk["token_count"],
                has_embedding=chunk["has_embedding"],
            ).model_dump()

        except Exception as e:
            return {
                "error": str(e),
                "chunk_id": chunk_id,
            }
  • Pydantic BaseModel defining the output schema/structure for the get_chunk tool response.
    class ChunkDetail(BaseModel):
        """Chunk details"""

        chunk_id: int
        doc_id: str
        chunk_index: int
        section: str | None
        page_start: int
        page_end: int
        text: str
        token_count: int | None
        has_embedding: bool
  • Invocation of register_fetch_tools(mcp) in the main MCP server setup, which registers the get_chunk tool (and other fetch tools).
    register_fetch_tools(mcp)
  • The registration function that defines and registers the get_chunk tool using @mcp.tool() decorator.
    def register_fetch_tools(mcp: FastMCP) -> None:
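The handler's query derives `has_embedding` by LEFT JOINing `chunks` to `chunk_embeddings` and testing whether the joined row is NULL, so a chunk without an embedding is still returned. The following is a minimal, self-contained sketch of that pattern. It makes several assumptions: it uses an in-memory SQLite database from the standard library rather than the project's real database, the schema is cut down to three columns, the placeholder is `?` instead of `%s`, and `fetch_chunk` is a hypothetical stand-in for the `query_one` helper.

```python
import sqlite3
from typing import Any, Optional

# In-memory stand-in for the real database. The reduced schema and the
# sample rows are assumptions made for illustration only.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript(
    """
    CREATE TABLE chunks (chunk_id INTEGER PRIMARY KEY, doc_id TEXT, text TEXT);
    CREATE TABLE chunk_embeddings (chunk_id INTEGER PRIMARY KEY);
    INSERT INTO chunks VALUES (1, 'doc-a', 'first chunk'), (2, 'doc-a', 'second chunk');
    INSERT INTO chunk_embeddings VALUES (1);
    """
)


def fetch_chunk(chunk_id: int) -> Optional[dict[str, Any]]:
    """Return one chunk with a derived has_embedding flag, or None if absent."""
    row = conn.execute(
        """
        SELECT c.chunk_id, c.doc_id, c.text,
               CASE WHEN ce.chunk_id IS NOT NULL THEN 1 ELSE 0 END AS has_embedding
        FROM chunks c
        LEFT JOIN chunk_embeddings ce ON c.chunk_id = ce.chunk_id
        WHERE c.chunk_id = ?
        """,
        (chunk_id,),
    ).fetchone()
    if row is None:
        return None
    result = dict(row)
    # SQLite has no boolean type, so convert the 0/1 flag explicitly.
    result["has_embedding"] = bool(result["has_embedding"])
    return result


if __name__ == "__main__":
    print(fetch_chunk(1))  # chunk with an embedding row
    print(fetch_chunk(2))  # chunk without an embedding row
    print(fetch_chunk(3))  # no such chunk
```

An INNER JOIN would drop chunk 2 entirely, so the "not found" branch in the handler could not tell a missing chunk from one that simply has no embedding yet.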

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/h-lu/paperlib-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.