get_doc_chunks_info
Retrieve document chunk metadata including total chunks and character counts per chunk from Yuque knowledge bases to analyze content structure and optimize processing.
Instructions
Get the document's chunk metadata, including the total number of chunks and the character count of each chunk.
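
As a rough illustration of what a client receives, the sketch below types the JSON text the tool returns and parses it. The field names are read off the `chunksInfo` object in the handler listed under Implementation Reference. The interface names, the helper names `parseChunksInfo` and `chunkIndexes`, and the `number | string` type for `document_id` are assumptions made for this example, not part of the server.

```typescript
// Shape inferred from the handler's chunksInfo object; not an official type.
interface EstimatedChunk {
  index: number;
  title: string;
  approximate_start: number;
  approximate_end: number;
  approximate_length: number;
  how_to_get: string;
}

interface ChunksInfo {
  document_id: number | string; // assumption: the type of doc.id is not shown in the source
  title: string;
  total_chunks: number;
  total_length: number;
  chunk_size: number;
  overlap_size: number;
  estimated_chunks: EstimatedChunk[];
}

// Hypothetical helper: parse the text payload of the tool result.
// On failure the handler returns plain text starting with
// "Error fetching doc chunks info:", which is not JSON, so check for it first.
function parseChunksInfo(text: string): ChunksInfo {
  if (text.startsWith("Error fetching doc chunks info:")) {
    throw new Error(text);
  }
  const parsed = JSON.parse(text) as ChunksInfo;
  if (!Array.isArray(parsed.estimated_chunks)) {
    throw new Error("Unexpected payload: estimated_chunks is missing");
  }
  return parsed;
}

// Hypothetical helper: list the chunk_index values a follow-up get_doc call could use.
function chunkIndexes(info: ChunksInfo): number[] {
  return info.estimated_chunks.map((chunk) => chunk.index);
}
```

Checking for the error prefix before calling `JSON.parse` keeps a failed fetch from surfacing as a confusing syntax error.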
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
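| namespace | Yes | Namespace of the knowledge base, in the format user/repo | |
| slug | Yes | Unique identifier or short-link name of the document | |
| chunk_size | No | Chunk size in characters | 100000 |
| accessToken | No | Token used to authenticate API requests | |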
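
To make the table concrete, here is a minimal sketch of how a client might assemble the arguments and wrap them in an MCP `tools/call` request. The interface and the helpers `normalizeArgs` and `buildToolCall` are hypothetical names for this example. The `user/repo` pattern check and the lower bound on `chunk_size` are assumptions added here: the server's Zod schema only requires a string and an optional number, but the handler subtracts a fixed 200-character overlap from `chunk_size`, so values at or below 200 would make its divisor zero or negative.

```typescript
// Hypothetical type mirroring the Input Schema table; not exported by the server.
interface GetDocChunksInfoArgs {
  namespace: string; // knowledge base namespace, "user/repo"
  slug: string; // document identifier or short-link name
  chunk_size?: number; // characters per chunk, server default 100000
  accessToken?: string; // token for authenticating API requests
}

type NormalizedArgs = GetDocChunksInfoArgs & { chunk_size: number };

const DEFAULT_CHUNK_SIZE = 100_000; // default documented in the schema
const HANDLER_OVERLAP = 200; // fixed overlap used inside the handler

// Apply the documented default and run client-side sanity checks (assumptions).
function normalizeArgs(args: GetDocChunksInfoArgs): NormalizedArgs {
  if (!/^[^\/\s]+\/[^\/\s]+$/.test(args.namespace)) {
    throw new Error(`namespace must look like user/repo, got "${args.namespace}"`);
  }
  if (args.slug.trim() === "") {
    throw new Error("slug must not be empty");
  }
  const chunk_size = args.chunk_size ?? DEFAULT_CHUNK_SIZE;
  if (!Number.isInteger(chunk_size) || chunk_size <= HANDLER_OVERLAP) {
    throw new RangeError(`chunk_size must be an integer greater than ${HANDLER_OVERLAP}`);
  }
  return { ...args, chunk_size };
}

// Build a JSON-RPC request in the shape MCP uses for tool invocation.
function buildToolCall(id: number, args: GetDocChunksInfoArgs) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "get_doc_chunks_info", arguments: normalizeArgs(args) },
  };
}

// Example: rely on the default chunk size.
console.log(JSON.stringify(buildToolCall(1, { namespace: "user/repo", slug: "intro" }), null, 2));
```

Validating before sending is optional. It only moves obvious mistakes, such as a namespace without a slash, from a remote error message to a local exception.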
Implementation Reference
- src/server.ts:742-831 (registration) Registration of the 'get_doc_chunks_info' MCP tool using McpServer.tool(), including inline schema and handler function.

      this.server.tool(
        "get_doc_chunks_info",
        // Tool description: "Get the document's chunk metadata, including the
        // total number of chunks and the character count of each chunk"
        "获取文档的分块元信息,包括总块数、每块的字符数等",
        {
          // "Namespace of the knowledge base, in the format user/repo"
          namespace: z.string().describe("知识库的命名空间,格式为 user/repo"),
          // "Unique identifier or short-link name of the document"
          slug: z.string().describe("文档的唯一标识或短链接名称"),
          // "Chunk size in characters, defaults to 100000"
          chunk_size: z
            .number()
            .optional()
            .describe("分块大小(字符数),默认为100000"),
          // "Token used to authenticate API requests"
          accessToken: z.string().optional().describe("用于认证 API 请求的令牌"),
        },
        async ({ namespace, slug, chunk_size = 100000, accessToken }) => {
          try {
            Logger.log(
              `Fetching document chunk info for ${slug} from repository: ${namespace}`
            );
            const yuqueService = this.createYuqueService(accessToken);
            const doc = await yuqueService.getDoc(namespace, slug);

            // Serialize the whole document to a JSON string to measure its total length
            const fullDocString = JSON.stringify(doc, null, 2);

            // Work out how many chunks would be produced
            const overlapSize = 200;
            let totalChunks = 1;
            if (fullDocString.length > chunk_size) {
              // Simple chunk count that accounts for the overlap
              // Formula: ceil((total length - overlap size) / (chunk size - overlap size))
              totalChunks = Math.ceil(
                (fullDocString.length - overlapSize) / (chunk_size - overlapSize)
              );
            }

            // Build the chunk metadata object
            const chunksInfo = {
              document_id: doc.id,
              title: doc.title,
              total_chunks: totalChunks,
              total_length: fullDocString.length,
              chunk_size: chunk_size,
              overlap_size: overlapSize,
              estimated_chunks: Array.from(
                { length: totalChunks },
                (_, index) => {
                  // Estimate the start and end position of each chunk
                  const startPosition =
                    index === 0 ? 0 : index * (chunk_size - overlapSize);
                  const endPosition = Math.min(
                    startPosition + chunk_size,
                    fullDocString.length
                  );
                  return {
                    index: index,
                    // Title suffix reads "[Part i/n]"
                    title: `${doc.title} [部分 ${index + 1}/${totalChunks}]`,
                    approximate_start: startPosition,
                    approximate_end: endPosition,
                    approximate_length: endPosition - startPosition,
                    // "Use the get_doc tool with chunk_index=<index>"
                    how_to_get: `使用 get_doc 工具,指定 chunk_index=${index}`,
                  };
                }
              ),
            };

            Logger.log(
              `Document would be split into ${totalChunks} chunks with size ${chunk_size}`
            );
            return {
              content: [
                { type: "text", text: JSON.stringify(chunksInfo, null, 2) },
              ],
            };
          } catch (error) {
            Logger.error(
              `Error fetching doc chunks info for ${slug} from repo ${namespace}:`,
              error
            );
            return {
              content: [
                {
                  type: "text",
                  text: `Error fetching doc chunks info: ${error}`,
                },
              ],
            };
          }
        }
      );

- src/server.ts:754-830 (handler) The core handler function that fetches the document content via YuqueService, estimates the number of chunks based on JSON string length, and returns detailed chunk metadata.

      async ({ namespace, slug, chunk_size = 100000, accessToken }) => {
        try {
          Logger.log(
            `Fetching document chunk info for ${slug} from repository: ${namespace}`
          );
          const yuqueService = this.createYuqueService(accessToken);
          const doc = await yuqueService.getDoc(namespace, slug);

          // Serialize the whole document to a JSON string to measure its total length
          const fullDocString = JSON.stringify(doc, null, 2);

          // Work out how many chunks would be produced
          const overlapSize = 200;
          let totalChunks = 1;
          if (fullDocString.length > chunk_size) {
            // Simple chunk count that accounts for the overlap
            // Formula: ceil((total length - overlap size) / (chunk size - overlap size))
            totalChunks = Math.ceil(
              (fullDocString.length - overlapSize) / (chunk_size - overlapSize)
            );
          }

          // Build the chunk metadata object
          const chunksInfo = {
            document_id: doc.id,
            title: doc.title,
            total_chunks: totalChunks,
            total_length: fullDocString.length,
            chunk_size: chunk_size,
            overlap_size: overlapSize,
            estimated_chunks: Array.from(
              { length: totalChunks },
              (_, index) => {
                // Estimate the start and end position of each chunk
                const startPosition =
                  index === 0 ? 0 : index * (chunk_size - overlapSize);
                const endPosition = Math.min(
                  startPosition + chunk_size,
                  fullDocString.length
                );
                return {
                  index: index,
                  // Title suffix reads "[Part i/n]"
                  title: `${doc.title} [部分 ${index + 1}/${totalChunks}]`,
                  approximate_start: startPosition,
                  approximate_end: endPosition,
                  approximate_length: endPosition - startPosition,
                  // "Use the get_doc tool with chunk_index=<index>"
                  how_to_get: `使用 get_doc 工具,指定 chunk_index=${index}`,
                };
              }
            ),
          };

          Logger.log(
            `Document would be split into ${totalChunks} chunks with size ${chunk_size}`
          );
          return {
            content: [
              { type: "text", text: JSON.stringify(chunksInfo, null, 2) },
            ],
          };
        } catch (error) {
          Logger.error(
            `Error fetching doc chunks info for ${slug} from repo ${namespace}:`,
            error
          );
          return {
            content: [
              {
                type: "text",
                text: `Error fetching doc chunks info: ${error}`,
              },
            ],
          };
        }
      }

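- src/server.ts:745-753 (schema) Input schema defined with Zod validators for the tool parameters: namespace, slug, optional chunk_size and accessToken.

      {
        // "Namespace of the knowledge base, in the format user/repo"
        namespace: z.string().describe("知识库的命名空间,格式为 user/repo"),
        // "Unique identifier or short-link name of the document"
        slug: z.string().describe("文档的唯一标识或短链接名称"),
        // "Chunk size in characters, defaults to 100000"
        chunk_size: z
          .number()
          .optional()
          .describe("分块大小(字符数),默认为100000"),
        // "Token used to authenticate API requests"
        accessToken: z.string().optional().describe("用于认证 API 请求的令牌"),
      },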
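
The chunk arithmetic in the handler can be tried in isolation. The sketch below lifts the same formula, ceil((total length - overlap) / (chunk size - overlap)), into a standalone function so the estimated positions can be checked by hand. The function name `estimateChunks`, the `ChunkEstimate` type, and the guard against `chunkSize <= overlapSize` are additions for this example and do not appear in src/server.ts.

```typescript
const OVERLAP_SIZE = 200; // same fixed overlap as the handler

interface ChunkEstimate {
  index: number;
  start: number;
  end: number;
  length: number;
}

// Standalone version of the handler's estimate (hypothetical helper).
function estimateChunks(
  totalLength: number,
  chunkSize: number = 100_000,
  overlapSize: number = OVERLAP_SIZE
): ChunkEstimate[] {
  // Guard added for this sketch: the handler itself has no such check.
  if (chunkSize <= overlapSize) {
    throw new RangeError("chunkSize must be larger than overlapSize");
  }
  // Each chunk after the first starts one stride later than the previous one.
  const stride = chunkSize - overlapSize;
  const totalChunks =
    totalLength > chunkSize
      ? Math.ceil((totalLength - overlapSize) / stride)
      : 1;
  return Array.from({ length: totalChunks }, (_, index) => {
    const start = index * stride;
    const end = Math.min(start + chunkSize, totalLength);
    return { index, start, end, length: end - start };
  });
}

// Worked example: a 250000-character serialized document with the default chunk size.
for (const chunk of estimateChunks(250_000)) {
  console.log(chunk.index, chunk.start, chunk.end, chunk.length);
}
```

For a total length of 250000 and the default chunk size, the stride is 99800 and (250000 - 200) / 99800 is about 2.5, which rounds up to 3 chunks. They start at 0, 99800, and 199600, and the last one is 50400 characters long. The length being measured is that of the pretty-printed JSON of the whole document object, not just the body text, so the figures are estimates of what a chunked get_doc call would return.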