
get_doc

Retrieve detailed content from Yuque documents including body text, revision history, and permissions, with chunking support for large files.

Instructions

Retrieve the detailed content of a specific document in Yuque, including body text, revision history, and permission information (supports chunked processing of large documents).

Input Schema

  • namespace (required): Namespace of the knowledge base, in user/repo format
  • slug (required): Unique identifier or short-link name of the document
  • chunk_index (optional): Index of the document chunk to fetch; if omitted, returns the first chunk, or the whole document if its content is small
  • chunk_size (optional, default 100000): Chunk size in characters
  • accessToken (optional): Token used to authenticate API requests
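
Note that chunking applies to the JSON-serialized document, not just its body, so the number of chunks (and therefore the valid chunk_index range) follows from the serialized length. A minimal sketch of that arithmetic, assuming the fixed 200-character overlap used in the server source (the overlap is an implementation detail, not part of the public API contract):

```typescript
// Sketch: estimate how many chunks a document will produce, mirroring the
// server's fixed-size split with overlap. Values mirror the implementation
// in the reference below; treat them as assumptions, not a contract.
function estimateChunkCount(
  contentLength: number,
  chunkSize: number = 100000,
  overlap: number = 200
): number {
  if (contentLength <= chunkSize) return 1; // small documents are returned whole
  const step = chunkSize - overlap; // net forward progress per chunk
  return Math.ceil((contentLength - overlap) / step);
}
```

Valid chunk_index values then run from 0 to estimateChunkCount(length) - 1.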

Implementation Reference

  • src/server.ts:244-342 (registration)
    Registration of the 'get_doc' tool in the MCP server, specifying name, description, input schema, and handler function.
    this.server.tool(
      "get_doc",
      "Retrieve the detailed content of a specific document in Yuque, including body text, revision history, and permission information (supports chunked processing of large documents)",
      {
        namespace: z.string().describe("Namespace of the knowledge base, in user/repo format"),
        slug: z.string().describe("Unique identifier or short-link name of the document"),
        chunk_index: z
          .number()
          .optional()
          .describe(
            "Index of the document chunk to fetch; if omitted, returns the first chunk, or the whole document if its content is small"
          ),
        chunk_size: z
          .number()
          .optional()
          .describe("Chunk size in characters; defaults to 100000"),
        accessToken: z.string().optional().describe("Token used to authenticate API requests"),
      },
      async ({
        namespace,
        slug,
        chunk_index,
        chunk_size = 100000,
        accessToken,
      }) => {
        try {
          Logger.log(`Fetching document ${slug} from repository: ${namespace}`);
          // Avoid logging the raw token value
          Logger.log(`accessToken provided: ${Boolean(accessToken)}`);
          const yuqueService = this.createYuqueService(accessToken);
          const doc = await yuqueService.getDoc(namespace, slug);
    
          Logger.log(
            `Successfully fetched document: ${doc.title}, content length: ${
              doc.body?.length || 0
            } chars`
          );
    
          // Split the document content into chunks
          const docChunks = this.splitDocumentContent(doc, chunk_size);
    
          if (docChunks.length > 1) {
            Logger.log(
              `Document has been split into ${docChunks.length} chunks`
            );
    
            // If no chunk index was specified, default to the first chunk
            if (chunk_index === undefined) {
              // Return the first chunk along with the chunking metadata
              const firstChunk = docChunks[0];
              Logger.log(`Returning first chunk (1/${docChunks.length})`);
              return {
                content: [
                  { type: "text", text: JSON.stringify(firstChunk, null, 2) },
                ],
              };
            }
    
            // If a chunk index was specified, validate it
            if (chunk_index < 0 || chunk_index >= docChunks.length) {
              const error = `Invalid chunk_index: ${chunk_index}. Valid range is 0-${
                docChunks.length - 1
              }`;
              Logger.error(error);
              return {
                content: [{ type: "text", text: error }],
              };
            }
    
            // Return the requested chunk
            Logger.log(
              `Returning chunk ${chunk_index + 1}/${docChunks.length}`
            );
            return {
              content: [
                {
                  type: "text",
                  text: JSON.stringify(docChunks[chunk_index], null, 2),
                },
              ],
            };
          } else {
            // The document is small enough; return it whole without chunking
            Logger.log(`Document is small enough, no chunking needed`);
            return {
              content: [{ type: "text", text: JSON.stringify(doc, null, 2) }],
            };
          }
        } catch (error) {
          Logger.error(
            `Error fetching doc ${slug} from repo ${namespace}:`,
            error
          );
          return {
            content: [{ type: "text", text: `Error fetching doc: ${error}` }],
          };
        }
      }
    );
  • The main execution handler for the get_doc tool (included in the registration snippet above): it fetches the document via YuqueService, splits large documents into chunks, honors the chunk_index parameter, and returns JSON-serialized content.
  • Zod-based input schema validation for the get_doc tool parameters (also included in the registration snippet above).
  • YuqueService.getDoc helper method that performs the actual API call to retrieve document details from Yuque and cleans up the response.
    async getDoc(namespace: string, slug: string, page?: number, page_size?: number): Promise<YuqueDoc> {
      const params: any = {};
      if (page !== undefined) params.page = page;
      if (page_size !== undefined) params.page_size = page_size;
      
      const response = await this.client.get(`/repos/${namespace}/docs/${slug}`, { params });
      // Filter out raw-format fields that are not needed (body_lake, body_draft, body_html)
      if (response.data.data.body_lake) delete response.data.data.body_lake;
      if (response.data.data.body_draft) delete response.data.data.body_draft;
      if (response.data.data.body_html) delete response.data.data.body_html;
      return response.data.data;
    }
  • Private helper method to split large document content into smaller JSON chunks with overlapping sections for better context in large docs.
    private splitDocumentContent(doc: any, chunkSize: number = 100000): any[] {
      // Serialize the whole document object to a formatted JSON string
      const fullDocString = JSON.stringify(doc, null, 2);
      Logger.log("fullDocString length: " + fullDocString.length);
    
      // If the full document string fits within one chunk, return the original document
      if (fullDocString.length <= chunkSize) {
        return [doc];
      }
    
      // Simple fixed-size text splitting with overlap between chunks
      const overlapSize = 200; // overlap between adjacent chunks
      const chunks: string[] = [];
    
      // Split at fixed offsets, ignoring content boundaries
      let startIndex = 0;
      while (startIndex < fullDocString.length) {
        // Compute the end position of the current chunk
        const endIndex = Math.min(startIndex + chunkSize, fullDocString.length);
    
        // Extract the current chunk
        chunks.push(fullDocString.substring(startIndex, endIndex));
    
        // Advance the start of the next chunk, keeping an overlap
        startIndex = endIndex - overlapSize;
    
        // Stop once the end of the text is reached or the next iteration would produce an invalid chunk
        if (startIndex >= fullDocString.length - overlapSize) {
          break;
        }
      }
    
      // Wrap each chunk in a document object, adding chunking and context metadata
      return chunks.map((chunk, index) => {
        // Build the result object
        const result: any = {
          _original_doc_id: doc.id,
          _original_title: doc.title,
          _chunk_info: {
            index: index,
            total: chunks.length,
            is_chunked: true,
            chunk_size: chunkSize,
            overlap_size: overlapSize,
            content_type: "full_doc_json",
            // Context information
            context: {
              has_previous: index > 0,
              has_next: index < chunks.length - 1,
              // Note that this is partial content
              note: index > 0 ? "This chunk overlaps with the previous chunk" : "",
            },
          },
        };
    
        // Keep the raw text chunk
        result.text_content = chunk;
    
        // Try to parse the chunk back into JSON (if it is a complete object)
        try {
          // Only attempt parsing when the text starts with { and ends with }
          if (chunk.trim().startsWith("{") && chunk.trim().endsWith("}")) {
            const parsedChunk = JSON.parse(chunk);
    
            // Merge the parsed properties into the result object
            Object.assign(result, parsedChunk);
          }
        } catch (e) {
          // Parsing failed; keep the text form
          result.parse_error = "Chunk content is not a complete JSON object; kept as text";
        }
    
        // Tag the title with a chunk marker
        result.title = `${doc.title} [part ${index + 1}/${chunks.length}]`;
    
        return result;
      });
    }
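
Stripped of the metadata wrapping, the core overlapping split above can be reduced to a standalone function. A simplified sketch (not the exact server code):

```typescript
// Fixed-size split with overlap: each chunk after the first repeats the last
// `overlap` characters of the previous chunk, giving the reader some context
// across chunk boundaries.
function splitWithOverlap(text: string, chunkSize: number, overlap: number = 200): string[] {
  if (text.length <= chunkSize) return [text];
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push(text.substring(start, end));
    start = end - overlap; // step back to create the overlap
    if (start >= text.length - overlap) break; // avoid a degenerate final chunk
  }
  return chunks;
}
```

Note that this only makes forward progress when overlap is smaller than chunkSize; the server's defaults (200 vs. 100000) satisfy that comfortably.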
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. It mentions that the tool retrieves detailed content and supports chunking for large documents, which adds some context. However, it lacks critical details like authentication requirements (implied by 'accessToken' parameter but not stated), rate limits, error conditions, or what happens if chunk parameters are omitted. For a tool with 5 parameters and no annotations, this is insufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence in Chinese that front-loads the core purpose and includes key features (chunking support). It avoids redundancy and wastes no words, though it could be slightly more structured for clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, no annotations, no output schema), the description is incomplete. It covers the basic purpose and chunking but misses authentication needs, error handling, output format details, and differentiation from siblings. For a tool that retrieves detailed document data, more context is needed to guide effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters thoroughly. The description adds minimal value beyond the schema: it implies the tool handles large documents via chunking, which relates to 'chunk_index' and 'chunk_size', but doesn't provide additional syntax or format details. Baseline 3 is appropriate as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: '获取语雀中特定文档的详细内容,包括正文、修改历史和权限信息' (retrieve detailed content of a specific document in Yuque, including body, revision history, and permission information). It specifies the resource (document in Yuque) and what content is retrieved, though it doesn't explicitly distinguish from siblings like 'get_doc_chunks_info' or 'get_repo_docs' beyond mentioning chunking support.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It mentions chunking for large documents but doesn't clarify when to use 'get_doc' vs. 'get_doc_chunks_info' for chunk-related operations or 'get_repo_docs' for listing documents. No explicit when/when-not or alternative tool references are included.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
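
In practice, an agent that needs a complete large document can page through it: call get_doc once without chunk_index, read _chunk_info.total from the result, then request the remaining chunks. A sketch, assuming a generic callTool helper (hypothetical; substitute your MCP client's invocation API):

```typescript
type ToolResult = { content: { type: string; text: string }[] };
type CallTool = (name: string, args: Record<string, unknown>) => Promise<ToolResult>;

// Fetch every chunk of a document in order. If the document was small enough
// to be returned whole, the single result is returned as-is.
async function fetchAllChunks(callTool: CallTool, namespace: string, slug: string): Promise<string[]> {
  const first = JSON.parse((await callTool("get_doc", { namespace, slug })).content[0].text);
  const texts: string[] = [first.text_content ?? JSON.stringify(first)];
  const total = first._chunk_info?.total ?? 1; // _chunk_info is absent when the doc was not chunked
  for (let i = 1; i < total; i++) {
    const res = await callTool("get_doc", { namespace, slug, chunk_index: i });
    const chunk = JSON.parse(res.content[0].text);
    texts.push(chunk.text_content ?? JSON.stringify(chunk));
  }
  return texts;
}
```

Since adjacent chunks overlap by 200 characters, a caller reassembling the full text may want to trim the duplicated prefix from each chunk after the first.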

