get_doc

Retrieve detailed content from Yuque documents including body text, revision history, and permissions, with chunking support for large files.

Instructions

Retrieve the detailed content of a specific document in Yuque, including body text, revision history, and permission information (supports chunked handling of large documents).

Input Schema

Name         Required  Default   Description
namespace    Yes                 Namespace of the knowledge base, in user/repo format
slug         Yes                 Unique identifier or short-link name of the document
chunk_index  No                  Index of the document chunk to fetch; omit to return the first chunk, or the whole document if the content is small
chunk_size   No        100000    Chunk size in characters
accessToken  No                  Token used to authenticate API requests
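The required/optional split and the `chunk_size` default above can be mirrored in plain TypeScript. A minimal sketch (zod omitted for brevity; the interface and `normalizeGetDocArgs` helper are hypothetical names, not part of the server):

```typescript
// Hypothetical mirror of the get_doc input schema.
interface GetDocArgs {
  namespace: string;    // knowledge-base namespace, "user/repo"
  slug: string;         // document identifier or short-link name
  chunk_index?: number; // chunk to fetch; omit for first chunk / whole doc
  chunk_size?: number;  // chunk size in characters, defaults to 100000
  accessToken?: string; // token for authenticating API requests
}

// Validate the required fields and apply the documented default.
function normalizeGetDocArgs(raw: Partial<GetDocArgs>): GetDocArgs {
  if (typeof raw.namespace !== "string" || !/^[^/]+\/[^/]+$/.test(raw.namespace)) {
    throw new Error("namespace must be in user/repo format");
  }
  if (typeof raw.slug !== "string" || raw.slug.length === 0) {
    throw new Error("slug is required");
  }
  return {
    namespace: raw.namespace,
    slug: raw.slug,
    chunk_index: raw.chunk_index,
    chunk_size: raw.chunk_size ?? 100000,
    accessToken: raw.accessToken,
  };
}
```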

Implementation Reference

  • src/server.ts:244-342 (registration)
    Registration of the 'get_doc' tool in the MCP server, specifying name, description, input schema, and handler function.
    this.server.tool(
      "get_doc",
      "Retrieve the detailed content of a specific document in Yuque, including body text, revision history, and permission information (supports chunked handling of large documents)",
      {
        namespace: z.string().describe("Namespace of the knowledge base, in user/repo format"),
        slug: z.string().describe("Unique identifier or short-link name of the document"),
        chunk_index: z
          .number()
          .optional()
          .describe(
            "Index of the document chunk to fetch; omit to return the first chunk, or the whole document if the content is small"
          ),
        chunk_size: z
          .number()
          .optional()
          .describe("Chunk size in characters, defaults to 100000"),
        accessToken: z.string().optional().describe("Token used to authenticate API requests"),
      },
      async ({
        namespace,
        slug,
        chunk_index,
        chunk_size = 100000,
        accessToken,
      }) => {
        try {
          Logger.log(`Fetching document ${slug} from repository: ${namespace}`);
          // Avoid logging the raw credential; record only whether one was supplied.
          Logger.log(`accessToken provided: ${Boolean(accessToken)}`);
          const yuqueService = this.createYuqueService(accessToken);
          const doc = await yuqueService.getDoc(namespace, slug);
    
          Logger.log(
            `Successfully fetched document: ${doc.title}, content length: ${
              doc.body?.length || 0
            } chars`
          );
    
          // Split the document content into chunks
          const docChunks = this.splitDocumentContent(doc, chunk_size);
    
          if (docChunks.length > 1) {
            Logger.log(
              `Document has been split into ${docChunks.length} chunks`
            );
    
            // If no chunk index was specified, return the first chunk by default
            if (chunk_index === undefined) {
              // Return the first chunk along with its chunking metadata
              const firstChunk = docChunks[0];
              Logger.log(`Returning first chunk (1/${docChunks.length})`);
              return {
                content: [
                  { type: "text", text: JSON.stringify(firstChunk, null, 2) },
                ],
              };
            }
    
            // If a chunk index was specified, validate it
            if (chunk_index < 0 || chunk_index >= docChunks.length) {
              const error = `Invalid chunk_index: ${chunk_index}. Valid range is 0-${
                docChunks.length - 1
              }`;
              Logger.error(error);
              return {
                content: [{ type: "text", text: error }],
              };
            }
    
            // Return the requested chunk
            Logger.log(
              `Returning chunk ${chunk_index + 1}/${docChunks.length}`
            );
            return {
              content: [
                {
                  type: "text",
                  text: JSON.stringify(docChunks[chunk_index], null, 2),
                },
              ],
            };
          } else {
            // The document is small enough; no chunking needed, return it whole
            Logger.log(`Document is small enough, no chunking needed`);
            return {
              content: [{ type: "text", text: JSON.stringify(doc, null, 2) }],
            };
          }
        } catch (error) {
          Logger.error(
            `Error fetching doc ${slug} from repo ${namespace}:`,
            error
          );
          return {
            content: [{ type: "text", text: `Error fetching doc: ${error}` }],
          };
        }
      }
    );
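The handler's chunk-selection logic reduces to three cases: no index given (first chunk), invalid index (error), explicit valid index (that chunk). A minimal standalone sketch of that dispatch, assuming a generic chunk array (the `selectChunk` helper is hypothetical, not part of the server):

```typescript
// Mirror the handler's three cases for picking which chunk to return.
type ChunkResult<T> = { ok: true; chunk: T } | { ok: false; error: string };

function selectChunk<T>(chunks: T[], chunkIndex?: number): ChunkResult<T> {
  // Small document: a single "chunk" is the whole document.
  if (chunks.length === 1) return { ok: true, chunk: chunks[0] };
  // No index specified: default to the first chunk.
  if (chunkIndex === undefined) return { ok: true, chunk: chunks[0] };
  // Out-of-range index: report the valid range instead of throwing.
  if (chunkIndex < 0 || chunkIndex >= chunks.length) {
    return {
      ok: false,
      error: `Invalid chunk_index: ${chunkIndex}. Valid range is 0-${chunks.length - 1}`,
    };
  }
  // Explicit, valid index: return exactly that chunk.
  return { ok: true, chunk: chunks[chunkIndex] };
}
```

Returning an error value rather than throwing matches the handler above, which wraps every outcome, including invalid input, in a text content item.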
  • The main execution handler for the get_doc tool. Fetches the document using YuqueService, splits large documents into chunks, handles chunk_index parameter, and returns JSON serialized content.
  • Zod-based input schema validation for get_doc tool parameters.
  • YuqueService.getDoc helper method that performs the actual API call to retrieve document details from Yuque and cleans up the response.
    async getDoc(namespace: string, slug: string, page?: number, page_size?: number): Promise<YuqueDoc> {
      const params: any = {};
      if (page !== undefined) params.page = page;
      if (page_size !== undefined) params.page_size = page_size;
      
      const response = await this.client.get(`/repos/${namespace}/docs/${slug}`, { params });
      // Filter out the raw-format representations we don't need (body_lake, body_draft, body_html)
      if (response.data.data.body_lake) delete response.data.data.body_lake;
      if (response.data.data.body_draft) delete response.data.data.body_draft;
      if (response.data.data.body_html) delete response.data.data.body_html;
      return response.data.data;
    }
  • Private helper method to split large document content into smaller JSON chunks with overlapping sections for better context in large docs.
    private splitDocumentContent(doc: any, chunkSize: number = 100000): any[] {
      // Serialize the whole document object into a formatted JSON string first
      const fullDocString = JSON.stringify(doc, null, 2);
      Logger.log("fullDocString length: " + fullDocString.length);
    
      // If the full document string fits within one chunk, return the original document
      if (fullDocString.length <= chunkSize) {
        return [doc];
      }
    
      // Use simple text splitting with overlapping content between chunks
      const overlapSize = 200; // overlap between consecutive chunks
      const chunks: string[] = [];
    
      // Split at fixed sizes without regard for content boundaries
      let startIndex = 0;
      while (startIndex < fullDocString.length) {
        // Compute the end position of the current chunk
        const endIndex = Math.min(startIndex + chunkSize, fullDocString.length);
    
        // Extract the current chunk
        chunks.push(fullDocString.substring(startIndex, endIndex));
    
        // Advance the start of the next chunk, keeping the overlap
        startIndex = endIndex - overlapSize;
    
        // Stop once we have reached the end of the text or the next iteration would produce an invalid chunk
        if (startIndex >= fullDocString.length - overlapSize) {
          break;
        }
      }
    
      // Build a document object for each chunk, adding chunking and context metadata
      return chunks.map((chunk, index) => {
        const result: any = {
          _original_doc_id: doc.id,
          _original_title: doc.title,
          _chunk_info: {
            index: index,
            total: chunks.length,
            is_chunked: true,
            chunk_size: chunkSize,
            overlap_size: overlapSize,
            content_type: "full_doc_json",
            // Context information for navigating between chunks
            context: {
              has_previous: index > 0,
              has_next: index < chunks.length - 1,
              // Note that this is partial content
              note: index > 0 ? "This chunk overlaps with the previous chunk" : "",
            },
          },
        };
    
        // Keep the raw text of the chunk
        result.text_content = chunk;
    
        // Try to parse the chunk back into JSON (only if it is a complete JSON object)
        try {
          // Attempt parsing only when the text starts with { and ends with }
          if (chunk.trim().startsWith("{") && chunk.trim().endsWith("}")) {
            const parsedChunk = JSON.parse(chunk);
    
            // Merge the parsed properties into the result object
            Object.assign(result, parsedChunk);
          }
        } catch (e) {
          // Parsing failed; keep the chunk as plain text
          result.parse_error = "Chunk content is not a complete JSON object; kept as text";
        }
    
        // Mark the title with the chunk position
        result.title = `${doc.title} [part ${index + 1}/${chunks.length}]`;
    
        return result;
      });
    }
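The fixed-size split with overlap at the heart of the method above can be expressed as a pure function over strings, which makes the overlap invariant easy to verify: every chunk after the first repeats the last `overlap` characters of its predecessor, so dropping that prefix from each later chunk reconstructs the original text. A sketch (the `splitWithOverlap` name is hypothetical; the server uses 100000/200 as the size/overlap defaults):

```typescript
// Split text into fixed-size pieces where each piece after the first
// starts with the last `overlap` characters of its predecessor.
function splitWithOverlap(text: string, chunkSize: number, overlap: number): string[] {
  // Small input: one chunk, no overlap needed.
  if (text.length <= chunkSize) return [text];
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push(text.substring(start, end));
    // Step back by `overlap` so the next chunk repeats the tail of this one.
    start = end - overlap;
    // Stop when the remaining text is already covered by the last chunk.
    if (start >= text.length - overlap) break;
  }
  return chunks;
}
```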
