Glama
by ccw33

process_file

Extract and process content from PDFs, documents, spreadsheets, presentations, and images using AI-powered analysis with optional guidance prompts.

Instructions

Process a file with GLM-4.5V (upload it and extract its content). Supports PDF, DOCX, DOC, XLS, XLSX, PPT, PPTX, PNG, JPG, JPEG, CSV, and similar formats.

Input Schema

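| Name | Required | Description | Default |
| --- | --- | --- | --- |
| filePath | Yes | File path (path to a local file) | |
| extractPrompt | No | Optional extraction prompt that guides how the file's content is extracted | |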
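
As a rough illustration of how these two parameters travel over the wire, the sketch below builds a JSON-RPC `tools/call` request for this tool. The helper name `buildProcessFileCall` and the sample path and prompt are hypothetical; only the tool name and the two argument names come from the schema above.

```typescript
// Arguments accepted by process_file, per the input schema.
interface ProcessFileArgs {
  filePath: string;       // path to a local file
  extractPrompt?: string; // optional extraction guidance
}

// Hypothetical helper: wraps the arguments in an MCP `tools/call` request.
function buildProcessFileCall(id: number, args: ProcessFileArgs) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "process_file", arguments: args },
  };
}

const request = buildProcessFileCall(1, {
  filePath: "/tmp/report.pdf", // illustrative path only
  extractPrompt: "Summarize the tables",
});
console.log(JSON.stringify(request, null, 2));
```

Leaving `extractPrompt` out produces an equally valid call, since the schema marks it optional.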

Implementation Reference

  • src/index.ts:113-140 (registration)
    Registers the 'process_file' tool with the MCP server, including description, input schema using Zod, and an async wrapper handler that calls the processFile function and handles errors.
    mcpServer.registerTool("process_file", {
      description: "使用 GLM-4.5V 处理文件(上传并提取内容)。支持 PDF、DOCX、DOC、XLS、XLSX、PPT、PPTX、PNG、JPG、JPEG、CSV 等格式",
      inputSchema: {
        filePath: z.string().describe("文件路径(本地文件路径)"),
        extractPrompt: z.string().optional().describe("可选的内容提取提示词,用于指导如何提取文件内容"),
      },
    }, async ({ filePath, extractPrompt }) => {
      try {
        const result = await processFile(filePath, extractPrompt);
        return {
          content: [{
            type: "text" as const,
            text: JSON.stringify(result, null, 2)
          }]
        };
      } catch (error) {
        return {
          content: [{
            type: "text" as const,
            text: JSON.stringify({
              ok: false,
              error: error instanceof Error ? error.message : "Unknown error"
            }, null, 2)
          }],
          isError: true
        };
      }
    });
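
A client receives the result as a JSON string inside a text content item. Because `processFile` (shown in the next item) catches its own errors and returns `ok: false`, most failures appear to arrive without `isError` set, so a caller may want to check both signals. The following is a minimal sketch under that reading; `readProcessFileResult` and the two local interfaces are hypothetical names, not part of the server.

```typescript
// Shape of the MCP tool result produced by the handler above.
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Subset of FileProcessingResult needed here.
interface ParsedResult {
  ok: boolean;
  content?: string;
  error?: string;
}

// Hypothetical client-side helper: decode the JSON payload and treat
// either `isError` or `ok: false` as a failure.
function readProcessFileResult(result: ToolResult): string {
  const first = result.content[0];
  if (!first) throw new Error("Empty tool result");
  const parsed = JSON.parse(first.text) as ParsedResult;
  if (result.isError || !parsed.ok) {
    throw new Error(parsed.error ?? "Unknown error");
  }
  return parsed.content ?? "";
}

// Usage with a hand-written sample payload.
const sample: ToolResult = {
  content: [{ type: "text", text: JSON.stringify({ ok: true, content: "hello" }) }],
};
console.log(readProcessFileResult(sample));
```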
  • Core handler function for processing files: validates file path, size, and type; uploads to GLM API; retrieves extracted content; returns structured result.
    async function processFile(filePath: string, extractPrompt?: string): Promise<FileProcessingResult> {
      // Note: extractPrompt is accepted but not used anywhere in this function as shown.
      const startTime = Date.now();
    
      try {
        console.error(`[DEBUG] processFile called with path: ${filePath}`);
    
        // Check that the file exists (fs.stat rejects if it does not)
        const resolvedPath = path.resolve(filePath);
        const stats = await fs.stat(resolvedPath);
        const fileSize = stats.size;
        const filename = path.basename(filePath);
    
        console.error(`[DEBUG] File size: ${fileSize} bytes, filename: ${filename}`);
    
        // Enforce the file size limit
        const maxSize = isImageFile(filename) ? 5 * 1024 * 1024 : 50 * 1024 * 1024; // 5 MB for images, 50 MB for everything else
        if (fileSize > maxSize) {
          throw new Error(`文件大小超过限制。图片文件最大5MB,其他文件最大50MB。当前文件大小:${(fileSize / 1024 / 1024).toFixed(2)}MB`);
        }
    
        // Check the file format
        if (!isSupportedFileType(filename)) {
          throw new Error(`不支持的文件格式。支持的格式:PDF、DOCX、DOC、XLS、XLSX、PPT、PPTX、PNG、JPG、JPEG、CSV`);
        }
    
        // 1. Upload the file
        console.error(`[DEBUG] Uploading file...`);
        const fileId = await uploadFileToGLM(resolvedPath, filename);
        console.error(`[DEBUG] File uploaded with ID: ${fileId}`);
    
        // 2. Fetch the extracted file content
        console.error(`[DEBUG] Getting file content...`);
        const content = await getFileContentFromGLM(fileId);
        console.error(`[DEBUG] Content extracted, length: ${content.length}`);
    
        const processingTime = Date.now() - startTime;
    
        return {
          ok: true,
          fileId,
          content,
          fileType: getFileType(filename),
          filename,
          metadata: {
            uploadTime: startTime,
            fileSize,
            processingTime
          }
        };
      } catch (error) {
        return {
          ok: false,
          error: error instanceof Error ? error.message : "Unknown error"
        };
      }
    }
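
The helpers `isImageFile`, `isSupportedFileType`, and `getFileType` are called above but not shown on this page. The sketch below is one plausible extension-based implementation, offered purely as an assumption: the extension lists are inferred from the formats named in the error message, and `maxSizeFor` is a hypothetical name that restates the size rule from `processFile`.

```typescript
// Assumed extension lists, inferred from the formats named in the error message.
const IMAGE_EXTENSIONS = ["png", "jpg", "jpeg"];
const DOCUMENT_EXTENSIONS = ["pdf", "docx", "doc", "xls", "xlsx", "ppt", "pptx", "csv"];

// Lower-cased extension without the dot, or "" if the name has no dot.
function getFileType(filename: string): string {
  const dot = filename.lastIndexOf(".");
  return dot === -1 ? "" : filename.slice(dot + 1).toLowerCase();
}

function isImageFile(filename: string): boolean {
  return IMAGE_EXTENSIONS.includes(getFileType(filename));
}

function isSupportedFileType(filename: string): boolean {
  const ext = getFileType(filename);
  return IMAGE_EXTENSIONS.includes(ext) || DOCUMENT_EXTENSIONS.includes(ext);
}

// Restates the size rule in processFile: 5 MB for images, 50 MB otherwise.
function maxSizeFor(filename: string): number {
  return (isImageFile(filename) ? 5 : 50) * 1024 * 1024;
}

console.log(getFileType("report.final.DOCX"), maxSizeFor("scan.png"));
```

Matching on the extension alone is cheap but trusts the file name; the real helpers may inspect content instead.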
  • TypeScript interface defining the output structure of the file processing result.
    interface FileProcessingResult {
      ok: boolean;
      fileId?: string;
      content?: string;
      fileType?: string;
      filename?: string;
      error?: string;
      metadata?: {
        uploadTime: number;
        fileSize: number;
        processingTime: number;
      };
    }
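
Every field except `ok` is optional in the interface above, so callers have to null-check each one. As an alternative sketch (not the author's definition), the same data could be modelled as a discriminated union so that checking `ok` narrows the type; the names `ProcessSuccess`, `ProcessFailure`, `StrictFileProcessingResult`, and `describeResult` are hypothetical.

```typescript
interface FileMetadata {
  uploadTime: number;
  fileSize: number;
  processingTime: number;
}

// Success and failure carry different fields, keyed on `ok`.
interface ProcessSuccess {
  ok: true;
  fileId: string;
  content: string;
  fileType: string;
  filename: string;
  metadata: FileMetadata;
}

interface ProcessFailure {
  ok: false;
  error: string;
}

type StrictFileProcessingResult = ProcessSuccess | ProcessFailure;

// Checking `ok` narrows the union, so no optional chaining is needed.
function describeResult(r: StrictFileProcessingResult): string {
  if (r.ok) {
    return `${r.filename}: ${r.content.length} characters extracted`;
  }
  return `failed: ${r.error}`;
}

console.log(describeResult({ ok: false, error: "unsupported format" }));
```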
  • Helper function that uploads the file to the GLM API using FormData and returns the assigned file ID.
    async function uploadFileToGLM(filePath: string, filename: string): Promise<string> {
      const glmApiKey = process.env.GLM_API_KEY;
      if (!glmApiKey) {
        throw new Error("GLM_API_KEY environment variable is required");
      }
    
      try {
        // Read the file
        const fileBuffer = await fs.readFile(filePath);
    
        // Build the FormData payload
        const formData = new FormData();
        const blob = new Blob([new Uint8Array(fileBuffer)]);
        formData.append('file', blob, filename);
        formData.append('purpose', 'file-extract');
    
        const response = await fetch('https://open.bigmodel.cn/api/paas/v4/files', {
          method: 'POST',
          headers: {
            'Authorization': `Bearer ${glmApiKey}`
          },
          body: formData
        });
    
        if (!response.ok) {
          const errorText = await response.text();
          throw new Error(`文件上传失败: ${response.status} ${response.statusText} - ${errorText}`);
        }
    
        const result = await response.json();
        if (!result.id) {
          throw new Error('上传响应中缺少文件ID');
        }
    
        return result.id;
      } catch (error) {
        throw new Error(`文件上传失败: ${error instanceof Error ? error.message : 'Unknown error'}`);
      }
    }
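
One detail in the helper above: the `throw` for a non-OK response sits inside the `try`, so the `catch` wraps it a second time and the message ends up carrying the prefix twice. A minimal sketch of one way to avoid that, using a hypothetical `withPrefixOnce` helper that is not part of the original source:

```typescript
// Hypothetical helper: prepend `prefix` unless the message already starts with it.
function withPrefixOnce(prefix: string, error: unknown): Error {
  const message = error instanceof Error ? error.message : "Unknown error";
  return new Error(message.startsWith(prefix) ? message : `${prefix}: ${message}`);
}

const UPLOAD_PREFIX = "文件上传失败"; // "File upload failed", the prefix used in the original

// An error already prefixed inside the try block is passed through unchanged.
const alreadyPrefixed = new Error(`${UPLOAD_PREFIX}: 500 Internal Server Error - boom`);
console.log(withPrefixOnce(UPLOAD_PREFIX, alreadyPrefixed).message);

// A low-level error (for example from fetch) gets the prefix exactly once.
console.log(withPrefixOnce(UPLOAD_PREFIX, new Error("socket hang up")).message);
```

The same pattern would apply to the `catch` block of `getFileContentFromGLM`, which wraps errors the same way.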
  • Helper function that retrieves the extracted content from the GLM API using the file ID, handling JSON or text responses.
    async function getFileContentFromGLM(fileId: string): Promise<string> {
      const glmApiKey = process.env.GLM_API_KEY;
      if (!glmApiKey) {
        throw new Error("GLM_API_KEY environment variable is required");
      }
    
      try {
        const response = await fetch(`https://open.bigmodel.cn/api/paas/v4/files/${fileId}/content`, {
          method: 'GET',
          headers: {
            'Authorization': `Bearer ${glmApiKey}`
          }
        });
    
        if (!response.ok) {
          const errorText = await response.text();
          throw new Error(`获取文件内容失败: ${response.status} ${response.statusText} - ${errorText}`);
        }
    
        // Handle the body according to the response content type
        const contentType = response.headers.get('content-type') || '';
    
        if (contentType.includes('application/json')) {
          const jsonResult = await response.json();
          return JSON.stringify(jsonResult, null, 2);
        } else {
          // For any other type, try reading the body as text
          return await response.text();
        }
      } catch (error) {
        throw new Error(`获取文件内容失败: ${error instanceof Error ? error.message : 'Unknown error'}`);
      }
    }
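
The content-type branch in the helper above can be isolated as a pure function, which makes it easy to test without a network call. This is a sketch only; `normalizeBody` is a hypothetical name, and it takes the raw body as a string rather than a `Response`.

```typescript
// Pure restatement of the branching in getFileContentFromGLM:
// pretty-print JSON bodies, return anything else unchanged.
function normalizeBody(contentType: string | null, rawBody: string): string {
  if ((contentType ?? "").includes("application/json")) {
    return JSON.stringify(JSON.parse(rawBody), null, 2);
  }
  return rawBody;
}

console.log(normalizeBody("application/json; charset=utf-8", '{"a":1}'));
console.log(normalizeBody("text/plain", "plain text body"));
```

Using `includes` rather than strict equality keeps the check working when the header carries a `charset` parameter.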
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions uploading and extracting content but lacks details on permissions needed, rate limits, error handling, or what 'processing' entails beyond extraction. For a tool that interacts with files and an AI model (GLM-4.5V), this is a significant gap in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
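
For reference, the MCP specification defines optional annotation hints that a tool can declare alongside its description. The sketch below shows what such hints might look like for this tool; the values are assumptions inferred from the code on this page, not something the server declares, and the interface is a local copy of the hint fields rather than an import from the SDK.

```typescript
// Local copy of the MCP tool-annotation hint fields (all optional).
interface ToolAnnotations {
  title?: string;
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
  openWorldHint?: boolean;
}

// Assumed values, inferred from the implementation shown above.
const processFileAnnotations: ToolAnnotations = {
  title: "Process file",
  readOnlyHint: false,    // uploads the file, creating a remote resource
  destructiveHint: false, // nothing is overwritten or deleted
  idempotentHint: false,  // each call performs a new upload
  openWorldHint: true,    // talks to an external API
};

console.log(JSON.stringify(processFileAnnotations, null, 2));
```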

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and front-loaded, stating the core purpose first. It lists the supported formats efficiently and avoids redundancy. It could be slightly improved by adding structured usage guidelines or behavioral details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of file processing with an AI model (GLM-4.5V), no annotations, and no output schema, the description is incomplete. It doesn't cover what the tool returns (e.g., extracted text, structured data), error cases, or operational constraints. For a tool with potential side effects (uploading files), more context is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters ('filePath' and 'extractPrompt') with descriptions. The description adds minimal value beyond the schema by implying file processing and extraction, but it doesn't provide additional syntax, format details, or examples for parameters. Baseline 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: '使用 GLM-4.5V 处理文件(上传并提取内容)' - it processes files by uploading and extracting content using GLM-4.5V. It specifies supported formats (PDF, DOCX, etc.), making the purpose concrete. However, it doesn't explicitly differentiate from sibling tools like 'read_image' or 'vision_query', which might have overlapping functionality for image processing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It lists supported formats but doesn't explain scenarios where this tool is preferred over 'read_image' (which might handle images) or 'vision_query' (which might involve visual queries). There's no mention of prerequisites, limitations, or comparative use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ccw33/Multimodel-MCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.