Video Clip MCP

mergeVideos

Combine multiple video files into a single output file. The tool's description advertises automatic adaptation across formats and resolutions; in the handler shown below, inputs are re-encoded only when `videoCodec` or `audioCodec` is set and are stream-copied otherwise.

Instructions

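Merge multiple video files, with intelligent adaptation across different formats and resolutions. (Original description: 合并多个视频文件，支持不同格式和分辨率的智能适配)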

Input Schema


Name	Required	Description
`inputPaths`	Yes	Array of input video file paths
`outputPath`	Yes	Output video file path
`quality`	No	Video quality preset
`videoCodec`	No	Video codec
`audioCodec`	No	Audio codec
`resolution`	No	Target resolution
`fps`	No	Target frame rate
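
As an illustration of how these parameters fit together, here is a minimal sketch of two argument objects for a mergeVideos call. The `MergeVideosArgs` type name and the file paths are assumptions made for the example. The accepted enum values for `quality`, `videoCodec`, and `audioCodec` are not listed on this page, so the codec values below are assumed rather than confirmed.

```typescript
// Hypothetical type mirroring the parameter table; the project's own
// MergeOptions type may differ (enum types are widened to string here).
interface MergeVideosArgs {
  inputPaths: string[]; // required
  outputPath: string; // required
  quality?: string; // optional quality preset
  videoCodec?: string; // optional video codec
  audioCodec?: string; // optional audio codec
  resolution?: { width: number; height: number }; // optional target resolution
  fps?: number; // optional target frame rate
}

// Smallest valid call: only the two required fields (example paths).
const minimalArgs: MergeVideosArgs = {
  inputPaths: ['/videos/intro.mp4', '/videos/main.mp4'],
  outputPath: '/videos/merged.mp4',
};

// Call that asks for re-encoding to a common format.
const reencodeArgs: MergeVideosArgs = {
  ...minimalArgs,
  videoCodec: 'libx264', // assumed to be an accepted value
  audioCodec: 'aac', // assumed to be an accepted value
  resolution: { width: 1280, height: 720 },
  fps: 30,
};
```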

Implementation Reference

src/core/video-engine.ts:134-212 (handler)

Core handler implementing video merging: validates the inputs, writes an FFmpeg concat list file, re-encodes when a codec is specified or stream-copies otherwise, registers the command in processingTasks, removes the temporary list file on completion or error, and returns a ProcessResult.

public async mergeVideos(options: MergeOptions): Promise<ProcessResult> {
  const startTime = Date.now();
  const taskId = uuidv4();

  try {
    // Validate every input file
    for (const inputPath of options.inputPaths) {
      await this.validateInputFile(inputPath);
    }

    // Make sure the output directory exists
    await this.ensureOutputDir(options.outputPath);

    return new Promise(async (resolve, reject) => {
      try {
        // Write the temporary concat list file
        const tempListPath = path.join(path.dirname(options.outputPath), `temp_list_${taskId}.txt`);
        const fileList = options.inputPaths.map(p => `file '${path.resolve(p).replace(/\\/g, '/')}'`).join('\n');
        await fs.writeFile(tempListPath, fileList, 'utf8');

        const command = ffmpeg()
          .input(tempListPath)
          .inputOptions(['-f', 'concat', '-safe', '0'])
          .output(options.outputPath);

        // Set encoding options; avoid complex filters
        if (options.videoCodec || options.audioCodec) {
          // Re-encoding is required
          command.outputOptions(['-c:v', options.videoCodec || 'libx264']);
          command.outputOptions(['-c:a', options.audioCodec || 'aac']);
          this.applyEncodingOptions(command, options);
        } else {
          // Use stream copy: faster and more stable
          command.outputOptions(['-c', 'copy']);
        }

        command.on('end', async () => {
          // Clean up the temporary file
          try {
            await fs.unlink(tempListPath);
          } catch (e) {
            console.warn('清理临时文件失败:', e); // "Failed to clean up temporary file"
          }
          
          resolve({
            success: true,
            outputPaths: [options.outputPath],
            duration: Date.now() - startTime
          });
        });

        command.on('error', async (err: any) => {
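          // Clean up the temporary file
          try {
            await fs.unlink(tempListPath);
          } catch (e) {
            console.warn('清理临时文件失败:', e); // "Failed to clean up temporary file"
          }
          
          // The message prefix means "Video merge failed"
          reject(new Error(`视频合并失败: ${err.message}`));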
        });

        this.processingTasks.set(taskId, command);
        command.run();
        
      } catch (error) {
        reject(error);
      }
    });

  } catch (error) {
    return {
      success: false,
      outputPaths: [],
      duration: Date.now() - startTime,
      error: error instanceof Error ? error.message : '未知错误' // "Unknown error"
    };
  }
}
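
The two decisions the handler makes before running FFmpeg (how to write the concat list, and whether to re-encode or stream-copy) can be isolated as pure functions. The sketch below is an illustration, not the project's code: `buildConcatList` and `selectCodecOptions` are hypothetical names, the function expects paths that are already absolute, and the single-quote escaping is an addition that the handler above does not perform.

```typescript
// Build the text of an FFmpeg concat-demuxer list file from absolute paths.
function buildConcatList(absolutePaths: string[]): string {
  return absolutePaths
    .map((p) => {
      // Convert Windows separators to forward slashes, as the handler does.
      const normalized = p.replace(/\\/g, '/');
      // Assumption beyond the handler: escape single quotes by closing the
      // quoted string, emitting \', and reopening it.
      const escaped = normalized.replace(/'/g, "'\\''");
      return `file '${escaped}'`;
    })
    .join('\n');
}

// Mirror the handler's branch: re-encode if either codec is given,
// otherwise copy the streams unchanged.
function selectCodecOptions(opts: { videoCodec?: string; audioCodec?: string }): string[] {
  if (opts.videoCodec || opts.audioCodec) {
    return ['-c:v', opts.videoCodec || 'libx264', '-c:a', opts.audioCodec || 'aac'];
  }
  return ['-c', 'copy'];
}
```

Because stream copy skips re-encoding, inputs whose codecs or resolutions differ generally need one of the codec options set for the merged file to play correctly.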

src/mcp/server.ts:356-365 (handler)

MCP server wrapper handler for mergeVideos tool: delegates to VideoEngine.mergeVideos and formats response as MCP content.

private async handleMergeVideos(args: MCPToolParams['mergeVideos']) {
  const result = await this.videoEngine.mergeVideos(args);
  return {
    content: [
      {
        type: 'text',
        text: JSON.stringify(result, null, 2),
      },
    ],
  };
}
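
Since the wrapper returns the result as pretty-printed JSON inside a single text item, a caller has to parse that text to get at the fields. The following is a minimal sketch of both directions. The `ProcessResult` interface here is a trimmed stand-in built from the fields visible in the handler, and `toToolResponse` and `parseToolResponse` are hypothetical helper names.

```typescript
// Trimmed stand-in for the project's ProcessResult, based on the fields
// the handler excerpt returns.
interface ProcessResult {
  success: boolean;
  outputPaths: string[];
  duration: number; // elapsed time in milliseconds
  error?: string;
}

interface TextContent {
  type: 'text';
  text: string;
}

// Server side: wrap a result the same way handleMergeVideos does.
function toToolResponse(result: ProcessResult): { content: TextContent[] } {
  return { content: [{ type: 'text', text: JSON.stringify(result, null, 2) }] };
}

// Client side: recover the result from the first text item.
function parseToolResponse(response: { content: TextContent[] }): ProcessResult {
  const first = response.content[0];
  if (!first) throw new Error('empty tool response');
  return JSON.parse(first.text) as ProcessResult;
}
```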

src/mcp/server.ts:160-204 (registration)

Registers the mergeVideos tool in MCP server's tool list with name, description, and detailed inputSchema.

{
  name: 'mergeVideos',
  description: '合并多个视频文件，支持不同格式和分辨率的智能适配', // "Merge multiple video files, with intelligent adaptation across formats and resolutions"
  inputSchema: {
    type: 'object',
    properties: {
      inputPaths: {
        type: 'array',
        items: { type: 'string' },
        description: '输入视频文件路径数组' // "Array of input video file paths"
      },
      outputPath: {
        type: 'string',
        description: '输出视频文件路径' // "Output video file path"
      },
      quality: {
        type: 'string',
        enum: Object.values(QualityPreset),
        description: '视频质量预设' // "Video quality preset"
      },
      videoCodec: {
        type: 'string',
        enum: Object.values(VideoCodec),
        description: '视频编码格式' // "Video codec"
      },
      audioCodec: {
        type: 'string',
        enum: Object.values(AudioCodec),
        description: '音频编码格式' // "Audio codec"
      },
      resolution: {
        type: 'object',
        properties: {
          width: { type: 'number' },
          height: { type: 'number' }
        },
        description: '目标分辨率' // "Target resolution"
      },
      fps: {
        type: 'number',
        description: '目标帧率' // "Target frame rate"
      }
    },
    required: ['inputPaths', 'outputPath']
  }
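
To make the constraints in this schema concrete, here is a hedged sketch of a hand-written runtime check that follows it: two required fields, optional numeric `fps`, and an optional `resolution` object whose `width` and `height` must be numbers when present. `validateMergeArgs` is a hypothetical helper, not part of the server. The enum checks are left out because the values of QualityPreset, VideoCodec, and AudioCodec do not appear in the excerpt.

```typescript
// Returns a list of problems; an empty list means the arguments pass.
function validateMergeArgs(args: unknown): string[] {
  if (typeof args !== 'object' || args === null) {
    return ['arguments must be an object'];
  }
  const a = args as Record<string, unknown>;
  const problems: string[] = [];

  // Required fields.
  const paths = a.inputPaths;
  if (!Array.isArray(paths) || !paths.every((p: unknown) => typeof p === 'string')) {
    problems.push('inputPaths must be an array of strings');
  }
  if (typeof a.outputPath !== 'string') {
    problems.push('outputPath must be a string');
  }

  // Optional fields: only checked when present.
  if (a.resolution !== undefined) {
    const r = a.resolution;
    if (typeof r !== 'object' || r === null) {
      problems.push('resolution must be an object');
    } else {
      const dims = r as Record<string, unknown>;
      for (const key of ['width', 'height']) {
        if (dims[key] !== undefined && typeof dims[key] !== 'number') {
          problems.push(`resolution.${key} must be a number`);
        }
      }
    }
  }
  if (a.fps !== undefined && typeof a.fps !== 'number') {
    problems.push('fps must be a number');
  }
  return problems;
}
```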

src/types/mcp.ts:15-52 (schema)

TypeScript interfaces mapping each tool name to its input type (MergeOptions for mergeVideos) and its result type (ProcessResult for mergeVideos).

export interface MCPToolParams {
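  // Parameters for the video clipping tool
  clipVideo: ClipOptions;
  
  // Parameters for the video merging tool
  mergeVideos: MergeOptions;
  
  // Parameters for the video splitting tool
  splitVideo: SplitOptions;
  
  // Parameters for the video info tool
  getVideoInfo: {
    filePath: string;
  };
  
  // Parameters for the batch processing tool
  batchProcess: {
    tasks: Omit<BatchTask, 'id' | 'status' | 'createdAt'>[];
  };
  
  // Parameters for the supported formats tool
  getSupportedFormats: Record<string, never>;
  
  // Parameters for the task cancellation tool
  cancelTask: {
    taskId: string;
  };
  
  // Parameters for the task status tool
  getTaskStatus: {
    taskId: string;
  };
}

// Return types of the MCP tools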
export interface MCPToolResults {
  clipVideo: ProcessResult;
  mergeVideos: ProcessResult;

Tool Definition Quality

Grade C (2.9/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions '智能适配' (intelligent adaptation) for formats and resolutions, hinting at automatic processing, but doesn't clarify critical behaviors: whether merging is destructive to source files, if it requires specific permissions, processing time expectations, error handling, or output format details. For a complex video processing tool with 7 parameters, this leaves significant gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
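
For context on the annotations this assessment refers to, the following is a sketch of what MCP tool annotation hints for mergeVideos might look like. The field names follow the MCP tool annotation hints. Every value is an assumption drawn from the handler excerpt on this page and would need to be confirmed against the server's actual behavior, in particular whether an existing file at outputPath is overwritten.

```typescript
// Hypothetical annotation values for mergeVideos; none of these are
// declared by the server described on this page.
const mergeVideosAnnotations = {
  title: 'Merge videos',
  readOnlyHint: false, // the tool writes a new file at outputPath
  destructiveHint: true, // assumed: an existing file at outputPath might be overwritten
  idempotentHint: true, // assumed: repeating the call with the same inputs yields an equivalent output
  openWorldHint: false, // assumed: only local files are involved
};
```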

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose ('合并多个视频文件', merge multiple video files) and adds a key feature ('支持不同格式和分辨率的智能适配', intelligent adaptation across formats and resolutions). There is no wasted text, and it is appropriately sized for the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters, video processing with potential side effects), lack of annotations, and no output schema, the description is incomplete. It doesn't address what the tool returns (e.g., success status, error messages), behavioral nuances like file overwriting or resource usage, or how it differs from siblings. For a mutation tool with rich parameters, this minimal description leaves too much undefined.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
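
As one way to close the gaps named above, here is a sketch of a fuller description string. It is assembled only from behavior visible in the handler excerpts on this page, it is not the project's wording, and `expandedDescription` is a hypothetical name. Whether an existing output file is overwritten is not visible in the excerpts, so the sketch stays silent on that point.

```typescript
// Hypothetical expanded tool description built from the handler excerpt.
const expandedDescription = [
  'Merge multiple video files into a single output file, in the order given by inputPaths.',
  'Input files are only read; a temporary list file is written next to outputPath and removed afterwards.',
  'If videoCodec or audioCodec is set, the streams are re-encoded; otherwise they are stream-copied.',
  'Returns a JSON object with success, outputPaths, and duration (milliseconds), plus error when input validation fails.',
].join(' ');
```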

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with all parameters documented in the schema itself (e.g., '输入视频文件路径数组', array of input video file paths, for inputPaths, and enum-backed descriptions for the codecs). The description adds no parameter semantics beyond implying format/resolution adaptation, which is already covered by schema fields like 'resolution', 'videoCodec', and 'audioCodec'. A baseline of 3 is appropriate because the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: '合并多个视频文件' (merge multiple video files) with the added feature of '智能适配' (intelligent adaptation) for different formats and resolutions. It specifies the verb (merge) and resource (video files), but doesn't explicitly distinguish it from sibling tools like 'clipVideo' or 'splitVideo' beyond mentioning format/resolution adaptation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention sibling tools like 'batchProcess' (which might handle multiple files differently) or 'clipVideo'/'splitVideo' (which modify rather than merge videos). There's no context about prerequisites, limitations, or typical use cases beyond the basic functionality stated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/pickstar-2002/video-clip-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.