stabgan

OpenRouter MCP Multimodal Server

generate_video

Generate a video from a text description using AI. Supports async processing with optional image conditioning for first/last frames or style references.

Instructions

Generate a video from a text prompt using an OpenRouter video-generation model (default: google/veo-3.1). Submits an async job, polls until completion or max_wait_ms, then downloads the result. Optionally conditioned on first/last-frame images or reference images. Large outputs are auto-saved when save_path is provided and path-sandboxed.
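For illustration, an MCP client invokes the tool with a standard JSON-RPC `tools/call` request; the prompt and option values below are hypothetical examples, not defaults:

```typescript
// Hypothetical tools/call payload for generate_video (values are examples only).
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "generate_video",
    arguments: {
      prompt: "A slow aerial shot over a foggy pine forest at sunrise",
      resolution: "720p",
      aspect_ratio: "16:9",
      duration: 6,
      save_path: "renders/forest.mp4", // resolved inside OPENROUTER_OUTPUT_DIR
      max_wait_ms: 300_000,            // return a resumable handle after 5 minutes
    },
  },
};
```

If the job is still running when `max_wait_ms` elapses, the response carries a `video_id` that can be passed to `get_video_status` to resume.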

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| prompt | Yes | Text description of the desired video. | |
| model | No | Override the video model ID. | |
| resolution | No | 480p / 720p / 1080p / 1K / 2K / 4K (model-dependent). | |
| aspect_ratio | No | 16:9 / 9:16 / 1:1 / 4:3 / 3:4 / 21:9 / 9:21 (model-dependent). | |
| duration | No | Duration in seconds (model-dependent). | |
| seed | No | Deterministic seed when supported. | |
| first_frame_image | No | Optional image (path, URL, or data URL) used as the first frame for image-to-video. | |
| last_frame_image | No | Optional image used as the last frame for frame transitions. | |
| reference_images | No | Optional style/content reference images. | |
| provider | No | Provider-specific passthrough options keyed by provider slug. | |
| save_path | No | Where to save the video. Routed through the OPENROUTER_OUTPUT_DIR sandbox; extension auto-corrected. | |
| max_wait_ms | No | Total time to wait for the async job before returning a resumable handle. | 600000 ms |
| poll_interval_ms | No | Polling cadence. | 15000 ms |

Implementation Reference

  • Main handler function for the generate_video tool. Validates input, submits a video generation job via OpenRouter API, polls until completion or timeout, then downloads and returns/saves the result.
    export async function handleGenerateVideo(
      request: { params: { arguments: GenerateVideoToolRequest } },
      apiClient: OpenRouterAPIClient,
      progress?: ProgressHook,
    ) {
      const args = request.params.arguments ?? ({} as GenerateVideoToolRequest);
      if (!args.prompt || !args.prompt.trim()) {
        return toolError(ErrorCode.INVALID_INPUT, 'prompt is required.');
      }
    
      // Fail-fast on unsafe save_path BEFORE spending credits on the job.
      let safeSavePath: string | null = null;
      if (args.save_path) {
        try {
          safeSavePath = await resolveSafeOutputPath(args.save_path);
        } catch (err) {
          if (err instanceof UnsafeOutputPathError) return toolErrorFrom(ErrorCode.UNSAFE_PATH, err);
          return toolErrorFrom(ErrorCode.INTERNAL, err);
        }
      }
    
      const model =
        args.model ||
        process.env.OPENROUTER_DEFAULT_VIDEO_GEN_MODEL ||
        FALLBACK_MODEL;
    
      const body = buildRequestBody(args, model);
      try {
        await attachFrameImages(args, body);
      } catch (err) {
        // Sandbox violation → UNSAFE_PATH; all other decode failures stay
        // as UNSUPPORTED_FORMAT (couldn't read, invalid data URL, etc.).
        if (err instanceof UnsafeOutputPathError) {
          return toolErrorFrom(ErrorCode.UNSAFE_PATH, err, 'Reference/frame image');
        }
        return toolErrorFrom(ErrorCode.UNSUPPORTED_FORMAT, err, 'Reference/frame image');
      }
    
      let envelope: VideoJobEnvelope;
      try {
        logger.info('generate_video.submit', { model, keys: Object.keys(body) });
        envelope = await apiClient.submitVideoJob(body);
      } catch (err) {
        return classifyUpstreamError(err, 'generate_video.submit');
      }
    
      const pollIntervalMs = Math.max(
        MIN_POLL_INTERVAL_MS,
        args.poll_interval_ms ?? getDefaultPollInterval(),
      );
      const maxWaitMs = Math.max(100, args.max_wait_ms ?? getDefaultMaxWait());
      const deadlineAt = Date.now() + maxWaitMs;
    
      const outcome = await pollUntilTerminal(apiClient, envelope, {
        pollIntervalMs,
        deadlineAt,
        onProgress: progress,
      });
    
      if (outcome.kind === 'failed') {
        return toolError(ErrorCode.JOB_FAILED, extractJobError(outcome.status), {
          video_id: outcome.status.id,
        });
      }
      if (outcome.kind === 'timeout') {
        return {
          content: [
            {
              type: 'text' as const,
              text: `Video still generating after ${maxWaitMs}ms. Use get_video_status with video_id=${envelope.id} to resume.`,
            },
          ],
          isError: false as const,
          _meta: {
            code: ErrorCode.JOB_STILL_RUNNING,
            video_id: envelope.id,
            polling_url: envelope.polling_url ?? `https://openrouter.ai/api/v1/videos/${envelope.id}`,
            last_status: outcome.last?.status,
          },
        };
      }
    
      try {
        const { content, _meta } = await finalizeCompletedJob(
          apiClient,
          outcome.status,
          safeSavePath,
        );
        return { content, _meta };
      } catch (err) {
        if (err instanceof UnsafeOutputPathError) {
          return toolErrorFrom(ErrorCode.UNSAFE_PATH, err);
        }
        return toolErrorFrom(ErrorCode.UPSTREAM_HTTP, err, 'Download');
      }
    }
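`pollUntilTerminal` is not shown above; a minimal sketch, under the assumption that it simply re-fetches job status until a terminal state or the deadline, might look like this (the type and function names here are illustrative, not the server's actual API):

```typescript
// Hypothetical sketch of a pollUntilTerminal-style loop.
type JobStatus = { id: string; status: "queued" | "processing" | "completed" | "failed" };

async function pollUntilDone(
  getStatus: (id: string) => Promise<JobStatus>,
  id: string,
  pollIntervalMs: number,
  deadlineAt: number,
): Promise<{ kind: "completed" | "failed" | "timeout"; last?: JobStatus }> {
  let last: JobStatus | undefined;
  while (Date.now() < deadlineAt) {
    last = await getStatus(id);
    if (last.status === "completed") return { kind: "completed", last };
    if (last.status === "failed") return { kind: "failed", last };
    // Not terminal yet: sleep one polling interval before retrying.
    await new Promise((resolve) => setTimeout(resolve, pollIntervalMs));
  }
  return { kind: "timeout", last };
}
```

The handler above then maps `failed` to a `JOB_FAILED` error and `timeout` to a non-error response carrying the resumable `video_id`.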
  • Input/request type definition for the generate_video tool, accepting prompt, model, resolution, aspect_ratio, duration, seed, first/last frame images, reference images, provider options, save_path, and polling parameters.
    export interface GenerateVideoToolRequest {
      prompt: string;
      model?: string;
      resolution?: string;
      aspect_ratio?: string;
      duration?: number;
      seed?: number;
      first_frame_image?: string;
      last_frame_image?: string;
      reference_images?: string[];
      provider?: Record<string, unknown>;
      save_path?: string;
      max_wait_ms?: number;
      poll_interval_ms?: number;
    }
  • Tool registration in ListToolsRequestSchema: defines the 'generate_video' tool name, description, annotations, and inputSchema with all parameters.
    {
      name: 'generate_video',
      description:
        'Generate a video from a text prompt using an OpenRouter video-generation model (default: google/veo-3.1). ' +
        'Submits an async job, polls until completion or max_wait_ms, then downloads the result. ' +
        'Optionally conditioned on first/last-frame images or reference images. ' +
        'Large outputs are auto-saved when save_path is provided and path-sandboxed.',
      annotations: {
        readOnlyHint: false,
        destructiveHint: false,
        idempotentHint: false,
      },
      inputSchema: {
        type: 'object',
        properties: {
          prompt: { type: 'string', description: 'Text description of the desired video.' },
          model: { type: 'string', description: 'Override the video model ID.' },
          resolution: {
            type: 'string',
            description: '480p / 720p / 1080p / 1K / 2K / 4K (model-dependent).',
          },
          aspect_ratio: {
            type: 'string',
            description: '16:9 / 9:16 / 1:1 / 4:3 / 3:4 / 21:9 / 9:21 (model-dependent).',
          },
          duration: {
            type: 'number',
            minimum: 1,
            description: 'Duration in seconds (model-dependent).',
          },
          seed: { type: 'number', description: 'Deterministic seed when supported.' },
          first_frame_image: {
            type: 'string',
            description:
              'Optional image (path, URL, or data URL) used as the first frame for image-to-video.',
          },
          last_frame_image: {
            type: 'string',
            description: 'Optional image used as the last frame for frame transitions.',
          },
          reference_images: {
            type: 'array',
            items: { type: 'string' },
            description: 'Optional style/content reference images.',
          },
          provider: {
            type: 'object',
            description: 'Provider-specific passthrough options keyed by provider slug.',
          },
          save_path: {
            type: 'string',
            description:
              'Where to save the video. Routed through the OPENROUTER_OUTPUT_DIR sandbox; extension auto-corrected.',
          },
          max_wait_ms: {
            type: 'number',
            minimum: 10000,
            description:
              'Total time to wait for the async job before returning a resumable handle (default 600000 ms).',
          },
          poll_interval_ms: {
            type: 'number',
            minimum: 2000,
            description: 'Polling cadence (default 15000 ms).',
          },
        },
        required: ['prompt'],
      },
    },
  • Tool dispatch in CallToolRequestSchema: routes 'generate_video' requests to handleGenerateVideo with wrapped arguments and the API client.
    case 'generate_video':
      return handleGenerateVideo(
        wrapToolArgs(args as GenerateVideoToolRequest | undefined),
        this.apiClient,
      );
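`wrapToolArgs` is not reproduced here; judging from `handleGenerateVideo`'s signature, it presumably lifts raw arguments into the `{ params: { arguments } }` request shape. A hedged sketch under that assumption:

```typescript
// Hypothetical sketch: adapts raw tool arguments to the handler's request shape.
function wrapToolArgs<T>(args: T | undefined): { params: { arguments: T } } {
  return { params: { arguments: args ?? ({} as T) } };
}
```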
  • Helper function to prepare image inputs (file path, HTTP URL, or data URL) for frame_images and reference_images fields.
    async function prepareImageInput(
      source: string,
    ): Promise<{ data: string; mime: string } | null> {
      if (!source) return null;
      if (source.startsWith('data:')) {
        const match = source.match(/^data:([^;,]+)(?:;[^,]*)*;base64,(.+)$/);
        if (!match) throw new Error(`Invalid image data URL: ${source.slice(0, 40)}…`);
        return { mime: match[1]!, data: match[2]! };
      }
      if (source.startsWith('http://') || source.startsWith('https://')) {
        const { fetchHttpResource } = await import('./fetch-utils.js');
        const { buffer, contentType } = await fetchHttpResource(source, {
          timeoutMs: 30_000,
          maxBytes: 25 * 1024 * 1024,
          maxRedirects: 8,
        });
        const mime = (contentType?.split(';')[0]?.trim() || 'image/jpeg').toLowerCase();
        return { mime, data: buffer.toString('base64') };
      }
      // Local file: sandbox via path-safety's resolveSafeInputPath so
      // generate_video's first_frame_image / last_frame_image /
      // reference_images fields enforce the same OPENROUTER_INPUT_DIR
      // / OPENROUTER_OUTPUT_DIR / cwd scope that generate_image's
      // input_images already uses. Callers can still bypass with
      // OPENROUTER_ALLOW_UNSAFE_PATHS=1 for legacy scripts.
      const abs = await resolveSafeInputPath(source);
      const buf = await fs.readFile(abs);
      const ext = extname(abs).toLowerCase();
      const mime =
        ext === '.png'
          ? 'image/png'
          : ext === '.webp'
            ? 'image/webp'
            : ext === '.gif'
              ? 'image/gif'
              : 'image/jpeg';
      return { mime, data: buf.toString('base64') };
    }
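The data-URL branch can be checked in isolation; the regex above splits a base64 data URL into its MIME type and payload:

```typescript
// Same pattern as prepareImageInput's data-URL branch.
const DATA_URL_RE = /^data:([^;,]+)(?:;[^,]*)*;base64,(.+)$/;

const sample = "data:image/png;base64,iVBORw0KGgo=";
const match = sample.match(DATA_URL_RE);
// On success, match[1] is the MIME type and match[2] the base64 payload.
```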
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond the annotations (which are all false), the description discloses async job submission, polling, timeout handling, auto-saving with path sandboxing, and image conditioning. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph of four sentences, efficiently covering the main action and key details. It could be slightly more structured with bullet points, but no information is wasted.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (13 parameters, async behavior, optional images, sandboxing), the description covers the essential workflow: async submission, polling, auto-save, and sandbox. It does not explain return values or error handling details, but no output schema exists.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds meaningful context: default model, model-dependent constraints on resolution/duration, and auto-corrected save path extension. This adds value beyond the schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it generates a video from a text prompt using an OpenRouter model, and it distinguishes itself from sibling tools like generate_audio and generate_image by specifying video generation with async polling and optional image conditioning.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for video generation but does not explicitly state when to use this tool vs alternatives (e.g., get_video_status for status checks, analyze_video for analysis). No 'when not to use' guidance is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
