
Dumpling AI MCP Server

Official
by DumplingAI

extract-video

Extract structured data from videos by analyzing their content according to a prompt. Videos are supplied as a URL or as base64-encoded data, and the output can optionally be requested in JSON format.

Instructions

Extract structured data from videos based on a prompt.

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| inputMethod | Yes | Input method: `url` or `base64` | |
| video | Yes | URL or base64-encoded video | |
| prompt | Yes | Extraction prompt | |
| jsonMode | No | Return in JSON format | |
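
A minimal sketch of argument objects that satisfy this schema, with a hand-written check standing in for the server's Zod validation. `ExtractVideoArgs` and `validateExtractVideoArgs` are hypothetical names, and the URL, base64 string, and prompts are placeholders.

```typescript
// Hypothetical type mirroring the extract-video input schema (not part of the server's code).
type ExtractVideoArgs = {
  inputMethod: "url" | "base64";
  video: string; // a URL when inputMethod is "url", base64 data when "base64"
  prompt: string;
  jsonMode?: boolean; // optional; when omitted, the API's own default applies
};

// Hand-written check similar in spirit to the Zod schema; returns a list of problems.
function validateExtractVideoArgs(input: Record<string, unknown>): string[] {
  const errors: string[] = [];
  if (input.inputMethod !== "url" && input.inputMethod !== "base64") {
    errors.push('inputMethod must be "url" or "base64"');
  }
  if (typeof input.video !== "string") errors.push("video must be a string");
  if (typeof input.prompt !== "string") errors.push("prompt must be a string");
  if (input.jsonMode !== undefined && typeof input.jsonMode !== "boolean") {
    errors.push("jsonMode must be a boolean when provided");
  }
  return errors;
}

// Example arguments for each input method (placeholder values).
const byUrl: ExtractVideoArgs = {
  inputMethod: "url",
  video: "https://example.com/clip.mp4",
  prompt: "List every product shown on screen",
  jsonMode: true,
};

const byBase64: ExtractVideoArgs = {
  inputMethod: "base64",
  video: "AAAAIGZ0eXBpc29t", // truncated placeholder, not a real video
  prompt: "Summarize the spoken content",
};

console.log(validateExtractVideoArgs(byUrl)); // []
console.log(validateExtractVideoArgs({ inputMethod: "file", video: 1, prompt: "x" })); // two problems: inputMethod and video
```

The schema types `video` only as a string, so it does not check that a URL or base64 payload is well formed; the sketch above does not either.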

Implementation Reference

  • The handler function for the 'extract-video' tool. It reads DUMPLING_API_KEY from the environment, then proxies the request to the external API endpoint `${NWS_API_BASE}/api/v1/extract-video`, passing inputMethod, video (URL or base64), prompt, jsonMode, and a fixed requestSource of "mcp". It returns the API's JSON response, pretty-printed, as text content.
    async ({ inputMethod, video, prompt, jsonMode }) => {
      // The Dumpling AI API key is read from the environment on every call.
      const apiKey = process.env.DUMPLING_API_KEY;
      if (!apiKey) throw new Error("DUMPLING_API_KEY not set");
      // Forward the tool arguments to the extract-video endpoint.
      const response = await fetch(`${NWS_API_BASE}/api/v1/extract-video`, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${apiKey}`,
        },
        body: JSON.stringify({
          inputMethod,
          video,
          prompt,
          jsonMode, // omitted from the JSON body when undefined
          requestSource: "mcp",
        }),
      });
      // Surface HTTP errors together with the response body.
      if (!response.ok)
        throw new Error(`Failed: ${response.status} ${await response.text()}`);
      const data = await response.json();
      // Return the API response as pretty-printed JSON text content.
      return { content: [{ type: "text", text: JSON.stringify(data, null, 2) }] };
    }
  • Input schema validation for the 'extract-video' tool using Zod schemas for parameters: inputMethod (url/base64), video (string), prompt (string), jsonMode (optional boolean).
    {
      inputMethod: z.enum(["url", "base64"]).describe("Input method"),
      video: z.string().describe("URL or base64-encoded video"),
      prompt: z.string().describe("Extraction prompt"),
      jsonMode: z.boolean().optional().describe("Return in JSON format"),
    },
  • src/index.ts:760-791 (registration)
    Registration of the 'extract-video' tool on the MCP server using server.tool(), including name, description, input schema, and inline handler function.
    server.tool(
      "extract-video",
      "Extract structured data from videos based on a prompt.",
      {
        inputMethod: z.enum(["url", "base64"]).describe("Input method"),
        video: z.string().describe("URL or base64-encoded video"),
        prompt: z.string().describe("Extraction prompt"),
        jsonMode: z.boolean().optional().describe("Return in JSON format"),
      },
      async ({ inputMethod, video, prompt, jsonMode }) => {
        const apiKey = process.env.DUMPLING_API_KEY;
        if (!apiKey) throw new Error("DUMPLING_API_KEY not set");
        const response = await fetch(`${NWS_API_BASE}/api/v1/extract-video`, {
          method: "POST",
          headers: {
            "Content-Type": "application/json",
            Authorization: `Bearer ${apiKey}`,
          },
          body: JSON.stringify({
            inputMethod,
            video,
            prompt,
            jsonMode,
            requestSource: "mcp",
          }),
        });
        if (!response.ok)
          throw new Error(`Failed: ${response.status} ${await response.text()}`);
        const data = await response.json();
        return { content: [{ type: "text", text: JSON.stringify(data, null, 2) }] };
      }
    );
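
The handler's request and response handling can be sketched without network access as two pure functions. `buildExtractVideoBody` and `parseToolResult` are hypothetical helpers, not part of the server, and the `fakeApiData` payload is invented for illustration because this page does not document the API's response shape.

```typescript
// Hypothetical helpers mirroring the extract-video handler; not part of the server.

// Shape of the value the handler returns to the MCP client.
type ToolResult = { content: { type: "text"; text: string }[] };

type ExtractVideoInput = {
  inputMethod: "url" | "base64";
  video: string;
  prompt: string;
  jsonMode?: boolean;
};

// Builds the same JSON body the handler POSTs to the extract-video endpoint.
function buildExtractVideoBody(args: ExtractVideoInput): string {
  return JSON.stringify({
    inputMethod: args.inputMethod,
    video: args.video,
    prompt: args.prompt,
    jsonMode: args.jsonMode, // JSON.stringify drops this key when it is undefined
    requestSource: "mcp", // constant tag added by the handler
  });
}

// Recovers the API payload from the pretty-printed JSON text the handler returns.
function parseToolResult(result: ToolResult): unknown {
  const first = result.content.find((item) => item.type === "text");
  if (!first) throw new Error("tool result has no text content");
  return JSON.parse(first.text);
}

const body = buildExtractVideoBody({
  inputMethod: "url",
  video: "https://example.com/clip.mp4",
  prompt: "Describe the scene",
});
console.log(body);
// {"inputMethod":"url","video":"https://example.com/clip.mp4","prompt":"Describe the scene","requestSource":"mcp"}

// Invented payload: the real response shape of the Dumpling AI API is not documented here.
const fakeApiData = { results: "A person walks a dog." };
const result: ToolResult = {
  content: [{ type: "text", text: JSON.stringify(fakeApiData, null, 2) }],
};
console.log(parseToolResult(result));
```

Because the handler wraps the whole response in a single text item, a client that wants structured data has to parse that text itself, as `parseToolResult` does.
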

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It mentions 'extract structured data' but doesn't disclose behavioral traits such as processing time, rate limits, authentication needs, error handling, or what 'structured data' entails (e.g., JSON, text). This leaves gaps in understanding how the tool behaves beyond its basic function.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
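
As an illustration of closing these gaps, here is a hypothetical revised description string. It states only what the implementation reference on this page shows (the API key requirement, the two input methods, the text-wrapped JSON response, and the error behavior); it makes no claims about rate limits or size limits, which the page does not document.

```typescript
// Hypothetical revised description for the extract-video tool, limited to
// behavior visible in the handler: auth, input methods, output, and errors.
const extractVideoDescription = [
  "Extract structured data from a video based on a prompt.",
  'Pass the video as a URL (inputMethod: "url") or as base64-encoded data (inputMethod: "base64").',
  "Requires the DUMPLING_API_KEY environment variable; the request is forwarded to the Dumpling AI extract-video API.",
  "Returns the API response as pretty-printed JSON text; set jsonMode to request JSON-formatted output.",
  "Fails with the HTTP status and response body if the API call is not successful.",
].join(" ");

console.log(extractVideoDescription);
```

A string like this would take the place of the second argument to `server.tool()` in the registration.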

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose without unnecessary words. Every part ('extract structured data', 'from videos', 'based on a prompt') contributes directly to understanding the tool's function, making it appropriately sized and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of video data extraction with no annotations and no output schema, the description is incomplete. It doesn't address key contextual aspects like output format (beyond 'structured data'), limitations (e.g., video length, supported codecs), or integration with sibling tools. For a tool with 4 parameters and no structured behavioral hints, more detail is needed to be fully helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters. The description adds nothing beyond implying that 'prompt' guides the extraction; it doesn't clarify the prompt format, give examples, or state constraints. With high schema coverage, the baseline is 3, and the description doesn't significantly improve parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('extract structured data') and resource ('from videos'), specifying the mechanism ('based on a prompt'). It distinguishes itself from siblings such as 'extract-audio' and 'extract-document' by focusing on video content, but it doesn't explicitly contrast itself with 'extract' or 'extract-image', which might overlap in data-extraction contexts.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., video format support), exclusions (e.g., not for real-time processing), or direct comparisons to sibling tools like 'extract' or 'extract-image' that might handle similar tasks. The description only states what it does, not when it's appropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
