send_media

Idempotent

Send images, videos, documents, and audio files to a WhatsApp contact or group using file paths, URLs, or base64 content. Supports optional captions and voice notes.

Instructions

Send media (image, video, document, audio) via WhatsApp.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| recipient_jid | Yes | The recipient JID (e.g., 123456789@s.whatsapp.net or 123456789-12345678@g.us) | |
| media_path | No | Absolute path to the local media file | |
| media_url | No | URL of the media file | |
| media_content | No | Base64 encoded media content | |
| mime_type | No | MIME type of the media_content (required if using media_content) | |
| filename | No | Filename for the media (recommended if using media_content) | |
| caption | No | Optional caption for the media | |
| as_audio_message | No | Send audio specifically as a voice note (requires ffmpeg for conversion if not opus/ogg) | false |
| idempotency_key | No | Optional idempotency key (1 to 200 characters). Repeating the same send_media request with the same key returns the original result instead of sending again. | |
| include_full_data | No | Whether to include the full base64 data in the response | false |
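
To make the table concrete, the sketch below builds one argument object per media source. Every value is a placeholder, and `jidKind` is a hypothetical helper introduced here for illustration: it classifies a JID only by the two suffixes that appear in the `recipient_jid` examples, which may not cover every JID form the server accepts.

```typescript
// Hypothetical helper: classify a JID by the suffixes shown in the table's examples.
type JidKind = "contact" | "group" | "unknown";

function jidKind(jid: string): JidKind {
  if (jid.endsWith("@s.whatsapp.net")) return "contact";
  if (jid.endsWith("@g.us")) return "group";
  return "unknown";
}

// One argument object per media source. All values are placeholders.
const fromPath = {
  recipient_jid: "123456789@s.whatsapp.net",
  media_path: "/home/user/photo.jpg",
  caption: "Holiday photo",
};

const fromUrl = {
  recipient_jid: "123456789-12345678@g.us",
  media_url: "https://example.com/clip.mp4",
  idempotency_key: "clip-001", // repeating the request with this key should not send twice
};

const fromBase64 = {
  recipient_jid: "123456789@s.whatsapp.net",
  media_content: "aGVsbG8=", // base64 for the ASCII text "hello"
  mime_type: "text/plain", // required alongside media_content
  filename: "hello.txt",
};
```

Only one of `media_path`, `media_url`, or `media_content` is set in each object, which keeps the request unambiguous.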

Implementation Reference

  • The main tool handler function for 'send_media'. Accepts recipient_jid, media_path, media_url, media_content, mime_type, filename, caption, as_audio_message, idempotency_key, and include_full_data. Determines input type (path/url/base64), handles audio conversion if as_audio_message is true, delegates to whatsappService.sendMedia or sendMediaFromBase64, and returns the result.
    async ({
      recipient_jid,
      media_path,
      media_url,
      media_content,
      mime_type,
      filename,
      caption,
      as_audio_message,
      idempotency_key,
      include_full_data = false,
    }): Promise<CallToolResult> => {
      let input: string | null = null;
      let inputType: "path" | "url" | "base64" | null = null;
    
      if (media_path) {
        input = media_path;
        inputType = "path";
      } else if (media_url) {
        input = media_url;
        inputType = "url";
      } else if (media_content) {
        if (!mime_type) {
          return {
            content: [
              {
                type: "text",
                text: "mime_type is required when using media_content",
              },
            ],
            isError: true,
          };
        }
        input = media_content;
        inputType = "base64";
      }
    
      if (!input || !inputType) {
        return {
          content: [
            {
              type: "text",
              text: "One of media_path, media_url, or media_content must be provided",
            },
          ],
          isError: true,
        };
      }
    
      try {
        const requestFingerprint = crypto
          .createHash("sha256")
          .update(
            JSON.stringify({
              recipient_jid,
              media_path: media_path || null,
              media_url: media_url || null,
              media_content: media_content || null,
              mime_type: mime_type || null,
              filename: filename || null,
              caption: caption || null,
              as_audio_message,
            }),
          )
          .digest("hex");
        let sentMessage: any;
        let finalMediaPath = media_path;
    
        if (as_audio_message) {
          let audioPath: string;
          let tempFilePath: string | null = null;
          let needsCleanup = false;
    
          if (inputType === "path") {
            audioPath = input;
          } else if (inputType === "url") {
            const resp = await axios.get(input, {
              responseType: "arraybuffer",
            });
            const buffer = Buffer.from(resp.data);
            const detected = await fileTypeFromBuffer(buffer);
            const ext = detected?.ext || "bin";
            tempFilePath = path.join(
              os.tmpdir(),
              `whatsapp_audio_${Date.now()}.${ext}`,
            );
            fs.writeFileSync(tempFilePath, buffer);
            audioPath = tempFilePath;
            needsCleanup = true;
          } else {
            const buffer = Buffer.from(input, "base64");
            const detected = await fileTypeFromBuffer(buffer);
            const ext = detected?.ext || "bin";
            tempFilePath = path.join(
              os.tmpdir(),
              `whatsapp_audio_${Date.now()}.${ext}`,
            );
            fs.writeFileSync(tempFilePath, buffer);
            audioPath = tempFilePath;
            needsCleanup = true;
          }
    
          if (!audioPath.endsWith(".ogg")) {
            const convertedPath =
              await AudioUtils.convertToOpusOggTemp(audioPath);
            if (needsCleanup && tempFilePath && fs.existsSync(tempFilePath)) {
              fs.unlinkSync(tempFilePath);
            }
            tempFilePath = convertedPath;
            needsCleanup = true;
            audioPath = convertedPath;
            finalMediaPath = audioPath;
          }
    
          sentMessage = await whatsappService.sendMedia(
            recipient_jid,
            audioPath,
            caption,
            true,
            {
              idempotencyKey: idempotency_key,
              requestFingerprint,
            },
          );
    
          if (needsCleanup && tempFilePath && fs.existsSync(tempFilePath)) {
            fs.unlinkSync(tempFilePath);
          }
        } else {
          if (inputType === "base64") {
            sentMessage = await whatsappService.sendMediaFromBase64(
              recipient_jid,
              input,
              mime_type!,
              filename,
              caption,
              false,
              {
                idempotencyKey: idempotency_key,
                requestFingerprint,
              },
            );
          } else {
            sentMessage = await whatsappService.sendMedia(
              recipient_jid,
              input,
              caption,
              false,
              {
                idempotencyKey: idempotency_key,
                requestFingerprint,
              },
            );
            if (inputType === "path") {
              finalMediaPath = input;
            }
          }
        }
    
        const messageId =
          sentMessage?.key?.remoteJid && sentMessage?.key?.id
            ? `${sentMessage.key.remoteJid}:${sentMessage.key.id}`
            : undefined;
    
        const result: any = {
          success: true,
          message: `Media (${as_audio_message ? "audio message" : "file"}) sent successfully.`,
          messageId: messageId || "unknown",
          timestamp: Number(sentMessage?.messageTimestamp || Date.now() / 1000),
          filePathUsed: finalMediaPath,
          deduplicated: Boolean(sentMessage?.__deduplicated),
          idempotencyKey: idempotency_key || null,
        };
    
        if (include_full_data && inputType === "base64") {
          result.mediaData = media_content;
          result.mimeType = mime_type;
        } else if (
          include_full_data &&
          inputType === "path" &&
          input &&
          fs.existsSync(input)
        ) {
          const buffer = fs.readFileSync(input);
          result.mediaData = buffer.toString("base64");
          const detectedType = await fileTypeFromBuffer(buffer);
          result.mimeType = detectedType?.mime || "application/octet-stream";
        }
    
        return {
          content: [{ type: "text", text: JSON.stringify(result, null, 2) }],
        };
      } catch (error: any) {
        log.error(`Error in send_media tool to ${recipient_jid}:`, error);
        return {
          content: [
            {
              type: "text",
              text: `Error sending media to ${recipient_jid}: ${error.message}`,
            },
          ],
          isError: true,
        };
      }
    },
  • Zod schema defining the input parameters for the send_media tool: recipient_jid (required), media_path, media_url, media_content, mime_type, filename, caption, as_audio_message (default false), idempotency_key, and include_full_data (default false).
    {
      recipient_jid: z
        .string()
        .describe(
          "The recipient JID (e.g., 123456789@s.whatsapp.net or 123456789-12345678@g.us)",
        ),
      media_path: z
        .string()
        .optional()
        .describe("Absolute path to the local media file"),
      media_url: z.string().url().optional().describe("URL of the media file"),
      media_content: z
        .string()
        .optional()
        .describe("Base64 encoded media content"),
      mime_type: z
        .string()
        .optional()
        .describe(
          "MIME type of the media_content (required if using media_content)",
        ),
      filename: z
        .string()
        .optional()
        .describe(
          "Filename for the media (recommended if using media_content)",
        ),
      caption: z.string().optional().describe("Optional caption for the media"),
      as_audio_message: z
        .boolean()
        .optional()
        .default(false)
        .describe(
          "Send audio specifically as a voice note (requires ffmpeg for conversion if not opus/ogg)",
        ),
      idempotency_key: z
        .string()
        .min(1)
        .max(200)
        .optional()
        .describe(
          "Optional idempotency key. Repeating the same send_media request with the same key returns the original result instead of sending again.",
        ),
      include_full_data: z
        .boolean()
        .optional()
        .default(false)
        .describe("Whether to include the full base64 data in the response"),
    },
  • The registerMediaTools function registers the 'send_media' tool (and 'download_media') on the MCP server. Called from server.ts line 247.
    export function registerMediaTools(
      server: McpServer,
      whatsappService: WhatsAppService,
    ): void {
      log.info("Registering media tools...");
    
      server.tool(
        "send_media",
        "Send media (image, video, document, audio) via WhatsApp.",
        // ... input schema: identical to the Zod schema listed above ...
        // ... handler: identical to the send_media handler listed above ...
      );
      // ... (download_media registration not shown in this excerpt)
    }
  • src/server.ts:77-82 (registration)
    Execution metadata hints for the 'send_media' tool: readOnlyHint: false, idempotentHint: true, destructiveHint: false, openWorldHint: true.
    send_media: {
      readOnlyHint: false,
      idempotentHint: true,
      destructiveHint: false,
      openWorldHint: true,
    },
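  • The sendMedia method on WhatsAppService sends media from a local path or URL over the WhatsApp socket: it reads or downloads the file into a buffer, detects the MIME type, builds the media message, and runs the send through the idempotency layer when a key is supplied. The companion sendMediaFromBase64 method, listed after it, does the same for base64 content with a caller-supplied MIME type.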
    async sendMedia(
      jid: string,
      input: string,
      caption?: string,
      asAudioMessage = false,
      options?: { idempotencyKey?: string | null; requestFingerprint?: string },
    ): Promise<any> {
      let buffer: Buffer;
      let mimetype = "application/octet-stream";
      let filename: string | undefined;
    
      if (input.startsWith("http://") || input.startsWith("https://")) {
        const resp = await axios.get(input, { responseType: "arraybuffer" });
        buffer = Buffer.from(resp.data);
        const detected = await fileTypeFromBuffer(buffer);
        if (detected) {
          mimetype = detected.mime;
          filename = `file.${detected.ext}`;
        }
      } else {
        buffer = fs.readFileSync(input);
        const detected = await fileTypeFromBuffer(buffer);
        if (detected) {
          mimetype = detected.mime;
          filename = path.basename(input);
        }
      }
    
      const content = await this.buildMediaMessage(
        buffer,
        mimetype,
        filename,
        caption,
        asAudioMessage,
      );
      const normalized = this.resolveLookupJid(jid);
      const send = () => this.getSocket().sendMessage(normalized, content);
      if (options?.idempotencyKey) {
        return await this.executeIdempotentOperation(
          "send_media",
          options.requestFingerprint ||
            this.buildRequestFingerprint(
              normalized,
              JSON.stringify({
                input,
                caption: caption || null,
                asAudioMessage,
                mimetype,
                filename: filename || null,
              }),
            ),
          send,
          { idempotencyKey: options.idempotencyKey, scopeJid: normalized },
        );
      }
      return await send();
    }
    
    async sendMediaFromBase64(
      jid: string,
      base64: string,
      mimeType: string,
      filename?: string,
      caption?: string,
      asAudioMessage = false,
      options?: { idempotencyKey?: string | null; requestFingerprint?: string },
    ): Promise<any> {
      const buffer = Buffer.from(base64, "base64");
      const content = await this.buildMediaMessage(
        buffer,
        mimeType,
        filename,
        caption,
        asAudioMessage,
      );
      const normalized = this.resolveLookupJid(jid);
      const send = () => this.getSocket().sendMessage(normalized, content);
      if (options?.idempotencyKey) {
        return await this.executeIdempotentOperation(
          "send_media",
          options.requestFingerprint ||
            this.buildRequestFingerprint(
              normalized,
              JSON.stringify({
                base64,
                mimeType,
                filename: filename || null,
                caption: caption || null,
                asAudioMessage,
              }),
            ),
          send,
          { idempotencyKey: options.idempotencyKey, scopeJid: normalized },
        );
      }
      return await send();
    }
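
The idempotency path can be sketched in isolation. The handler hashes the request fields with SHA-256 and passes the digest, together with `idempotency_key`, to `executeIdempotentOperation`, whose body is not shown in the listings above. The in-memory `IdempotencyCache` below is therefore an assumption about how such a helper might behave, not the service's implementation. The `fingerprint` function name and the conflict error are also illustrative, and the sketch is synchronous for brevity where a real send would be awaited.

```typescript
import { createHash } from "node:crypto";

// Hash the request fields as the handler does: SHA-256 over a JSON string, hex-encoded.
function fingerprint(fields: Record<string, unknown>): string {
  return createHash("sha256").update(JSON.stringify(fields)).digest("hex");
}

// Hypothetical in-memory store: one cached result per (scope JID, key) pair.
class IdempotencyCache<T> {
  private entries = new Map<string, { fingerprint: string; result: T }>();

  run(
    scopeJid: string,
    key: string,
    fp: string,
    op: () => T,
  ): { result: T; deduplicated: boolean } {
    const id = `${scopeJid}:${key}`;
    const hit = this.entries.get(id);
    if (hit) {
      // Assumed policy: reusing a key with different request content is an error.
      if (hit.fingerprint !== fp) {
        throw new Error("idempotency key reused with a different request");
      }
      return { result: hit.result, deduplicated: true };
    }
    // First time this key is seen: run the operation and remember its result.
    const result = op();
    this.entries.set(id, { fingerprint: fp, result });
    return { result, deduplicated: false };
  }
}
```

Because `JSON.stringify` is sensitive to property order, a fingerprint built this way is only stable when the fields are always listed in the same order, as they are in the handler's object literal.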
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate a non-read-only, non-destructive, idempotent, open-world tool. The description adds no additional behavioral context beyond the basic purpose. It does not disclose prerequisites (e.g., ffmpeg for audio conversion), side effects, or response behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, concise sentence with no redundant information. It is front-loaded with the key purpose and media types, making it easy to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite high schema coverage, the description lacks an overview of how multiple media source parameters work (mutual exclusivity) and does not mention idempotency or full-data response options. Given 10 parameters and no output schema, more context is needed for complete understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
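
Since the tool has no output schema, it may help to see the success payload spelled out. The handler in the Implementation Reference serializes a result object with `JSON.stringify` into a single text content item. The `SendMediaResult` interface and the `parseSendMediaResult` helper below are hypothetical names that transcribe those fields, offered as a sketch for client code and not as a published contract.

```typescript
// Fields transcribed from the `result` object the handler builds.
interface SendMediaResult {
  success: boolean;
  message: string;
  messageId: string; // "<remoteJid>:<id>", or "unknown" when the key is missing
  timestamp: number; // seconds since the Unix epoch
  filePathUsed?: string; // omitted for URL/base64 input unless a voice-note conversion ran
  deduplicated: boolean; // true when an idempotent replay returned the original result
  idempotencyKey: string | null;
  mediaData?: string; // base64, only with include_full_data (path or base64 input)
  mimeType?: string;
}

// Minimal shapes for the tool response; only what this sketch needs.
interface ToolContent {
  type: string;
  text?: string;
}
interface ToolResult {
  content: ToolContent[];
  isError?: boolean;
}

// Hypothetical client-side helper: unwrap the text item and parse it.
function parseSendMediaResult(res: ToolResult): SendMediaResult {
  const text = res.content.find((c) => c.type === "text")?.text ?? "";
  if (res.isError) {
    // Error responses carry a plain message, not JSON.
    throw new Error(text || "send_media failed");
  }
  return JSON.parse(text) as SendMediaResult;
}
```

The cast after `JSON.parse` performs no runtime validation; a stricter client could validate the parsed object before trusting it.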

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description adds no parameter semantics beyond the schema. It does not explain the relationship between media_path, media_url, and media_content (mutual exclusivity) or the requirement for mime_type with media_content.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
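
One such relationship is visible in the handler listed in the Implementation Reference: it resolves the media source by precedence and does not reject conflicting inputs. `media_path` wins over `media_url`, which wins over `media_content`, and `mime_type` is checked only when `media_content` is the source actually used. A minimal sketch of that rule follows; `resolveSource` and its return type are hypothetical names, with the logic transcribed from the handler's `if`/`else if` chain.

```typescript
type Source =
  | { kind: "path" | "url" | "base64"; value: string }
  | { kind: "error"; message: string };

interface SourceArgs {
  media_path?: string;
  media_url?: string;
  media_content?: string;
  mime_type?: string;
}

// Pick the media source in the same order as the handler: path, then URL, then base64.
function resolveSource(a: SourceArgs): Source {
  if (a.media_path) return { kind: "path", value: a.media_path };
  if (a.media_url) return { kind: "url", value: a.media_url };
  if (a.media_content) {
    // mime_type matters only once base64 content is the chosen source.
    if (!a.mime_type) {
      return { kind: "error", message: "mime_type is required when using media_content" };
    }
    return { kind: "base64", value: a.media_content };
  }
  return {
    kind: "error",
    message: "One of media_path, media_url, or media_content must be provided",
  };
}
```

A caller that sets both `media_path` and `media_url` therefore gets the file at the path sent, with the URL silently ignored.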

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Send media (image, video, document, audio) via WhatsApp.' It uses a specific verb and resource, listing the media types, and distinguishes itself from siblings like send_message (text) and download_media (download).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly state when to use this tool versus alternatives like send_message or download_media. Usage is implied by naming media types, but no guidance on exclusions or when not to use is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/loglux/whatsapp-mcp-stream'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.