DOCX MCP Server

by zeph-gh

extract_images

Extract and save images from DOCX files to process document content efficiently.

Instructions

Extract and list images from a DOCX file

Input Schema

Name        Required  Description                                    Default
file_path   Yes       Path to the .docx file                         -
output_dir  No        Directory to save extracted images (optional)  -
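
As a rough illustration of how these two parameters reach the tool, the sketch below builds the JSON-RPC `tools/call` request an MCP client would send. The helper name `buildExtractImagesRequest`, the request id, and the example paths are hypothetical; only the tool name and the parameter names come from the schema above.

```typescript
// Arguments accepted by extract_images, mirroring the input schema above.
interface ExtractImagesArgs {
  file_path: string;   // required: path to the .docx file
  output_dir?: string; // optional: directory to save extracted images
}

// Shape of an MCP `tools/call` request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: ExtractImagesArgs };
}

// Hypothetical helper: builds the request payload without sending it.
function buildExtractImagesRequest(
  id: number,
  args: ExtractImagesArgs
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "extract_images", arguments: args },
  };
}

// Example paths are placeholders, not paths from the project.
const request = buildExtractImagesRequest(1, {
  file_path: "./report.docx",
  output_dir: "./extracted-images",
});
console.log(JSON.stringify(request, null, 2));
```

Omitting `output_dir` from the arguments selects the base64 branch of the handler shown in the implementation reference.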

Implementation Reference

  • The handler function implements the core logic for extracting images from DOCX files using the mammoth library. It converts the DOCX to HTML with custom image handling to either save images to a specified directory or embed them as base64 data URLs. It then parses the HTML to list all images with their sources and alt text.
    async ({ file_path, output_dir }) => {
      try {
        const absolutePath = path.resolve(file_path)
    
        if (!fs.existsSync(absolutePath)) {
          throw new Error(`File not found: ${absolutePath}`)
        }
    
        const options = {
          convertImage: mammoth.images.imgElement(function (image: any) {
            if (output_dir) {
              const outputPath = path.resolve(output_dir)
              if (!fs.existsSync(outputPath)) {
                fs.mkdirSync(outputPath, { recursive: true })
              }
    
              const imagePath = path.join(
                outputPath,
                `image_${Date.now()}_${Math.random().toString(36).substr(2, 9)}.${
                  image.contentType.split('/')[1]
                }`
              )
    
              return image.read().then(function (imageBuffer: Buffer) {
                fs.writeFileSync(imagePath, imageBuffer)
                return {
                  src: imagePath,
                  alt: image.altText || 'Extracted image',
                }
              })
            } else {
              return image.read().then(function (imageBuffer: Buffer) {
                return {
                  src: `data:${image.contentType};base64,${imageBuffer.toString(
                    'base64'
                  )}`,
                  alt: image.altText || 'Embedded image',
                  size: imageBuffer.length,
                }
              })
            }
          }),
        }
    
        const result = await mammoth.convertToHtml(
          { path: absolutePath },
          options
        )
        const images = (result.value.match(/<img[^>]*>/gi) || []).map(
          (img: string) => {
            const srcMatch = img.match(/src="([^"]*)"/)
            const altMatch = img.match(/alt="([^"]*)"/)
            return {
              src: srcMatch ? srcMatch[1] : '',
              alt: altMatch ? altMatch[1] : '',
              is_base64: srcMatch ? srcMatch[1].startsWith('data:') : false,
            }
          }
        )
    
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify(
                {
                  total_images: images.length,
                  images: images,
                  output_directory: output_dir || 'Images embedded as base64',
                  messages: result.messages,
                },
                null,
                2
              ),
            },
          ],
        }
      } catch (error) {
        return {
          content: [
            {
              type: 'text',
              text: `Error extracting images: ${(error as Error).message}`,
            },
          ],
          isError: true,
        }
      }
    }
  • Zod schema defining the input parameters for the extract_images tool: required file_path and optional output_dir.
    {
      file_path: z.string().describe('Path to the .docx file'),
      output_dir: z
        .string()
        .optional()
        .describe('Directory to save extracted images (optional)'),
    },
  • src/index.ts:221-320 (registration)
    The registration of the extract_images tool using McpServer's server.tool method, specifying name, description, input schema, and handler function.
    server.tool(
      'extract_images',
      'Extract and list images from a DOCX file',
      {
        file_path: z.string().describe('Path to the .docx file'),
        output_dir: z
          .string()
          .optional()
          .describe('Directory to save extracted images (optional)'),
      },
      async ({ file_path, output_dir }) => {
        try {
          const absolutePath = path.resolve(file_path)
    
          if (!fs.existsSync(absolutePath)) {
            throw new Error(`File not found: ${absolutePath}`)
          }
    
          const options = {
            convertImage: mammoth.images.imgElement(function (image: any) {
              if (output_dir) {
                const outputPath = path.resolve(output_dir)
                if (!fs.existsSync(outputPath)) {
                  fs.mkdirSync(outputPath, { recursive: true })
                }
    
                const imagePath = path.join(
                  outputPath,
                  `image_${Date.now()}_${Math.random().toString(36).substr(2, 9)}.${
                    image.contentType.split('/')[1]
                  }`
                )
    
                return image.read().then(function (imageBuffer: Buffer) {
                  fs.writeFileSync(imagePath, imageBuffer)
                  return {
                    src: imagePath,
                    alt: image.altText || 'Extracted image',
                  }
                })
              } else {
                return image.read().then(function (imageBuffer: Buffer) {
                  return {
                    src: `data:${image.contentType};base64,${imageBuffer.toString(
                      'base64'
                    )}`,
                    alt: image.altText || 'Embedded image',
                    size: imageBuffer.length,
                  }
                })
              }
            }),
          }
    
          const result = await mammoth.convertToHtml(
            { path: absolutePath },
            options
          )
          const images = (result.value.match(/<img[^>]*>/gi) || []).map(
            (img: string) => {
              const srcMatch = img.match(/src="([^"]*)"/)
              const altMatch = img.match(/alt="([^"]*)"/)
              return {
                src: srcMatch ? srcMatch[1] : '',
                alt: altMatch ? altMatch[1] : '',
                is_base64: srcMatch ? srcMatch[1].startsWith('data:') : false,
              }
            }
          )
    
          return {
            content: [
              {
                type: 'text',
                text: JSON.stringify(
                  {
                    total_images: images.length,
                    images: images,
                    output_directory: output_dir || 'Images embedded as base64',
                    messages: result.messages,
                  },
                  null,
                  2
                ),
              },
            ],
          }
        } catch (error) {
          return {
            content: [
              {
                type: 'text',
                text: `Error extracting images: ${(error as Error).message}`,
              },
            ],
            isError: true,
          }
        }
      }
    )
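
On success the handler above returns its result as a JSON string inside a single text item, and on failure it returns a plain error message with `isError: true`. The sketch below shows one way a caller might unpack that result. The helper `parseExtractImagesResult`, the interface names, and the sample data are hypothetical; the field names are taken from the handler's `JSON.stringify` call.

```typescript
// Shape of one entry in the `images` array built by the handler.
interface ExtractedImage {
  src: string; // saved file path or data: URL
  alt: string;
  is_base64: boolean;
}

// JSON payload the handler serializes into content[0].text on success.
interface ExtractImagesPayload {
  total_images: number;
  images: ExtractedImage[];
  output_directory: string;
  messages: unknown[]; // conversion messages passed through from mammoth
}

// Minimal view of the tool result: text items plus an optional error flag.
interface ToolResult {
  content: { type: string; text: string }[];
  isError?: boolean;
}

// Hypothetical helper: returns the parsed payload, or throws with the
// handler's error text when isError is set.
function parseExtractImagesResult(result: ToolResult): ExtractImagesPayload {
  const text = result.content[0]?.text ?? "";
  if (result.isError) {
    throw new Error(text);
  }
  return JSON.parse(text) as ExtractImagesPayload;
}

// Hand-written sample for illustration only, not real tool output.
const sample: ToolResult = {
  content: [
    {
      type: "text",
      text: JSON.stringify({
        total_images: 1,
        images: [
          { src: "/tmp/out/image_1.png", alt: "Extracted image", is_base64: false },
        ],
        output_directory: "/tmp/out",
        messages: [],
      }),
    },
  ],
};

const payload = parseExtractImagesResult(sample);
console.log(payload.total_images, payload.images[0].src);
```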
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions extraction and listing but doesn't specify whether images are saved, displayed, or returned in a particular format, nor does it address permissions, rate limits, or error handling for the file operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
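
To make the gap concrete, here is a minimal sketch of annotation hints that could accompany this tool, using the hint names from the MCP tool annotations specification. The values are an interpretation of the handler source shown in the implementation reference, not something the server declares, and `writesToDisk` is a hypothetical helper.

```typescript
// Hint fields defined for MCP tool annotations; all are optional and advisory.
interface ToolAnnotations {
  title?: string;
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
  openWorldHint?: boolean;
}

// Assumed values, inferred from reading the handler source.
const extractImagesAnnotations: ToolAnnotations = {
  title: "Extract images from DOCX",
  readOnlyHint: false,    // writes image files when output_dir is given
  destructiveHint: false, // adds new files; names include a timestamp and random suffix
  idempotentHint: false,  // repeated calls with output_dir write additional files
  openWorldHint: false,   // works on the local filesystem only
};

// Hypothetical helper mirroring the handler's `if (output_dir)` branch:
// a call has a filesystem side effect only when output_dir is non-empty.
function writesToDisk(args: { file_path: string; output_dir?: string }): boolean {
  return Boolean(args.output_dir);
}

console.log(writesToDisk({ file_path: "a.docx" }));
console.log(writesToDisk({ file_path: "a.docx", output_dir: "./out" }));
```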

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's function without any unnecessary words. It is appropriately sized and front-loaded, making it easy to understand at a glance.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (file processing with two parameters) and lack of annotations or output schema, the description is minimally adequate. It covers the basic purpose but lacks details on behavior, output format, and usage context, leaving gaps for an AI agent to infer.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, with clear descriptions for both parameters in the input schema. The description adds no additional meaning beyond what the schema provides, such as explaining how 'output_dir' affects the extraction process, so it meets the baseline score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
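
One way to close that gap is to state in the parameter descriptions what `output_dir` changes. The sketch below is a hypothetical JSON Schema for the same two parameters, with wording drafted from the handler's behavior (files written to a directory that is created if missing, versus base64 data URLs). It is a suggestion, not the server's actual schema.

```typescript
// Hypothetical, more explicit input schema for extract_images.
// The description wording is a suggestion drafted from the handler source.
const extractImagesInputSchema = {
  type: "object",
  properties: {
    file_path: {
      type: "string",
      description:
        "Path to the .docx file. Relative paths are resolved against the " +
        "server's working directory; a missing file returns an error result.",
    },
    output_dir: {
      type: "string",
      description:
        "Directory to save extracted images. If set, the directory is created " +
        "when missing and each image is written there under a generated file " +
        "name. If omitted, nothing is written and each image is returned as a " +
        "base64 data URL.",
    },
  },
  required: ["file_path"],
} as const;

console.log(Object.keys(extractImagesInputSchema.properties).join(", "));
```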

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('extract and list') and resource ('images from a DOCX file'), making the purpose immediately understandable. However, it doesn't explicitly differentiate from sibling tools like 'extract_text' or 'convert_to_html' which might also handle DOCX files, so it doesn't reach the highest score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'extract_text' or 'convert_to_html' that might also process DOCX files. It states what the tool does but offers no context about appropriate use cases or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/zeph-gh/Docx-Mcp-Server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.