DOCX MCP Server

by zeph-gh

convert_to_markdown

Convert DOCX files to Markdown so that document content can be used on platforms that support Markdown.

Instructions

Convert DOCX file to Markdown format

Input Schema

Name      | Required | Description            | Default
file_path | Yes      | Path to the .docx file | (none)
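
To make the schema concrete, the sketch below builds the JSON-RPC 2.0 `tools/call` envelope that an MCP client would send to invoke this tool. Only the tool name and the `file_path` argument come from this page; the `ToolCallRequest` type, the `buildConvertRequest` helper, the empty-string guard, and the `./report.docx` path are illustrative assumptions, not part of the server.

```typescript
// Hypothetical type for the request envelope; the field names follow the
// MCP `tools/call` method, the type itself is not taken from the server.
interface ToolCallRequest {
  jsonrpc: '2.0'
  id: number
  method: 'tools/call'
  params: { name: string; arguments: { file_path: string } }
}

// Hypothetical helper that builds the request for convert_to_markdown.
function buildConvertRequest(id: number, filePath: string): ToolCallRequest {
  // Extra guard added for this sketch: the published schema only requires
  // a string, so an empty path would pass validation and fail later.
  if (filePath.trim().length === 0) {
    throw new Error('file_path is required')
  }
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: {
      name: 'convert_to_markdown',
      arguments: { file_path: filePath },
    },
  }
}

// './report.docx' is a placeholder path.
const request = buildConvertRequest(1, './report.docx')
console.log(JSON.stringify(request, null, 2))
```

Relative paths are resolved by the server with `path.resolve`, so they are interpreted against the server's working directory rather than the client's.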

Implementation Reference

  • The core handler function. It converts DOCX to Markdown in two steps: mammoth first produces intermediate HTML, then a chain of regex replacements transforms that HTML into Markdown.
    async ({ file_path }) => {
      try {
        const absolutePath = path.resolve(file_path)
    
        if (!fs.existsSync(absolutePath)) {
          throw new Error(`File not found: ${absolutePath}`)
        }
    
        // Convert to HTML first
        const htmlResult = await mammoth.convertToHtml({ path: absolutePath })
        let html = htmlResult.value
    
        // Simple HTML to Markdown conversion
        let markdown = html
          // Headers
          .replace(/<h1[^>]*>(.*?)<\/h1>/gi, '# $1\n\n')
          .replace(/<h2[^>]*>(.*?)<\/h2>/gi, '## $1\n\n')
          .replace(/<h3[^>]*>(.*?)<\/h3>/gi, '### $1\n\n')
          .replace(/<h4[^>]*>(.*?)<\/h4>/gi, '#### $1\n\n')
          .replace(/<h5[^>]*>(.*?)<\/h5>/gi, '##### $1\n\n')
          .replace(/<h6[^>]*>(.*?)<\/h6>/gi, '###### $1\n\n')
          // Bold and italic
          .replace(/<strong[^>]*>(.*?)<\/strong>/gi, '**$1**')
          .replace(/<b[^>]*>(.*?)<\/b>/gi, '**$1**')
          .replace(/<em[^>]*>(.*?)<\/em>/gi, '*$1*')
          .replace(/<i[^>]*>(.*?)<\/i>/gi, '*$1*')
          // Lists
          .replace(/<ul[^>]*>/gi, '')
          .replace(/<\/ul>/gi, '\n')
          .replace(/<ol[^>]*>/gi, '')
          .replace(/<\/ol>/gi, '\n')
          .replace(/<li[^>]*>(.*?)<\/li>/gi, '- $1\n')
          // Paragraphs
          .replace(/<p[^>]*>(.*?)<\/p>/gi, '$1\n\n')
          // Line breaks
          .replace(/<br[^>]*>/gi, '\n')
          // Remove remaining HTML tags
          .replace(/<[^>]*>/g, '')
          // Clean up extra whitespace
          .replace(/\n{3,}/g, '\n\n')
          .trim()
    
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify(
                {
                  markdown: markdown,
                  word_count: markdown
                    .split(/\s+/)
                    .filter((word: string) => word.length > 0).length,
                  messages: htmlResult.messages,
                },
                null,
                2
              ),
            },
          ],
        }
      } catch (error) {
        return {
          content: [
            {
              type: 'text',
              text: `Error converting to Markdown: ${(error as Error).message}`,
            },
          ],
          isError: true,
        }
      }
    }
  • src/index.ts:323-401 (registration)
    Registers the 'convert_to_markdown' tool with the MCP server, including description, input schema, and inline handler function.
    server.tool(
      'convert_to_markdown',
      'Convert DOCX file to Markdown format',
      {
        file_path: z.string().describe('Path to the .docx file'),
      },
      async ({ file_path }) => {
        try {
          const absolutePath = path.resolve(file_path)
    
          if (!fs.existsSync(absolutePath)) {
            throw new Error(`File not found: ${absolutePath}`)
          }
    
          // Convert to HTML first
          const htmlResult = await mammoth.convertToHtml({ path: absolutePath })
          let html = htmlResult.value
    
          // Simple HTML to Markdown conversion
          let markdown = html
            // Headers
            .replace(/<h1[^>]*>(.*?)<\/h1>/gi, '# $1\n\n')
            .replace(/<h2[^>]*>(.*?)<\/h2>/gi, '## $1\n\n')
            .replace(/<h3[^>]*>(.*?)<\/h3>/gi, '### $1\n\n')
            .replace(/<h4[^>]*>(.*?)<\/h4>/gi, '#### $1\n\n')
            .replace(/<h5[^>]*>(.*?)<\/h5>/gi, '##### $1\n\n')
            .replace(/<h6[^>]*>(.*?)<\/h6>/gi, '###### $1\n\n')
            // Bold and italic
            .replace(/<strong[^>]*>(.*?)<\/strong>/gi, '**$1**')
            .replace(/<b[^>]*>(.*?)<\/b>/gi, '**$1**')
            .replace(/<em[^>]*>(.*?)<\/em>/gi, '*$1*')
            .replace(/<i[^>]*>(.*?)<\/i>/gi, '*$1*')
            // Lists
            .replace(/<ul[^>]*>/gi, '')
            .replace(/<\/ul>/gi, '\n')
            .replace(/<ol[^>]*>/gi, '')
            .replace(/<\/ol>/gi, '\n')
            .replace(/<li[^>]*>(.*?)<\/li>/gi, '- $1\n')
            // Paragraphs
            .replace(/<p[^>]*>(.*?)<\/p>/gi, '$1\n\n')
            // Line breaks
            .replace(/<br[^>]*>/gi, '\n')
            // Remove remaining HTML tags
            .replace(/<[^>]*>/g, '')
            // Clean up extra whitespace
            .replace(/\n{3,}/g, '\n\n')
            .trim()
    
          return {
            content: [
              {
                type: 'text',
                text: JSON.stringify(
                  {
                    markdown: markdown,
                    word_count: markdown
                      .split(/\s+/)
                      .filter((word: string) => word.length > 0).length,
                    messages: htmlResult.messages,
                  },
                  null,
                  2
                ),
              },
            ],
          }
        } catch (error) {
          return {
            content: [
              {
                type: 'text',
                text: `Error converting to Markdown: ${(error as Error).message}`,
              },
            ],
            isError: true,
          }
        }
      }
    )
  • Zod schema defining the input parameter 'file_path' for the tool.
    {
      file_path: z.string().describe('Path to the .docx file'),
    },
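
The regex chain in the handler can be tried in isolation, without mammoth or a DOCX file. The following is a minimal sketch under stated assumptions: `htmlToMarkdown` and `countWords` are hypothetical names, the six heading rules are folded into a loop, and the `<b>` and `<i>` rules are left out on the assumption that mammoth's default output uses `<strong>` and `<em>`. The sample HTML is invented for illustration.

```typescript
// Hypothetical standalone version of the handler's replacement chain.
function htmlToMarkdown(html: string): string {
  let out = html
  // Headings h1..h6 become one to six leading '#' characters.
  for (let level = 1; level <= 6; level++) {
    const heading = new RegExp(`<h${level}[^>]*>(.*?)</h${level}>`, 'gi')
    out = out.replace(heading, `${'#'.repeat(level)} $1\n\n`)
  }
  return out
    // Bold and italic
    .replace(/<strong[^>]*>(.*?)<\/strong>/gi, '**$1**')
    .replace(/<em[^>]*>(.*?)<\/em>/gi, '*$1*')
    // Lists: ordered and unordered items both become '- ' bullets
    .replace(/<(ul|ol)[^>]*>/gi, '')
    .replace(/<\/(ul|ol)>/gi, '\n')
    .replace(/<li[^>]*>(.*?)<\/li>/gi, '- $1\n')
    // Paragraphs and line breaks
    .replace(/<p[^>]*>(.*?)<\/p>/gi, '$1\n\n')
    .replace(/<br[^>]*>/gi, '\n')
    // Drop any remaining tags, then collapse runs of blank lines
    .replace(/<[^>]*>/g, '')
    .replace(/\n{3,}/g, '\n\n')
    .trim()
}

// Same counting rule as the handler: split on whitespace, drop empty tokens.
function countWords(markdown: string): number {
  return markdown.split(/\s+/).filter((word) => word.length > 0).length
}

// Invented sample input, written on one line like typical converter output.
const sample =
  '<h1>Title</h1><p>Some <strong>bold</strong> and <em>italic</em> text.</p>' +
  '<ul><li>one</li><li>two</li></ul>'

const converted = htmlToMarkdown(sample)
console.log(converted)
console.log(countWords(converted))
```

Two properties of this approach are visible in the sketch. Because `.` does not match newlines, an element whose content spans several lines is not rewritten by its own rule and only loses its tags in the final cleanup. The word count also includes standalone Markdown markers such as `#` and `-`, since they are separated from the text by whitespace.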

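On the client side, the handler's return value has to be unpacked: on success the first content item holds a JSON string with `markdown`, `word_count`, and `messages`, and on failure it holds a plain error message with `isError` set. The sketch below shows one way to read both cases. The `ToolResult` and `ConversionPayload` types and the `readConversion` helper are hypothetical, and both sample results are hand-built to match the shapes in the handler rather than captured from a real server.

```typescript
// Hypothetical types mirroring the object shapes returned by the handler.
interface ConversionPayload {
  markdown: string
  word_count: number
  messages: unknown[] // mammoth conversion messages, left untyped here
}

interface ToolResult {
  content: { type: string; text: string }[]
  isError?: boolean
}

// Hypothetical helper: returns the parsed payload, or throws the error
// text when the tool reported a failure.
function readConversion(result: ToolResult): ConversionPayload {
  const text = result.content[0]?.text ?? ''
  if (result.isError) {
    throw new Error(text)
  }
  return JSON.parse(text) as ConversionPayload
}

// Hand-built success result in the same shape the handler produces.
const okResult: ToolResult = {
  content: [
    {
      type: 'text',
      text: JSON.stringify({ markdown: '# Title', word_count: 2, messages: [] }, null, 2),
    },
  ],
}

// Hand-built failure result; the path is a placeholder.
const errorResult: ToolResult = {
  content: [
    {
      type: 'text',
      text: 'Error converting to Markdown: File not found: /tmp/missing.docx',
    },
  ],
  isError: true,
}

console.log(readConversion(okResult).markdown)
try {
  readConversion(errorResult)
} catch (error) {
  console.log((error as Error).message)
}
```

Checking `isError` before calling `JSON.parse` matters because the error branch returns plain text, which would otherwise fail to parse.
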
MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/zeph-gh/Docx-Mcp-Server'
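
The same request can be made from TypeScript. This is a minimal sketch, assuming a runtime with a global `fetch` (for example Node 18 or later) and assuming the endpoint returns JSON; the response shape is not documented on this page, so it is left as `unknown`. The `serverUrl` and `fetchServerInfo` names are hypothetical.

```typescript
// Base URL taken from the curl example above.
const API_BASE = 'https://glama.ai/api/mcp/v1/servers'

// Hypothetical helper: builds the URL for one server entry.
function serverUrl(owner: string, server: string): string {
  return `${API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(server)}`
}

// Hypothetical helper: fetches the entry and returns the decoded body.
// Assumes a JSON response; the result is typed as unknown on purpose.
async function fetchServerInfo(owner: string, server: string): Promise<unknown> {
  const response = await fetch(serverUrl(owner, server))
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`)
  }
  return response.json()
}

// Prints the URL only; call fetchServerInfo(...) to perform the request.
console.log(serverUrl('zeph-gh', 'Docx-Mcp-Server'))
```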

If you have feedback or need assistance with the MCP directory API, please join our Discord server.