convert_to_markdown
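Converts a DOCX file to Markdown so its content can be edited in plain-text tools. The file is first rendered to HTML with mammoth and then reduced to Markdown with a series of regex replacements. The response contains the Markdown text, a word count, and the conversion messages reported by mammoth.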
Instructions
Convert DOCX file to Markdown format
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes | Path to the .docx file | |
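
As an illustration of the input schema above, here is a minimal sketch of a request that calls this tool. It assumes the server speaks the Model Context Protocol (which the `server.tool` registration suggests) and is therefore invoked with a JSON-RPC `tools/call` message. The names `ConvertToMarkdownArgs`, `ToolCallRequest`, and `buildConvertRequest` are hypothetical, and the non-empty check is an extra guard that the tool's own schema does not enforce.

```typescript
// Shape of the tool arguments, taken from the input schema table.
interface ConvertToMarkdownArgs {
  file_path: string;
}

// Assumed JSON-RPC envelope for an MCP "tools/call" request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: ConvertToMarkdownArgs };
}

// Hypothetical helper: builds the request for a given .docx path.
function buildConvertRequest(id: number, filePath: string): ToolCallRequest {
  // Extra client-side guard (an assumption): the tool schema only requires a string.
  if (filePath.trim().length === 0) {
    throw new Error("file_path must be a non-empty string");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "convert_to_markdown",
      arguments: { file_path: filePath },
    },
  };
}
```

Relative paths are acceptable here because the handler resolves `file_path` with `path.resolve` before checking that the file exists.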
Implementation Reference
- src/index.ts:322-401 (registration): Registration of the 'convert_to_markdown' tool using server.tool, including name, description, input schema, and handler function reference.

      // Tool to convert DOCX to Markdown
      server.tool(
        'convert_to_markdown',
        'Convert DOCX file to Markdown format',
        {
          file_path: z.string().describe('Path to the .docx file'),
        },
        async ({ file_path }) => {
          try {
            const absolutePath = path.resolve(file_path)

            if (!fs.existsSync(absolutePath)) {
              throw new Error(`File not found: ${absolutePath}`)
            }

            // Convert to HTML first
            const htmlResult = await mammoth.convertToHtml({ path: absolutePath })
            let html = htmlResult.value

            // Simple HTML to Markdown conversion
            let markdown = html
              // Headers
              .replace(/<h1[^>]*>(.*?)<\/h1>/gi, '# $1\n\n')
              .replace(/<h2[^>]*>(.*?)<\/h2>/gi, '## $1\n\n')
              .replace(/<h3[^>]*>(.*?)<\/h3>/gi, '### $1\n\n')
              .replace(/<h4[^>]*>(.*?)<\/h4>/gi, '#### $1\n\n')
              .replace(/<h5[^>]*>(.*?)<\/h5>/gi, '##### $1\n\n')
              .replace(/<h6[^>]*>(.*?)<\/h6>/gi, '###### $1\n\n')
              // Bold and italic
              .replace(/<strong[^>]*>(.*?)<\/strong>/gi, '**$1**')
              .replace(/<b[^>]*>(.*?)<\/b>/gi, '**$1**')
              .replace(/<em[^>]*>(.*?)<\/em>/gi, '*$1*')
              .replace(/<i[^>]*>(.*?)<\/i>/gi, '*$1*')
              // Lists
              .replace(/<ul[^>]*>/gi, '')
              .replace(/<\/ul>/gi, '\n')
              .replace(/<ol[^>]*>/gi, '')
              .replace(/<\/ol>/gi, '\n')
              .replace(/<li[^>]*>(.*?)<\/li>/gi, '- $1\n')
              // Paragraphs
              .replace(/<p[^>]*>(.*?)<\/p>/gi, '$1\n\n')
              // Line breaks
              .replace(/<br[^>]*>/gi, '\n')
              // Remove remaining HTML tags
              .replace(/<[^>]*>/g, '')
              // Clean up extra whitespace
              .replace(/\n{3,}/g, '\n\n')
              .trim()

            return {
              content: [
                {
                  type: 'text',
                  text: JSON.stringify(
                    {
                      markdown: markdown,
                      word_count: markdown
                        .split(/\s+/)
                        .filter((word: string) => word.length > 0).length,
                      messages: htmlResult.messages,
                    },
                    null,
                    2
                  ),
                },
              ],
            }
          } catch (error) {
            return {
              content: [
                {
                  type: 'text',
                  text: `Error converting to Markdown: ${(error as Error).message}`,
                },
              ],
              isError: true,
            }
          }
        }
      )
- src/index.ts:326-328 (schema): Input schema for the tool using Zod: requires 'file_path' as a string.

      {
        file_path: z.string().describe('Path to the .docx file'),
      },
- src/index.ts:329-400 (handler): The handler function implements the conversion logic: it resolves the file path, converts DOCX to HTML using mammoth.convertToHtml, applies a series of regex replacements to transform HTML elements to Markdown syntax (headers, bold, italic, lists, paragraphs, etc.), cleans up whitespace, computes the word count, and returns a structured response or an error.

      async ({ file_path }) => {
        try {
          const absolutePath = path.resolve(file_path)

          if (!fs.existsSync(absolutePath)) {
            throw new Error(`File not found: ${absolutePath}`)
          }

          // Convert to HTML first
          const htmlResult = await mammoth.convertToHtml({ path: absolutePath })
          let html = htmlResult.value

          // Simple HTML to Markdown conversion
          let markdown = html
            // Headers
            .replace(/<h1[^>]*>(.*?)<\/h1>/gi, '# $1\n\n')
            .replace(/<h2[^>]*>(.*?)<\/h2>/gi, '## $1\n\n')
            .replace(/<h3[^>]*>(.*?)<\/h3>/gi, '### $1\n\n')
            .replace(/<h4[^>]*>(.*?)<\/h4>/gi, '#### $1\n\n')
            .replace(/<h5[^>]*>(.*?)<\/h5>/gi, '##### $1\n\n')
            .replace(/<h6[^>]*>(.*?)<\/h6>/gi, '###### $1\n\n')
            // Bold and italic
            .replace(/<strong[^>]*>(.*?)<\/strong>/gi, '**$1**')
            .replace(/<b[^>]*>(.*?)<\/b>/gi, '**$1**')
            .replace(/<em[^>]*>(.*?)<\/em>/gi, '*$1*')
            .replace(/<i[^>]*>(.*?)<\/i>/gi, '*$1*')
            // Lists
            .replace(/<ul[^>]*>/gi, '')
            .replace(/<\/ul>/gi, '\n')
            .replace(/<ol[^>]*>/gi, '')
            .replace(/<\/ol>/gi, '\n')
            .replace(/<li[^>]*>(.*?)<\/li>/gi, '- $1\n')
            // Paragraphs
            .replace(/<p[^>]*>(.*?)<\/p>/gi, '$1\n\n')
            // Line breaks
            .replace(/<br[^>]*>/gi, '\n')
            // Remove remaining HTML tags
            .replace(/<[^>]*>/g, '')
            // Clean up extra whitespace
            .replace(/\n{3,}/g, '\n\n')
            .trim()

          return {
            content: [
              {
                type: 'text',
                text: JSON.stringify(
                  {
                    markdown: markdown,
                    word_count: markdown
                      .split(/\s+/)
                      .filter((word: string) => word.length > 0).length,
                    messages: htmlResult.messages,
                  },
                  null,
                  2
                ),
              },
            ],
          }
        } catch (error) {
          return {
            content: [
              {
                type: 'text',
                text: `Error converting to Markdown: ${(error as Error).message}`,
              },
            ],
            isError: true,
          }
        }
      }
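
To see what the regex-based conversion step does without a .docx file or mammoth, the sketch below reduces it to a pure function over an HTML string. It is not the listed source: `wrapTag`, `htmlToMarkdown`, and `countWords` are hypothetical names, the replacements run in a slightly different order, and each tag pattern adds a `\b` word boundary (an assumption on my part) so that `<b` does not also match `<br>` and `<p` does not match `<pre>`.

```typescript
// Hypothetical helper: replaces every <tag ...>inner</tag> pair with open + inner + close.
function wrapTag(input: string, tag: string, open: string, close: string): string {
  // \b keeps "b" from matching "br" and "i" from matching "img".
  const pattern = new RegExp(`<${tag}\\b[^>]*>(.*?)</${tag}>`, "gi");
  return input.replace(pattern, (_match: string, inner: string) => `${open}${inner}${close}`);
}

// Sketch of the HTML-to-Markdown step described in the handler entry.
function htmlToMarkdown(html: string): string {
  let md = html;
  // Headings h1..h6 become one to six leading '#' characters.
  for (let level = 1; level <= 6; level++) {
    md = wrapTag(md, `h${level}`, "######".slice(0, level) + " ", "\n\n");
  }
  // Bold and italic.
  for (const tag of ["strong", "b"]) md = wrapTag(md, tag, "**", "**");
  for (const tag of ["em", "i"]) md = wrapTag(md, tag, "*", "*");
  // List items (ordered and unordered alike) and paragraphs.
  md = wrapTag(md, "li", "- ", "\n");
  md = wrapTag(md, "p", "", "\n\n");
  return md
    .replace(/<\/(ul|ol)>/gi, "\n") // blank line after each list
    .replace(/<br\b[^>]*>/gi, "\n") // line breaks
    .replace(/<[^>]*>/g, "") // drop any remaining tags
    .replace(/\n{3,}/g, "\n\n") // collapse extra blank lines
    .trim();
}

// Same whitespace-based word count the handler reports as word_count.
function countWords(markdown: string): number {
  return markdown.split(/\s+/).filter((word) => word.length > 0).length;
}
```

Two properties of this approach are worth keeping in mind. Ordered lists come out as `-` bullets because every `<li>` is handled the same way, and elements with no dedicated rule (links, tables, images) lose their markup in the final tag-stripping pass. The word count also includes Markdown markers such as a standalone `#`, since it splits on whitespace only.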