Convert DOCX to Markdown
docx_to_markdown
Convert DOCX files to Markdown using Pandoc or MarkItDown, with image extraction and LLM-ready cleaning options.
Instructions
Convert a DOCX file to Markdown format using Pandoc or MarkItDown. Arguments:
inputPath (string, required): Path to the input DOCX file
outputPath (string, optional): Output path. Defaults to the input file name with a .md extension
extractImages (boolean, optional): Extract embedded images. Defaults to false
imageDir (string, optional): Directory to store extracted images
engine (enum, optional): Conversion engine — 'pandoc' or 'markitdown'. Defaults to 'pandoc'
markdownFlavor (enum, optional): Markdown dialect for Pandoc output — 'gfm', 'commonmark', or 'pandoc'. Defaults to 'gfm'
cleanForLLM (boolean, optional): Clean up the Markdown for LLM consumption. Defaults to false
overwrite (boolean, optional): Allow overwriting an existing output file. Defaults to false
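The defaults listed above can be made concrete with a small sketch. This is not the tool's actual implementation: `resolve_args` and `DEFAULTS` are hypothetical names, and the validation behavior (raising `ValueError`) is an assumption. It only illustrates how an argument payload might be filled in before conversion.

```python
from pathlib import PurePosixPath

# Defaults as documented in the argument list; imageDir has no documented default.
DEFAULTS = {
    "extractImages": False,
    "engine": "pandoc",
    "markdownFlavor": "gfm",
    "cleanForLLM": False,
    "overwrite": False,
}

ENGINES = ("pandoc", "markitdown")
FLAVORS = ("gfm", "commonmark", "pandoc")


def resolve_args(args: dict) -> dict:
    """Hypothetical helper: apply documented defaults to a raw argument dict."""
    if "inputPath" not in args:
        raise ValueError("inputPath is required")

    resolved = {**DEFAULTS, **args}

    if resolved["engine"] not in ENGINES:
        raise ValueError(f"engine must be one of {ENGINES}")
    if resolved["markdownFlavor"] not in FLAVORS:
        raise ValueError(f"markdownFlavor must be one of {FLAVORS}")

    # outputPath defaults to the input file name with a .md extension.
    if "outputPath" not in resolved:
        resolved["outputPath"] = str(
            PurePosixPath(resolved["inputPath"]).with_suffix(".md")
        )
    return resolved
```

For example, `resolve_args({"inputPath": "docs/report.docx"})` would yield an `outputPath` of `docs/report.md` with every other option at its documented default.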
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| inputPath | Yes | Path to the input DOCX file (relative to workspace) | |
| outputPath | No | Output Markdown path (relative to workspace). Auto-derived if omitted. | Input file name with a .md extension |
| extractImages | No | Extract embedded images from the DOCX | false |
| imageDir | No | Directory to store extracted images (relative to workspace) | |
| engine | No | Conversion engine to use: 'pandoc' or 'markitdown' | 'pandoc' |
| markdownFlavor | No | Markdown dialect for Pandoc output: 'gfm', 'commonmark', or 'pandoc' | 'gfm' |
| cleanForLLM | No | Clean up the Markdown output for LLM consumption | false |
| overwrite | No | Allow overwriting an existing output file | false |
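As a rough illustration of how the `engine`, `markdownFlavor`, and image options could translate into a Pandoc invocation, the sketch below builds a command line. The function name `build_pandoc_command` and the mapping of the 'pandoc' flavor to Pandoc's `markdown` writer are assumptions, not confirmed behavior of this tool. The flags themselves (`--from`, `--to`, `--output`, `--extract-media`) are standard Pandoc options.

```python
from typing import List, Optional

# Assumed mapping from the tool's markdownFlavor values to Pandoc writer names.
PANDOC_WRITERS = {
    "gfm": "gfm",
    "commonmark": "commonmark",
    "pandoc": "markdown",  # Pandoc's own Markdown dialect
}


def build_pandoc_command(
    input_path: str,
    output_path: str,
    flavor: str = "gfm",
    image_dir: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: assemble a pandoc argv list for DOCX -> Markdown."""
    cmd = [
        "pandoc",
        input_path,
        "--from", "docx",
        "--to", PANDOC_WRITERS[flavor],
        "--output", output_path,
    ]
    # Only request media extraction when a target directory is given.
    if image_dir is not None:
        cmd.append(f"--extract-media={image_dir}")
    return cmd
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where Pandoc is installed. Building the command as a list rather than a shell string avoids quoting problems with paths that contain spaces.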