Unstructured Document Processor MCP

remote-capable server

The server can be hosted and run remotely because it primarily relies on remote services or has no dependency on the local environment.

Integrations

  • Supports processing JPEG/JPG files to extract content for large language models

  • Supports processing Org files to extract content for large language models

  • Supports processing SVG files to extract content for large language models

A Model Context Protocol server that provides unstructured document processing capabilities. This server enables LLMs to extract and use content from an unstructured document.

This repo is a work in progress; proceed with caution :)

Supported file types:

{".abw", ".bmp", ".csv", ".cwk", ".dbf", ".dif", ".doc", ".docm", ".docx", ".dot", ".dotm", ".eml", ".epub", ".et", ".eth", ".fods", ".gif", ".heic", ".htm", ".html", ".hwp", ".jpeg", ".jpg", ".md", ".mcw", ".mw", ".odt", ".org", ".p7s", ".pages", ".pbd", ".pdf", ".png", ".pot", ".potm", ".ppt", ".pptm", ".pptx", ".prn", ".rst", ".rtf", ".sdp", ".sgl", ".svg", ".sxg", ".tiff", ".txt", ".tsv", ".uof", ".uos1", ".uos2", ".web", ".webp", ".wk2", ".xls", ".xlsb", ".xlsm", ".xlsx", ".xlw", ".xml", ".zabw"}
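Before uploading a document, it can help to check its extension against the set above. The following is a minimal sketch, not code from this repo: the helper name is_supported is hypothetical, and matching extensions case-insensitively is an assumption about how the platform treats them.

```python
from pathlib import Path

# Extensions copied from the "Supported file types" set above.
SUPPORTED_EXTENSIONS = {
    ".abw", ".bmp", ".csv", ".cwk", ".dbf", ".dif", ".doc", ".docm", ".docx",
    ".dot", ".dotm", ".eml", ".epub", ".et", ".eth", ".fods", ".gif", ".heic",
    ".htm", ".html", ".hwp", ".jpeg", ".jpg", ".md", ".mcw", ".mw", ".odt",
    ".org", ".p7s", ".pages", ".pbd", ".pdf", ".png", ".pot", ".potm", ".ppt",
    ".pptm", ".pptx", ".prn", ".rst", ".rtf", ".sdp", ".sgl", ".svg", ".sxg",
    ".tiff", ".txt", ".tsv", ".uof", ".uos1", ".uos2", ".web", ".webp", ".wk2",
    ".xls", ".xlsb", ".xlsm", ".xlsx", ".xlw", ".xml", ".zabw",
}


def is_supported(filename: str) -> bool:
    """Return True if the file's final extension is in the supported set.

    Assumption: extensions are compared case-insensitively.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["report.PDF", "notes.org", "archive.zip"]:
        print(name, is_supported(name))
```

Only the last suffix is inspected, so a name like backup.tar.gz is judged by ".gz" alone.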

Prerequisites:

  • Two S3 buckets: one serves as the source of documents in a processing workflow, and the other as the destination. Both are required because Unstructured Platform currently doesn't support local sources and destinations.
  • Your AWS auth credentials: access key and secret key.
  • Unstructured Platform API key.
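These prerequisites correspond to the five environment variables named in the setup steps of this README. A quick pre-flight check, sketched below, can confirm that none of them is missing before the server is started. The helper names parse_env_text and missing_vars are hypothetical, the .env parser handles only simple KEY=VALUE lines, and how doc_processor.py itself loads the file is not stated here, so treat this as an independent check rather than the server's own logic.

```python
import os
from pathlib import Path

# Variable names taken from the setup steps in this README.
REQUIRED_VARS = [
    "AWS_S3_SOURCE_BUCKET",
    "AWS_S3_DESTINATION_BUCKET",
    "AWS_KEY",
    "AWS_SECRET",
    "UNSTRUCTURED_API_KEY",
]


def parse_env_text(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(env: dict) -> list:
    """Return the required variable names that are absent or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    env_file = Path(".env")  # assumed to sit in the current directory
    env = parse_env_text(env_file.read_text()) if env_file.exists() else {}
    env.update(os.environ)  # variables already exported take precedence
    missing = missing_vars(env)
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All required variables are set.")
```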

Quick TL;DR on how to add this MCP server to Claude Desktop:

  1. Clone the repo and set up the uv environment.
  2. Create a .env file in the root directory and add the following env variables: AWS_S3_SOURCE_BUCKET, AWS_S3_DESTINATION_BUCKET, AWS_KEY, AWS_SECRET, UNSTRUCTURED_API_KEY.
  3. Run the MCP server: uv run doc_processor.py
  4. Go to ~/Library/Application Support/Claude/ (the macOS location) and create claude_desktop_config.json if it doesn't already exist. In that file, add:
     {
       "mcpServers": {
         "unstructured_doc_processor": {
           "command": "PATH/TO/YOUR/UV",
           "args": [
             "--directory",
             "ABSOLUTE/PATH/TO/YOUR/unstructured-mcp/",
             "run",
             "doc_processor.py"
           ],
           "disabled": false
         }
       }
     }
  5. Restart Claude Desktop. You should now be able to use the MCP.
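The two placeholder paths in the config are the easiest part to get wrong. As a rough sketch, the snippet below fills them in and prints the resulting JSON so it can be pasted into claude_desktop_config.json. The function name build_config is hypothetical, and the script assumes that uv is on your PATH and that it is run from the root of the cloned repo. It only prints, so an existing config file is not overwritten.

```python
import json
import shutil
from pathlib import Path


def build_config(uv_path: str, repo_dir: str) -> dict:
    """Build the mcpServers entry shown in step 4 with concrete paths."""
    return {
        "mcpServers": {
            "unstructured_doc_processor": {
                "command": uv_path,
                "args": ["--directory", repo_dir, "run", "doc_processor.py"],
                "disabled": False,
            }
        }
    }


if __name__ == "__main__":
    # shutil.which returns None when uv is not on PATH; keep the placeholder then.
    uv_path = shutil.which("uv") or "PATH/TO/YOUR/UV"
    # Assumption: the script is run from the root of the cloned repo.
    repo_dir = str(Path.cwd())
    print(json.dumps(build_config(uv_path, repo_dir), indent=2))
```

If other MCP servers are already configured, merge the unstructured_doc_processor entry into the existing mcpServers object instead of replacing the file.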

A Model Context Protocol server that enables LLMs to extract and use content from unstructured documents across a wide variety of file formats.