Unstructured Document Processor MCP

by MKhalusova

Integrations

  • Supports processing JPEG/JPG files to extract content for large language models

  • Supports processing Org files to extract content for large language models

  • Supports processing SVG files to extract content for large language models

A Model Context Protocol server that provides unstructured document processing capabilities. This server enables LLMs to extract and use content from an unstructured document.

This repo is work in progress, proceed with caution :)

Supported file types:

{".abw", ".bmp", ".csv", ".cwk", ".dbf", ".dif", ".doc", ".docm", ".docx", ".dot", ".dotm", ".eml", ".epub", ".et", ".eth", ".fods", ".gif", ".heic", ".htm", ".html", ".hwp", ".jpeg", ".jpg", ".md", ".mcw", ".mw", ".odt", ".org", ".p7s", ".pages", ".pbd", ".pdf", ".png", ".pot", ".potm", ".ppt", ".pptm", ".pptx", ".prn", ".rst", ".rtf", ".sdp", ".sgl", ".svg", ".sxg", ".tiff", ".txt", ".tsv", ".uof", ".uos1", ".uos2", ".web", ".webp", ".wk2", ".xls", ".xlsb", ".xlsm", ".xlsx", ".xlw", ".xml", ".zabw"}

Prerequisites: You'll need:

Quick TLDR on how to add this MCP to your Claude Desktop:

  1. Clone the repo and set up the UV environment.
  2. Create a .env file in the root directory and add the following env variable: UNSTRUCTURED_API_KEY.
  3. Run the MCP server: uv run doc_processor.py
  4. Go to ~/Library/Application Support/Claude/ and create a claude_desktop_config.json. In that file add:
{ "mcpServers": { "unstructured_doc_processor": { "command": "PATH/TO/YOUR/UV", "args": [ "--directory", "ABSOLUTE/PATH/TO/YOUR/unstructured-mcp/", "run", "doc_processor.py" ], "disabled": false } } }
  1. Restart Claude Desktop. You should now be able to use the MCP.
-
security - not tested
F
license - not found
-
quality - not tested

local-only server

The server can only run on the client's local machine because it depends on local resources.

A Model Context Protocol server that enables LLMs to extract and use content from unstructured documents across a wide variety of file formats.

Related MCP Servers

  • A
    security
    A
    license
    A
    quality
    A Model Context Protocol server that provides tools for analyzing text documents, including counting words and characters. This server helps LLMs perform text analysis tasks by exposing simple document statistics functionality.
    Last updated -
    1
    8
    7
    JavaScript
    Apache 2.0
  • -
    security
    A
    license
    -
    quality
    A Model Context Protocol server that enables LLMs to read, search, and analyze code files with advanced caching and real-time file watching capabilities.
    Last updated -
    45
    8
    JavaScript
    MIT License
    • Linux
    • Apple
  • -
    security
    A
    license
    -
    quality
    A Model Context Protocol server that allows LLMs to interact with web content through standardized tools, currently supporting web scraping functionality.
    Last updated -
    Python
    MIT License
    • Linux
    • Apple
  • -
    security
    F
    license
    -
    quality
    A Model Context Protocol server that enables LLMs to fetch and process web content in multiple formats (HTML, JSON, Markdown, text) with automatic format detection.
    Last updated -
    TypeScript
    • Apple

View all related MCP servers

ID: 6ncg8w9ovd