Search for: High-Quality PDF Extraction to Text with Tokenization and Accurate Processing of Complex Layouts

  • Why this server?

    Supports PDF to DOCX conversion; the resulting DOCX can then be parsed for text. It may also help with images.

    Security: -, License: A, Quality: -
    An MCP server that provides multiple file conversion tools for AI agents, supporting various document and image format conversions including DOCX to PDF, PDF to DOCX, image conversions, Excel to CSV, HTML to PDF, and Markdown to PDF.
    Last updated -
    3
    Python
    MIT License
    • Linux
    • Apple
  • Why this server?

    Provides functionality to fetch and transform web content in various formats including plain text and Markdown, which can be useful after extracting text.

    Security: -, License: F, Quality: -
    Provides functionality to fetch and transform web content in various formats (HTML, JSON, plain text, and Markdown) through simple API calls.
    Last updated -
    137,083
    TypeScript
  • Why this server?

    Specifically converts PDF to Markdown, a structured text format, suggesting it can handle text extraction.

    Security: -, License: A, Quality: -
    PDF to Markdown conversion tool
    Last updated -
    1
    Python
    MIT License
    • Linux
    • Apple
  • Why this server?

    Provides document processing capabilities, allowing conversion of documents to markdown and extraction of tables.

    Security: -, License: A, Quality: -
    A server that provides document processing capabilities using the Model Context Protocol, allowing conversion of documents to markdown, extraction of tables, and processing of document images.
    Last updated -
    6
    Python
    MIT License
    • Linux
    • Apple
  • Why this server?

    Extracts and transforms webpage content into clean, LLM-optimized Markdown, which helps with complex layouts.

    Security: A, License: A, Quality: A
    Extracts and transforms webpage content into clean, LLM-optimized Markdown. Returns article title, main content, excerpt, byline and site name. Uses Mozilla's Readability algorithm to remove ads, navigation, footers and non-essential elements while preserving the core content structure.
    Last updated -
    1
    4
    11
    MIT License
  • Why this server?

    Can filter and analyze web content to extract insights.

    Security: -, License: F, Quality: -
    Bridge the gap between your web crawl and AI language models. With mcp-server-webcrawl, your AI client filters and analyzes web content under your direction or autonomously, extracting insights from your web content. Supports WARC, wget, InterroBot, Katana, and SiteOne crawlers.
    Last updated -
    Python
    • Apple
  • Why this server?

    Can interact with PDF and EPUB documents, which may help with text extraction.

    Security: -, License: -, Quality: -
    A Model Context Protocol (MCP) server that allows interaction with PDF and EPUB documents, designed to work with Windsurf IDE by Codeium.
    Last updated -
    3
    Python
    MIT License
  • Why this server?

    Retrieves web page content and converts it to Markdown, which is useful for extracting text.

    Security: A, License: A, Quality: A
    An MCP server that retrieves web page content using Playwright headless browser, capable of extracting main content and converting to Markdown format.
    Last updated -
    2
    509
    647
    TypeScript
    MIT License
    • Apple
  • Why this server?

    Integrates with FireCrawl for web scraping, which can help retrieve data from PDFs hosted online.

    Security: A, License: A, Quality: A
    A Model Context Protocol (MCP) server implementation that integrates with FireCrawl for advanced web scraping capabilities.
    Last updated -
    9
    15,275
    2,745
    JavaScript
    MIT License
    • Apple
    • Linux
  • Why this server?

    Converts SVG to PNG, which can help with images in PDFs.

    Security: -, License: F, Quality: -
    A Model Context Protocol server that converts SVG code to PNG images, offering two conversion methods (CairoSVG and Inkscape) with support for custom working directories.
    Last updated -
    Python
    • Linux
    • Apple