High-Quality PDF Extraction to Text with Tokenization and Accurate Processing of Complex Layouts

  • Why this server?

    Supports PDF-to-DOCX conversion; the resulting DOCX can then be parsed for text. May also help with images.

    security: -, license: A, quality: -
    An MCP server that provides multiple file conversion tools for AI agents, supporting various document and image format conversions including DOCX to PDF, PDF to DOCX, image conversions, Excel to CSV, HTML to PDF, and Markdown to PDF.
    3
    Python
    MIT License
    • Linux
    • Apple
  • Why this server?

    Provides functionality to fetch and transform web content in various formats including plain text and Markdown, which can be useful after extracting text.

    security: -, license: F, quality: -
    Provides functionality to fetch and transform web content in various formats (HTML, JSON, plain text, and Markdown) through simple API calls.
    137,083
    TypeScript
  • Why this server?

    Specifically converts PDF to Markdown, a structured text format, suggesting it can handle text extraction.

    security: -, license: A, quality: -
    PDF to Markdown conversion tool
    1
    Python
    MIT License
    • Linux
    • Apple
  • Why this server?

    Provides document processing capabilities, allowing conversion of documents to markdown and extraction of tables.

    security: -, license: A, quality: -
    A server that provides document processing capabilities using the Model Context Protocol, allowing conversion of documents to markdown, extraction of tables, and processing of document images.
    6
    Python
    MIT License
    • Linux
    • Apple
  • Why this server?

    Extracts and transforms webpage content into clean, LLM-optimized Markdown, which helps with complex layouts.

    security: A, license: A, quality: A
    Extracts and transforms webpage content into clean, LLM-optimized Markdown. Returns article title, main content, excerpt, byline and site name. Uses Mozilla's Readability algorithm to remove ads, navigation, footers and non-essential elements while preserving the core content structure.
    1
    4
    11
    MIT License
  • Why this server?

    Filters and analyzes web content to extract insights.

    security: -, license: F, quality: -
    Bridge the gap between your web crawl and AI language models. With mcp-server-webcrawl, your AI client filters and analyzes web content under your direction or autonomously, extracting insights from your web content. Supports WARC, wget, InterroBot, Katana, and SiteOne crawlers.
    Python
    • Apple
  • Why this server?

    Retrieves web page content and converts it to Markdown, which is useful for extracting text.

    security: A, license: A, quality: A
    An MCP server that retrieves web page content using Playwright headless browser, capable of extracting main content and converting to Markdown format.
    2
    765
    555
    TypeScript
    MIT License
    • Apple
  • Why this server?

    Integrates with FireCrawl for web scraping, which can help retrieve data from PDFs hosted online.

    security: A, license: A, quality: A
    A Model Context Protocol (MCP) server implementation that integrates with FireCrawl for advanced web scraping capabilities.
    9
    8,264
    2,147
    JavaScript
    MIT License
    • Apple
    • Linux
  • Why this server?

    Converts SVG to PNG, which can help with images in PDFs.

    security: -, license: F, quality: -
    A Model Context Protocol server that converts SVG code to PNG images, offering two conversion methods (CairoSVG and Inkscape) with support for custom working directories.
    Python
    • Linux
    • Apple