Why this server?
Supports PDF-to-DOCX conversion; the resulting DOCX can then be parsed for text. May also help with images.
Ratings: security -, license A, quality -
An MCP server that provides multiple file conversion tools for AI agents, supporting various document and image format conversions including DOCX to PDF, PDF to DOCX, image conversions, Excel to CSV, HTML to PDF, and Markdown to PDF.
MIT
Why this server?
Provides functionality to fetch and transform web content in various formats including plain text and Markdown, which can be useful after extracting text.
Ratings: security -, license F, quality -
Provides functionality to fetch and transform web content in various formats (HTML, JSON, plain text, and Markdown) through simple API calls.
Why this server?
Specifically converts PDF to Markdown, a structured text format, suggesting it can handle text extraction.
MIT
Why this server?
Provides document processing capabilities, allowing conversion of documents to markdown and extraction of tables.
Ratings: security A, license A, quality -
A server that provides document processing capabilities using the Model Context Protocol, allowing conversion of documents to Markdown, extraction of tables, and processing of document images.
MIT
Why this server?
Extracts and transforms webpage content into clean, LLM-optimized Markdown, which helps with complex layouts.
Ratings: security A, license A, quality -
Extracts and transforms webpage content into clean, LLM-optimized Markdown. Returns article title, main content, excerpt, byline and site name. Uses Mozilla's Readability algorithm to remove ads, navigation, footers and non-essential elements while preserving the core content structure.
MIT
Why this server?
Can filter and analyze web content to extract insights.
Ratings: security -, license F, quality -
Bridge the gap between your web crawl and AI language models. With mcp-server-webcrawl, your AI client filters and analyzes web content under your direction or autonomously, extracting insights from your web content. Supports WARC, wget, InterroBot, Katana, and SiteOne crawlers.
Python
Why this server?
Can interact with PDF and EPUB documents. May help with text extraction.
Ratings: security -, license A, quality -
A Model Context Protocol (MCP) server that allows interaction with PDF and EPUB documents, designed to work with Windsurf IDE by Codeium.
MIT
Why this server?
Retrieves web page content and converts it to Markdown, which is useful for extracting text.
Why this server?
Integrates with FireCrawl for web scraping, which can help get data from online PDFs.
Ratings: security A, license A, quality -
A Model Context Protocol (MCP) server implementation that integrates with FireCrawl for advanced web scraping capabilities.
MIT