114,883 tools. Last updated 2026-04-22 05:04
"A tool for document extraction using MinerU" matching MCP tools:
- Convert PDF, Office documents, images, and web pages to Markdown format using MinerU cloud API for content extraction and analysis. License: Apache 2.0
- Extract text from images for document processing, receipt scanning, and image text extraction using OCR technology. Supports both URLs and base64 encoded images.
- Retrieves custom property names for a document by identifying its class and filtering system properties, enabling targeted metadata extraction and search operations. License: Apache 2.0
- Retrieve custom property names for a document by identifying its class and filtering out system properties. Use to discover available properties for targeted extraction or search operations. License: Apache 2.0
- Analyzes documents to automatically create JSON schemas for structured data extraction, enabling consistent field definitions across similar documents.
- Retrieve and display the full content of a specific document using the document ID. Use this tool to review, quote, reference, or analyze detailed document information within the MCP Outline Server. License: MIT
Matching MCP Servers
- Enables document parsing and extraction from PDFs and other formats using the MinerU API. Supports batch processing, page range selection, OCR in 109 languages, and VLM/pipeline models for high-accuracy content extraction. Ratings: security A, license F, quality A.
- Enables parsing and extraction of content from various document formats (PDF, Word, Excel, PowerPoint) into Markdown format using the Niutrans document API. Ratings: security A, license A, quality A. License: MIT
Matching MCP Connectors
- Manage your Canvas coursework with quick access to courses, assignments, and grades. Track upcomin…
- The verified hub for conferences and journals. Powered by AI to match your scholarly ambitions with the world's most prestigious academic opportunities.
- Search document content, titles, and metadata in your Paperless-NGX instance using a full-text query for precise and comprehensive results. Language: TypeScript. License: ISC
- Add documents to a collection by providing a URL for download, processing them for text extraction, and indexing them for semantic search. License: MIT
- Convert web pages to structured Markdown while preserving tables, lists, and document hierarchy for clean content extraction. License: MIT
- Lists supported languages for OCR text extraction in document conversion. Use this tool to identify available language options before processing documents with MinerU MCP Server.
- Process documents from file paths to extract and structure information using AI pipelines. This tool handles file processing for automated data extraction and organization. License: MIT
- Analyze PDF files to classify document type and audit page quality before extraction. Provides page count, type detection, quality breakdown, and extraction difficulty assessment for initial file evaluation. License: MIT
- Extract PDF metadata including page count, file size, document type, and table presence to determine appropriate processing tools. License: MIT
- Retrieve supported OCR language codes to configure text extraction from scanned or multilingual documents before processing with parse_documents. License: Apache 2.0
- Retrieve document properties from IBM FileNet Content Manager using document ID or file path to access metadata and content details. License: Apache 2.0
- Export document content as plain markdown text for external use, sharing, or processing in other applications without additional formatting. License: MIT
- Relocate a document to a new collection or parent while preserving its nested hierarchy. Use to reorganize content structure, change collections, or update document nesting within the MCP Outline Server. License: MIT
- Retrieve court documents from RECAP Archive using CourtListener IDs to access filing metadata, text extraction, and PDF download URLs for legal research and analysis. License: MIT
- Remove a medical document from OncoFiles by moving it to trash for 30 days, where it remains recoverable but hidden from searches and listings. License: MIT
- Initialize a .brand/ directory with empty configuration scaffold to set up brand identity structure without running extraction. License: MIT
- Retrieve document properties from IBM Content Manager using document ID or file path to access metadata and details stored in the repository. License: Apache 2.0
- Monitor the progress of datasheet extraction for electronic components. Track status updates including ready, extracting, pending, or failed states with current step and elapsed time. License: MIT
- Modify or replace the title and content of an existing document. Use this tool to edit, append, or correct information in documents. Requires the full document content for updates with changes. License: MIT
- Identify all documents linking to a specific document within the workspace. Use to uncover references, dependencies, and relationships between documents via the MCP Outline Server. License: MIT
- Retrieve and examine the content of a single markdown document from the documentation directory, including frontmatter metadata. License: MIT
- Retrieves structured metadata for Dutch parliamentary documents, including title, type, document number, dates, version, and links to PDF and official web pages, using a unique identifier. License: MIT
- Create a new Grist document in a workspace, optionally forking an existing document to copy its structure and data. Returns the new document ID and URL.
- Retrieve complete Amazon Business API documentation content using document references from search results. Access full API references, implementation guides, and detailed endpoint information for integration development. License: Apache 2.0
- Generate embedded signing invites for documents or document groups to collect signatures directly within applications. License: MIT
- Generate embedded sending links to manage, edit, or send invites for documents and document groups in SignNow. License: MIT
- Generate an embedded editor to modify documents or document groups within SignNow's e-signature platform. Specify the document ID and optional parameters for customization. License: MIT
- Retrieve comprehensive details for a research paper using its arXiv ID, including title, authors, summary, novelty score, and structured extraction data for analysis. License: MIT
- Perform comprehensive web searches to extract and consolidate full content from top results using advanced content extraction for thorough research. License: MIT
- Remove a document from a Pega case by unlinking it permanently using the document ID and case ID. License: Apache 2.0
- Convert web page content to structured Markdown, preserving tables and definition lists. Ideal for extracting clean, readable text from HTML while maintaining document integrity. License: MIT
- Generate a structured feature tree representation of a FreeCAD document showing object types, properties, dependencies, and validity states for model analysis and debugging.
- Create a new document collection to organize related documents and enable semantic search across their contents. Returns a collection ID for managing documents. License: MIT
- Save structured data extraction results to a JSONL file for storage, visualization, or further processing. License: MIT
- Write or replace full text content in a shared document by specifying document ID and share ID. Use for collaborative document editing with real-time sync. License: MIT
- Convert Microsoft Office documents (Word, Excel, PowerPoint) to Markdown format with image extraction and optimization for Claude to read and process. License: MIT
- Send signing invitations for documents or document groups to recipients with defined roles and actions. License: MIT
- Retrieve a paginated list of document groups with basic details from SignNow for e-signature workflows. License: MIT
- Retrieve details about a generated document including status, download URL, file size, page count, and creation date using its document ID. License: MIT
- Convert markdown content into a Google Document using Google's native markdown parser. Returns document ID and web link for the newly created document.
- Extract specific data from JSON files using paths, filters, patterns, or slices. Retrieve particular values, filter arrays/objects by conditions, search for patterns, or slice data for targeted extraction and analysis.
- Extract content from Google Docs in text, JSON, or Markdown formats using document ID for integration and analysis.
- Generate a document from a template and create an embedded signing invite for recipients to view, sign, or approve the document. License: MIT
- Remove a specific version of a document from the content repository using its ID or path to manage document history and storage. License: Apache 2.0
- Extract and scrape webpage content using auto, simple, Scrapy, or Selenium methods. Define extraction rules or wait for specific elements to retrieve targeted data. License: MIT
- Search for documents in IBM FileNet Content Manager using keywords or document names to retrieve matching files for document management tasks. License: Apache 2.0
- Convert PDF files to AI-readable Markdown format with automatic detection of PDF type and best extraction method, providing confidence scores and warnings for limited extraction. License: MIT