mcp-server-fetch-python is a web content extraction and transformation server. It provides tools for fetching, rendering, and converting web content into various formats:
- Extract raw text: Directly fetch raw text from URLs, ideal for structured data formats like JSON, XML, CSV, TSV, or plain text
- Fetch rendered HTML: Retrieve fully rendered HTML content including JavaScript-generated elements, essential for modern web applications and SPAs
- Convert to Markdown: Transform web page content into clean, well-formatted Markdown while preserving structural elements
- Extract content from media: Use AI-powered tools to analyze images and videos, converting visual content into Markdown format using computer vision and OCR (requires an OpenAI API key)
Enables fetching content from JavaScript-rendered pages through a headless browser, making it possible to extract content from modern web applications and SPAs
Converts web page content to well-formatted Markdown while preserving structural elements like tables and definition lists
Leverages OpenAI's vision capabilities for AI-powered content extraction from media files (images and videos) when provided with an API key
Supports extraction of raw text content from XML files through the get-raw-text tool for direct access to structured data
mcp-server-fetch-python
An MCP server for fetching and transforming web content into various formats. This server provides comprehensive tools for extracting content from web pages, including support for JavaScript-rendered content and media files.
Features
Tools
The server provides four specialized tools:
- get-raw-text: Extracts raw text content directly from URLs without browser rendering
  - Arguments:
    - url: URL of the target web page (text, JSON, XML, CSV, TSV, etc.) (required)
  - Best used for structured data formats or when fast, direct access is needed
- get-rendered-html: Fetches fully rendered HTML content using a headless browser
  - Arguments:
    - url: URL of the target web page (required)
  - Essential for modern web applications and SPAs that require JavaScript rendering
- get-markdown: Converts web page content to well-formatted Markdown
  - Arguments:
    - url: URL of the target web page (required)
  - Preserves structural elements while providing clean, readable text output
- get-markdown-from-media: Performs AI-powered content extraction from media files
  - Arguments:
    - url: URL of the target media file (images, videos) (required)
  - Utilizes computer vision and OCR for visual content analysis
  - Requires a valid OPENAI_API_KEY to be set in environment variables
  - Returns an error message if the API key is not set or if the media file cannot be processed
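As a rough illustration of how a client addresses these tools, the sketch below builds the JSON-RPC `tools/call` request that an MCP client sends to a server. The tool names and the single `url` argument come from the list above. The helper name `build_tool_call`, the request id, and the example URL are hypothetical, and a real client library would normally construct this message for you.

```python
import json

# Tool names as listed above; each takes one required "url" argument.
TOOLS = (
    "get-raw-text",
    "get-rendered-html",
    "get-markdown",
    "get-markdown-from-media",
)


def build_tool_call(tool: str, url: str, request_id: int = 1) -> dict:
    """Build an MCP `tools/call` JSON-RPC request (hypothetical helper)."""
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool!r}")
    if not url:
        raise ValueError("url is required")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": {"url": url}},
    }


# Placeholder URL for illustration only; not taken from the project.
request = build_tool_call("get-markdown", "https://example.com/")
print(json.dumps(request, indent=2))
```

Validating the tool name before building the request keeps typos from reaching the server, which would otherwise answer with a protocol-level error.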
Usage
Claude Desktop
To use with Claude Desktop, add the server configuration:
On macOS: ~/Library/Application\ Support/Claude/claude_desktop_config.json
On Windows: %APPDATA%/Claude/claude_desktop_config.json
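Claude Desktop reads MCP servers from a top-level `mcpServers` object in this JSON file. The sketch below merges one entry into that object without disturbing existing settings. The helper names and, in particular, the `uvx mcp-server-fetch-python` launch command shown in the trailing comment are assumptions for illustration; use whatever command the project itself documents.

```python
import json
import os
import sys
from pathlib import Path


def default_config_path() -> Path:
    """Return the Claude Desktop config path for the two platforms listed above."""
    if sys.platform == "darwin":
        return (Path.home() / "Library" / "Application Support"
                / "Claude" / "claude_desktop_config.json")
    if sys.platform == "win32":
        return Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"
    raise RuntimeError("only the macOS and Windows paths are documented here")


def add_server(config_path: Path, name: str, command: str, args: list[str]) -> dict:
    """Merge one entry into `mcpServers` and write the file back."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    # Keep every existing key; only add or overwrite the named server entry.
    config.setdefault("mcpServers", {})[name] = {"command": command, "args": args}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


# Example call (ASSUMPTION: entry name and launch command are illustrative):
# add_server(default_config_path(), "mcp-server-fetch-python",
#            "uvx", ["mcp-server-fetch-python"])
```

Restart Claude Desktop after editing the file so the new entry is picked up.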
Environment Variables
The following environment variables can be configured:
- OPENAI_API_KEY: Required for using the get-markdown-from-media tool. This key is needed for AI-powered image analysis and content extraction.
- PYTHONIOENCODING: Set to "utf-8" if you encounter character encoding issues in the output.
- MODEL_NAME: Specifies the model name to use. Defaults to "gpt-4o".
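To make the defaults concrete, here is a minimal sketch of how a wrapper script might read these variables. It is not the server's actual code: the `Settings` class and `load_settings` function are hypothetical names, and only the variable names and the "gpt-4o" default come from the list above. PYTHONIOENCODING is left out because the Python interpreter itself consumes it at startup, so it must be set in the environment that launches the server.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the documented environment variables."""
    openai_api_key: Optional[str]
    model_name: str

    @property
    def media_tool_enabled(self) -> bool:
        # get-markdown-from-media needs an API key; the other tools do not.
        return bool(self.openai_api_key)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the documented default."""
    return Settings(
        # Treat an empty string the same as an unset key.
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        model_name=env.get("MODEL_NAME", "gpt-4o"),
    )


settings = load_settings()
if not settings.media_tool_enabled:
    print("OPENAI_API_KEY is not set; get-markdown-from-media will return an error.")
```

Passing the environment as a mapping keeps the function easy to exercise with a plain dictionary instead of the real process environment.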
Local Installation
Alternatively, you can install and run the server locally:
Then add the following configuration to the Claude Desktop config file:
Development
Debugging
You can start the MCP Inspector using npx with the following commands: