Fetcher MCP
MCP server for fetching web page content using the Playwright headless browser.
🌟 Recommended: OllaMan - Powerful Ollama AI Model Manager.
Advantages
- JavaScript Support: Unlike traditional web scrapers, Fetcher MCP uses Playwright to execute JavaScript, making it capable of handling dynamic web content and modern web applications.
- Intelligent Content Extraction: Built-in Readability algorithm automatically extracts the main content from web pages, removing ads, navigation, and other non-essential elements.
- Flexible Output Format: Supports both HTML and Markdown output formats, making it easy to integrate with various downstream applications.
- Parallel Processing: The `fetch_urls` tool enables concurrent fetching of multiple URLs, significantly improving efficiency for batch operations.
- Resource Optimization: Automatically blocks unnecessary resources (images, stylesheets, fonts, media) to reduce bandwidth usage and improve performance.
- Robust Error Handling: Comprehensive error handling and logging ensure reliable operation even when dealing with problematic web pages.
- Configurable Parameters: Fine-grained control over timeouts, content extraction, and output formatting to suit different use cases.
Quick Start
Run directly with npx:
First time setup - install the required browser by running the following command in your terminal:
HTTP and SSE Transport
Use the `--transport=http` parameter to start the Streamable HTTP and SSE endpoint services simultaneously:
After startup, the server provides the following endpoints:
- `/mcp` - Streamable HTTP endpoint (modern MCP protocol)
- `/sse` - SSE endpoint (legacy MCP protocol)
Clients can connect through whichever endpoint suits their needs.
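As a rough illustration of the two endpoints, the sketch below builds both URLs from a host and port. The helper name `mcpEndpoints` and the values `localhost` and `3000` are assumptions for the example; use whatever host and port your server was actually started with.

```typescript
// Hypothetical helper: host and port are illustrative, not values confirmed by this README.
interface McpEndpoints {
  streamableHttp: string; // modern MCP transport
  sse: string;            // legacy MCP transport
}

function mcpEndpoints(host: string, port: number): McpEndpoints {
  const base = `http://${host}:${port}`;
  return {
    streamableHttp: `${base}/mcp`,
    sse: `${base}/sse`,
  };
}

const endpoints = mcpEndpoints("localhost", 3000);
console.log(endpoints.streamableHttp); // http://localhost:3000/mcp
console.log(endpoints.sse);            // http://localhost:3000/sse
```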
Debug Mode
Run with the `--debug` option to show the browser window for debugging:
MCP Configuration
Configure this MCP server in Claude Desktop:
On macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
On Windows: %APPDATA%/Claude/claude_desktop_config.json
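A minimal sketch of what such a configuration entry might look like, written as a TypeScript object and serialized to JSON. The `mcpServers` key is the standard Claude Desktop layout; the server key `fetcher` and the npm package name `fetcher-mcp` are assumptions for illustration, so check them against the project's published instructions.

```typescript
// Shape of a Claude Desktop MCP server entry.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface ClaudeDesktopConfig {
  mcpServers: { [name: string]: McpServerEntry };
}

// Assumption: the server is launched through npx under the package name "fetcher-mcp".
const desktopConfig: ClaudeDesktopConfig = {
  mcpServers: {
    fetcher: {
      command: "npx",
      args: ["-y", "fetcher-mcp"],
    },
  },
};

// Print JSON that could be pasted into claude_desktop_config.json.
const desktopConfigJson = JSON.stringify(desktopConfig, null, 2);
console.log(desktopConfigJson);
```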
Docker Deployment
Running with Docker
Deploying with Docker Compose
Create a `docker-compose.yml` file:
Then run:
Features
- `fetch_url` - Retrieve web page content from a specified URL
  - Uses the Playwright headless browser to execute JavaScript
  - Supports intelligent extraction of main content and conversion to Markdown
  - Supports the following parameters:
    - `url`: The URL of the web page to fetch (required)
    - `timeout`: Page loading timeout in milliseconds, default is 30000 (30 seconds)
    - `waitUntil`: Specifies when navigation is considered complete, options: 'load', 'domcontentloaded', 'networkidle', 'commit', default is 'load'
    - `extractContent`: Whether to intelligently extract the main content, default is true
    - `maxLength`: Maximum length of returned content (in characters), default is no limit
    - `returnHtml`: Whether to return HTML content instead of Markdown, default is false
    - `waitForNavigation`: Whether to wait for additional navigation after the initial page load (useful for sites with anti-bot verification), default is false
    - `navigationTimeout`: Maximum time to wait for additional navigation in milliseconds, default is 10000 (10 seconds)
    - `disableMedia`: Whether to disable media resources (images, stylesheets, fonts, media), default is true
    - `debug`: Whether to enable debug mode (showing the browser window); overrides the `--debug` command line flag if specified
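The parameter list above can be summarized as a TypeScript type with its documented defaults. This is only a sketch: the names `FetchUrlParams`, `FETCH_URL_DEFAULTS`, and `withDefaults` are hypothetical, and the server's own implementation may resolve defaults differently.

```typescript
type WaitUntil = "load" | "domcontentloaded" | "networkidle" | "commit";

interface FetchUrlParams {
  url: string;                 // required
  timeout?: number;            // milliseconds
  waitUntil?: WaitUntil;
  extractContent?: boolean;
  maxLength?: number;          // omitted means no limit
  returnHtml?: boolean;
  waitForNavigation?: boolean;
  navigationTimeout?: number;  // milliseconds
  disableMedia?: boolean;
  debug?: boolean;             // omitted defers to the --debug flag
}

// Defaults as documented in the parameter list.
const FETCH_URL_DEFAULTS = {
  timeout: 30000,
  waitUntil: "load" as WaitUntil,
  extractContent: true,
  returnHtml: false,
  waitForNavigation: false,
  navigationTimeout: 10000,
  disableMedia: true,
};

// Hypothetical helper: caller-supplied values take precedence over the defaults.
function withDefaults(params: FetchUrlParams) {
  return { ...FETCH_URL_DEFAULTS, ...params };
}

const resolved = withDefaults({ url: "https://example.com", timeout: 60000 });
console.log(resolved.timeout);   // 60000
console.log(resolved.waitUntil); // load
```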
- `fetch_urls` - Batch retrieve web page content from multiple URLs in parallel
  - Uses multi-tab parallel fetching for improved performance
  - Returns combined results with clear separation between webpages
  - Supports the following parameters:
    - `urls`: Array of URLs to fetch (required)
    - Other parameters are the same as `fetch_url`
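For clients that talk to the server directly, a batch fetch is an ordinary MCP `tools/call` request. The sketch below builds such a JSON-RPC message; the helper `buildFetchUrlsCall` is hypothetical, and it assumes that options shared with `fetch_url` are passed alongside `urls` in the same arguments object.

```typescript
// A JSON-RPC 2.0 "tools/call" request, as used by MCP clients to invoke a tool.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: { [key: string]: unknown } };
}

// Hypothetical helper: builds a fetch_urls call with optional shared options.
function buildFetchUrlsCall(
  id: number,
  urls: string[],
  options: { [key: string]: unknown } = {}
): ToolCallRequest {
  if (urls.length === 0) {
    throw new Error("urls must contain at least one URL");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "fetch_urls", arguments: { ...options, urls } },
  };
}

const batchCall = buildFetchUrlsCall(
  1,
  ["https://example.com", "https://example.org"],
  { timeout: 60000 }
);
console.log(JSON.stringify(batchCall, null, 2));
```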
Tips
Handling Special Website Scenarios
Dealing with Anti-Crawler Mechanisms
- Wait for Complete Loading: For websites using CAPTCHA, redirects, or other verification mechanisms, ask in your prompt for the page to finish loading before it is fetched. This uses the `waitForNavigation: true` parameter.
- Increase Timeout Duration: For websites that load slowly, ask in your prompt for a longer loading timeout. This adjusts both the `timeout` and `navigationTimeout` parameters accordingly.
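The two tips above come down to three parameters. As a sketch, the hypothetical helper below derives them from two yes/no questions about the target site. The doubling of the documented defaults for slow sites is an illustrative choice, not a recommendation from the project.

```typescript
interface NavigationOptions {
  waitForNavigation: boolean;
  timeout: number;           // page loading timeout, in milliseconds
  navigationTimeout: number; // additional-navigation timeout, in milliseconds
}

// Hypothetical helper. Base values are the documented defaults (30000 and 10000);
// the factor of 2 for slow sites is an assumption made for this example.
function navigationOptions(hasVerification: boolean, isSlow: boolean): NavigationOptions {
  const factor = isSlow ? 2 : 1;
  return {
    waitForNavigation: hasVerification,
    timeout: 30000 * factor,
    navigationTimeout: 10000 * factor,
  };
}

const protectedSlowSite = navigationOptions(true, true);
console.log(protectedSlowSite.waitForNavigation); // true
console.log(protectedSlowSite.timeout);           // 60000
console.log(protectedSlowSite.navigationTimeout); // 20000
```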
Content Retrieval Adjustments
- Preserve Original HTML Structure: When content extraction might fail, ask in your prompt to keep the original HTML. This sets `extractContent: false` and `returnHtml: true`.
- Fetch Complete Page Content: When the extracted content is too limited, ask in your prompt for the complete page content. This sets `extractContent: false`.
- Return Content as HTML: When HTML is needed instead of the default Markdown, ask in your prompt for HTML output. This sets `returnHtml: true`.
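The three adjustments above, plus the default behaviour, cover every combination of `extractContent` and `returnHtml`. The sketch below spells out that mapping; the mode names and the `contentFlags` helper are hypothetical labels introduced for this example.

```typescript
// Hypothetical labels for the four flag combinations.
type ContentMode =
  | "extracted-markdown" // default behaviour
  | "full-markdown"      // Fetch Complete Page Content
  | "extracted-html"     // Return Content as HTML
  | "raw-html";          // Preserve Original HTML Structure

interface ContentFlags {
  extractContent: boolean;
  returnHtml: boolean;
}

function contentFlags(mode: ContentMode): ContentFlags {
  switch (mode) {
    case "extracted-markdown":
      return { extractContent: true, returnHtml: false };
    case "full-markdown":
      return { extractContent: false, returnHtml: false };
    case "extracted-html":
      return { extractContent: true, returnHtml: true };
    case "raw-html":
      return { extractContent: false, returnHtml: true };
  }
}

console.log(contentFlags("raw-html")); // extractContent: false, returnHtml: true
```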
Debugging and Authentication
Enabling Debug Mode
- Dynamic Debug Activation: To display the browser window during a specific fetch operation, ask in your prompt for debug mode to be enabled for that fetch. This sets `debug: true` even if the server was started without the `--debug` flag.
Using Custom Cookies for Authentication
- Manual Login: To log in with your own credentials, ask in your prompt to run the fetch in debug mode. This sets `debug: true` or uses the `--debug` flag, keeping the browser window open for manual login.
- Interacting with Debug Browser: When debug mode is enabled:
  - The browser window remains open
  - You can manually log into the website using your credentials
  - After login is complete, content will be fetched with your authenticated session
- Enable Debug for Specific Requests: Even if the server is already running, you can enable debug mode for a single request. This sets `debug: true` for that request only, opening the browser window for manual login.
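The interaction between the per-request `debug` parameter and the `--debug` flag can be pictured as a simple precedence rule: a value supplied with the request wins, otherwise the server-wide flag applies. The function below is a sketch of that rule as documented, not the server's actual code, and `effectiveDebug` is a hypothetical name.

```typescript
// Hypothetical helper: resolves whether the browser window should be shown.
// requestDebug is the per-request `debug` parameter (undefined when omitted);
// cliDebug reflects whether the server was started with --debug.
function effectiveDebug(requestDebug: boolean | undefined, cliDebug: boolean): boolean {
  return requestDebug !== undefined ? requestDebug : cliDebug;
}

console.log(effectiveDebug(true, false));      // true: the request enables debug on its own
console.log(effectiveDebug(undefined, true));  // true: falls back to the --debug flag
console.log(effectiveDebug(false, true));      // false: the request overrides the flag
```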
Development
Install Dependencies
Install Playwright Browser
Install the browsers needed for Playwright:
Build the Server
Debugging
Use MCP Inspector for debugging:
You can also enable visible browser mode for debugging:
Related Projects
- g-search-mcp: A powerful MCP server for Google search that enables parallel searching with multiple keywords simultaneously. Perfect for batch search operations and data collection.
License
Licensed under the MIT License
local-only server
The server can only run on the client's local machine because it depends on local resources.
Tools
An MCP server that uses the Playwright headless browser to fetch web page content, extract the main content, and convert it to Markdown.