Fetcher MCP
remote-capable server
The server can be hosted and run remotely because it primarily relies on remote services or has no dependency on the local environment.
Integrations
- Executes JavaScript on web pages for proper rendering of dynamic content, allowing retrieval of content from modern web applications that rely on client-side JavaScript.
- Supports configuration and installation on macOS systems with specific paths for the Claude Desktop configuration file.
- Converts fetched web content to Markdown format, providing a clean, structured text representation of web pages for easier consumption and processing.
Fetcher MCP
MCP server for fetching web page content using the Playwright headless browser.
Advantages
- JavaScript Support: Unlike traditional web scrapers, Fetcher MCP uses Playwright to execute JavaScript, making it capable of handling dynamic web content and modern web applications.
- Intelligent Content Extraction: Built-in Readability algorithm automatically extracts the main content from web pages, removing ads, navigation, and other non-essential elements.
- Flexible Output Format: Supports both HTML and Markdown output formats, making it easy to integrate with various downstream applications.
- Parallel Processing: The `fetch_urls` tool enables concurrent fetching of multiple URLs, significantly improving efficiency for batch operations.
- Resource Optimization: Automatically blocks unnecessary resources (images, stylesheets, fonts, media) to reduce bandwidth usage and improve performance.
- Robust Error Handling: Comprehensive error handling and logging ensure reliable operation even when dealing with problematic web pages.
- Configurable Parameters: Fine-grained control over timeouts, content extraction, and output formatting to suit different use cases.
Quick Start
Run directly with npx:
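A minimal sketch, assuming the package is published on npm as `fetcher-mcp`:

```bash
npx -y fetcher-mcp
```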
Debug Mode
Run with the `--debug` option to show the browser window for debugging:
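Again assuming the `fetcher-mcp` package name:

```bash
npx -y fetcher-mcp --debug
```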
MCP Configuration
Configure this MCP server in Claude Desktop:
On macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
On Windows: %APPDATA%/Claude/claude_desktop_config.json
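A minimal configuration sketch, assuming the server is launched via npx and the package is named `fetcher-mcp` (the `"fetcher"` key is an arbitrary label):

```json
{
  "mcpServers": {
    "fetcher": {
      "command": "npx",
      "args": ["-y", "fetcher-mcp"]
    }
  }
}
```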
Features
fetch_url
- Retrieve web page content from a specified URL
- Uses Playwright headless browser to parse JavaScript
- Supports intelligent extraction of main content and conversion to Markdown
- Supports the following parameters:
  - `url`: The URL of the web page to fetch (required parameter)
  - `timeout`: Page loading timeout in milliseconds, default is 30000 (30 seconds)
  - `waitUntil`: Specifies when navigation is considered complete, options: 'load', 'domcontentloaded', 'networkidle', 'commit', default is 'load'
  - `extractContent`: Whether to intelligently extract the main content, default is true
  - `maxLength`: Maximum length of returned content (in characters), default is no limit
  - `returnHtml`: Whether to return HTML content instead of Markdown, default is false
  - `waitForNavigation`: Whether to wait for additional navigation after the initial page load (useful for sites with anti-bot verification), default is false
  - `navigationTimeout`: Maximum time to wait for additional navigation in milliseconds, default is 10000 (10 seconds)
  - `disableMedia`: Whether to disable media resources (images, stylesheets, fonts, media), default is true
  - `debug`: Whether to enable debug mode (showing the browser window), overrides the `--debug` command line flag if specified
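As an illustration, a hypothetical `fetch_url` tool call with a few of these parameters set (the URL and values are placeholders):

```json
{
  "name": "fetch_url",
  "arguments": {
    "url": "https://example.com/article",
    "timeout": 60000,
    "waitUntil": "networkidle",
    "extractContent": true,
    "maxLength": 20000,
    "returnHtml": false
  }
}
```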
fetch_urls
- Batch retrieve web page content from multiple URLs in parallel
- Uses multi-tab parallel fetching for improved performance
- Returns combined results with clear separation between webpages
- Supports the following parameters:
  - `urls`: Array of URLs to fetch (required parameter)
  - Other parameters are the same as `fetch_url`
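A comparable sketch for `fetch_urls`, with placeholder URLs:

```json
{
  "name": "fetch_urls",
  "arguments": {
    "urls": [
      "https://example.com/page-1",
      "https://example.com/page-2"
    ],
    "extractContent": true,
    "returnHtml": false
  }
}
```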
Tips
Handling Special Website Scenarios
Dealing with Anti-Crawler Mechanisms
- Wait for Complete Loading: For websites using CAPTCHA, redirects, or other verification mechanisms, ask in your prompt to wait for the page to fully load. This will use the `waitForNavigation: true` parameter, as in the sketch after this list.
- Increase Timeout Duration: For websites that load slowly, ask in your prompt for a longer wait. This adjusts both the `timeout` and `navigationTimeout` parameters accordingly.
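A sketch of the resulting `fetch_url` arguments for a slow or verification-heavy site (the URL and timeout values are illustrative):

```json
{
  "url": "https://example.com/protected-page",
  "waitForNavigation": true,
  "timeout": 60000,
  "navigationTimeout": 30000
}
```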
Content Retrieval Adjustments
- Preserve Original HTML Structure: When content extraction might fail, ask for the original page structure. This sets `extractContent: false` and `returnHtml: true`.
- Fetch Complete Page Content: When the extracted content is too limited, ask for the complete page. This sets `extractContent: false`.
- Return Content as HTML: When HTML format is needed instead of the default Markdown, ask for HTML output. This sets `returnHtml: true` (see the sketch after this list).
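For instance, to fetch the raw, unextracted HTML of a page, the arguments would look roughly like this (the URL is a placeholder):

```json
{
  "url": "https://example.com/complex-layout",
  "extractContent": false,
  "returnHtml": true
}
```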
Debugging and Authentication
Enabling Debug Mode
- Dynamic Debug Activation: To display the browser window during a specific fetch operation, ask for debug mode in your prompt. This sets `debug: true` even if the server was started without the `--debug` flag.
Using Custom Cookies for Authentication
- Manual Login: To log in using your own credentials, ask for debug mode in your prompt. This sets `debug: true` or uses the `--debug` flag, keeping the browser window open for manual login.
- Interacting with Debug Browser: When debug mode is enabled:
- The browser window remains open
- You can manually log into the website using your credentials
- After login is complete, content will be fetched with your authenticated session
- Enable Debug for Specific Requests: Even if the server is already running without `--debug`, you can enable debug mode for a specific request. This sets `debug: true` for that request only, opening the browser window for manual login (see the sketch after this list).
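A sketch of a per-request debug call (the URL is a placeholder):

```json
{
  "url": "https://example.com/login-required",
  "debug": true
}
```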
Development
Install Dependencies
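From a checkout of the repository, the standard npm workflow applies:

```bash
npm install
```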
Install Playwright Browser
Install the browsers needed for Playwright:
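Typically:

```bash
npx playwright install
```

You can also install a single browser (e.g. `npx playwright install chromium`) if that is all the server needs.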
Build the Server
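Assuming a standard `build` script in package.json:

```bash
npm run build
```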
Debugging
Use MCP Inspector for debugging:
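One way to do this, assuming the compiled entry point is `build/index.js` (adjust the path to match the actual build output):

```bash
npx @modelcontextprotocol/inspector node build/index.js
```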
You can also enable visible browser mode for debugging:
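For instance, by running the built server directly with the `--debug` flag (entry point path assumed as above):

```bash
node build/index.js --debug
```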
License
Licensed under the MIT License