Skip to main content
Glama

Fetcher MCP

MCP server for fetch web page content using Playwright headless browser.

Advantages

  • JavaScript Support: Unlike traditional web scrapers, Fetcher MCP uses Playwright to execute JavaScript, making it capable of handling dynamic web content and modern web applications.

  • Intelligent Content Extraction: Built-in Readability algorithm automatically extracts the main content from web pages, removing ads, navigation, and other non-essential elements.

  • Flexible Output Format: Supports both HTML and Markdown output formats, making it easy to integrate with various downstream applications.

  • Parallel Processing: The fetch_urls tool enables concurrent fetching of multiple URLs, significantly improving efficiency for batch operations.

  • Resource Optimization: Automatically blocks unnecessary resources (images, stylesheets, fonts, media) to reduce bandwidth usage and improve performance.

  • Robust Error Handling: Comprehensive error handling and logging ensure reliable operation even when dealing with problematic web pages.

  • Configurable Parameters: Fine-grained control over timeouts, content extraction, and output formatting to suit different use cases.

Quick Start

Run directly with npx:

npx -y fetcher-mcp

First time setup - install the required browser by running the following command in your terminal:

npx playwright install chromium

HTTP and SSE Transport

Use the --transport=http parameter to start both Streamable HTTP endpoint and SSE endpoint services simultaneously:

npx -y fetcher-mcp --log --transport=http --host=0.0.0.0 --port=3000

After startup, the server provides the following endpoints:

  • /mcp - Streamable HTTP endpoint (modern MCP protocol)

  • /sse - SSE endpoint (legacy MCP protocol)

Clients can choose which method to connect based on their needs.

Debug Mode

Run with the --debug option to show the browser window for debugging:

npx -y fetcher-mcp --debug

Configuration MCP

Configure this MCP server in Claude Desktop:

On MacOS: ~/Library/Application Support/Claude/claude_desktop_config.json

On Windows: %APPDATA%/Claude/claude_desktop_config.json

{ "mcpServers": { "fetcher": { "command": "npx", "args": ["-y", "fetcher-mcp"] } } }

Docker Deployment

Running with Docker

docker run -p 3000:3000 ghcr.io/jae-jae/fetcher-mcp:latest

Deploying with Docker Compose

Create a docker-compose.yml file:

version: "3.8" services: fetcher-mcp: image: ghcr.io/jae-jae/fetcher-mcp:latest container_name: fetcher-mcp restart: unless-stopped ports: - "3000:3000" environment: - NODE_ENV=production # Using host network mode on Linux hosts can improve browser access efficiency # network_mode: "host" volumes: # For Playwright, may need to share certain system paths - /tmp:/tmp # Health check healthcheck: test: ["CMD", "wget", "--spider", "-q", "http://localhost:3000"] interval: 30s timeout: 10s retries: 3

Then run:

docker-compose up -d

Features

  • fetch_url - Retrieve web page content from a specified URL

    • Uses Playwright headless browser to parse JavaScript

    • Supports intelligent extraction of main content and conversion to Markdown

    • Supports the following parameters:

      • url: The URL of the web page to fetch (required parameter)

      • timeout: Page loading timeout in milliseconds, default is 30000 (30 seconds)

      • waitUntil: Specifies when navigation is considered complete, options: 'load', 'domcontentloaded', 'networkidle', 'commit', default is 'load'

      • extractContent: Whether to intelligently extract the main content, default is true

      • maxLength: Maximum length of returned content (in characters), default is no limit

      • returnHtml: Whether to return HTML content instead of Markdown, default is false

      • waitForNavigation: Whether to wait for additional navigation after initial page load (useful for sites with anti-bot verification), default is false

      • navigationTimeout: Maximum time to wait for additional navigation in milliseconds, default is 10000 (10 seconds)

      • disableMedia: Whether to disable media resources (images, stylesheets, fonts, media), default is true

      • debug: Whether to enable debug mode (showing browser window), overrides the --debug command line flag if specified

  • fetch_urls - Batch retrieve web page content from multiple URLs in parallel

    • Uses multi-tab parallel fetching for improved performance

    • Returns combined results with clear separation between webpages

    • Supports the following parameters:

      • urls: Array of URLs to fetch (required parameter)

      • Other parameters are the same as fetch_url

Tips

Handling Special Website Scenarios

Dealing with Anti-Crawler Mechanisms

  • Wait for Complete Loading: For websites using CAPTCHA, redirects, or other verification mechanisms, include in your prompt:

    Please wait for the page to fully load

    This will use the waitForNavigation: true parameter.

  • Increase Timeout Duration: For websites that load slowly:

    Please set the page loading timeout to 60 seconds

    This adjusts both timeout and navigationTimeout parameters accordingly.

Content Retrieval Adjustments

  • Preserve Original HTML Structure: When content extraction might fail:

    Please preserve the original HTML content

    Sets extractContent: false and returnHtml: true.

  • Fetch Complete Page Content: When extracted content is too limited:

    Please fetch the complete webpage content instead of just the main content

    Sets extractContent: false.

  • Return Content as HTML: When HTML format is needed instead of default Markdown:

    Please return the content in HTML format

    Sets returnHtml: true.

Debugging and Authentication

Enabling Debug Mode

  • Dynamic Debug Activation: To display the browser window during a specific fetch operation:

    Please enable debug mode for this fetch operation

    This sets debug: true even if the server was started without the --debug flag.

Using Custom Cookies for Authentication

  • Manual Login: To login using your own credentials:

    Please run in debug mode so I can manually log in to the website

    Sets debug: true or uses the --debug flag, keeping the browser window open for manual login.

  • Interacting with Debug Browser: When debug mode is enabled:

    1. The browser window remains open

    2. You can manually log into the website using your credentials

    3. After login is complete, content will be fetched with your authenticated session

  • Enable Debug for Specific Requests: Even if the server is already running, you can enable debug mode for a specific request:

    Please enable debug mode for this authentication step

    Sets debug: true for this specific request only, opening the browser window for manual login.

Development

Install Dependencies

npm install

Install Playwright Browser

Install the browsers needed for Playwright:

npm run install-browser

Build the Server

npm run build

Debugging

Use MCP Inspector for debugging:

npm run inspector

You can also enable visible browser mode for debugging:

node build/index.js --debug

Related Projects

  • g-search-mcp: A powerful MCP server for Google search that enables parallel searching with multiple keywords simultaneously. Perfect for batch search operations and data collection.

License

Licensed under the MIT License

Deploy Server
A
security – no known vulnerabilities
A
license - permissive license
A
quality - confirmed to work

Related MCP Servers

  • A
    security
    -
    license
    A
    quality
    A powerful MCP server for fetching and transforming web content into various formats (HTML, JSON, Markdown, Plain Text) with ease.
    Last updated -
    4
    1,017
    37
    MIT License
    • Apple
    • Linux
  • A
    security
    A
    license
    A
    quality
    An MCP server for fetching and transforming web content into various formats.
    Last updated -
    4
    7
    MIT License
    • Apple
  • A
    security
    A
    license
    A
    quality
    A MCP server that provides browser automation tools, allowing users to navigate websites, take screenshots, click elements, fill forms, and execute JavaScript through Playwright.
    Last updated -
    8
    1
    Apache 2.0
    • Apple
  • A
    security
    F
    license
    A
    quality
    An MCP server that extracts meaningful content from websites and converts HTML to high-quality Markdown, using Mozilla's Readability engine.
    Last updated -
    1
    10,298
    7

View all related MCP servers

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/jae-jae/fetcher-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server