
Fetch MCP

by jae-jae


Fetcher MCP

An MCP server for fetching web page content using the Playwright headless browser.


Advantages

  • JavaScript Support: Unlike traditional web scrapers, Fetcher MCP uses Playwright to execute JavaScript, making it capable of handling dynamic web content and modern web applications.
  • Intelligent Content Extraction: Built-in Readability algorithm automatically extracts the main content from web pages, removing ads, navigation, and other non-essential elements.
  • Flexible Output Format: Supports both HTML and Markdown output formats, making it easy to integrate with various downstream applications.
  • Parallel Processing: The fetch_urls tool enables concurrent fetching of multiple URLs, significantly improving efficiency for batch operations.
  • Resource Optimization: Automatically blocks unnecessary resources (images, stylesheets, fonts, media) to reduce bandwidth usage and improve performance.
  • Robust Error Handling: Comprehensive error handling and logging ensure reliable operation even when dealing with problematic web pages.
  • Configurable Parameters: Fine-grained control over timeouts, content extraction, and output formatting to suit different use cases.

Quick Start

Run directly with npx:

npx -y fetcher-mcp

First-time setup: install the required browser by running the following command in your terminal:

npx playwright install chromium

HTTP and SSE Transport

Use the --transport=http parameter to start the Streamable HTTP endpoint and the SSE endpoint together:

npx -y fetcher-mcp --log --transport=http --host=0.0.0.0 --port=3000

After startup, the server provides the following endpoints:

  • /mcp - Streamable HTTP endpoint (modern MCP protocol)
  • /sse - SSE endpoint (legacy MCP protocol)

Clients can connect through either endpoint, depending on which transport they support.
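
As a rough sketch of what travels over the Streamable HTTP endpoint, the snippet below builds the JSON-RPC 2.0 `tools/call` message an MCP client would send to invoke fetch_url. The helper names buildToolCall and endpointUrl, the request id, and the base URL http://localhost:3000 are illustrative assumptions, not part of fetcher-mcp. A real client also performs the MCP initialize handshake first, which an MCP client SDK normally handles.

```typescript
// Sketch: the JSON-RPC 2.0 message shape used by MCP for calling a tool.
// buildToolCall and endpointUrl are hypothetical helpers for illustration.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Endpoint paths come from the list above; the base URL is an assumption.
function endpointUrl(base: string, transport: "http" | "sse"): string {
  const path = transport === "http" ? "/mcp" : "/sse";
  // Strip trailing slashes from the base so the path is not doubled.
  return base.replace(/\/+$/, "") + path;
}

const request = buildToolCall(1, "fetch_url", { url: "https://example.com" });
console.log(endpointUrl("http://localhost:3000", "http"));
console.log(JSON.stringify(request));
```

The message body is the same for both transports; only the endpoint and the way responses are streamed differ.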

Debug Mode

Run with the --debug option to show the browser window for debugging:

npx -y fetcher-mcp --debug

MCP Configuration

Configure this MCP server in Claude Desktop:

On macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

On Windows: %APPDATA%/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "fetcher": {
      "command": "npx",
      "args": ["-y", "fetcher-mcp"]
    }
  }
}
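
If the config file already lists other servers, the fetcher entry needs to be merged in rather than pasted over them. The sketch below shows one way to do that on an already-parsed config object. The function name addFetcher and the type names are hypothetical, and reading or writing the file itself is left out.

```typescript
// Sketch: add the fetcher entry to a parsed Claude Desktop config
// without dropping servers that are already configured.
// addFetcher, McpServerEntry, and DesktopConfig are illustrative names.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface DesktopConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

function addFetcher(config: DesktopConfig): DesktopConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      // Same command and args as the configuration shown above.
      fetcher: { command: "npx", args: ["-y", "fetcher-mcp"] },
    },
  };
}

const existing: DesktopConfig = {
  mcpServers: { other: { command: "node", args: ["other-server.js"] } },
};
console.log(JSON.stringify(addFetcher(existing), null, 2));
```

The function returns a new object, so the original config is left unchanged if the result is discarded.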

Docker Deployment

Running with Docker

docker run -p 3000:3000 ghcr.io/jae-jae/fetcher-mcp:latest

Deploying with Docker Compose

Create a docker-compose.yml file:

version: "3.8"

services:
  fetcher-mcp:
    image: ghcr.io/jae-jae/fetcher-mcp:latest
    container_name: fetcher-mcp
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
    # Using host network mode on Linux hosts can improve browser access efficiency
    # network_mode: "host"
    volumes:
      # For Playwright, may need to share certain system paths
      - /tmp:/tmp
    # Health check
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:3000"]
      interval: 30s
      timeout: 10s
      retries: 3

Then run:

docker-compose up -d

Features

  • fetch_url - Retrieve web page content from a specified URL
    • Uses the Playwright headless browser to execute page JavaScript before reading the content
    • Supports intelligent extraction of main content and conversion to Markdown
    • Supports the following parameters:
      • url: The URL of the web page to fetch (required parameter)
      • timeout: Page loading timeout in milliseconds, default is 30000 (30 seconds)
      • waitUntil: Specifies when navigation is considered complete, options: 'load', 'domcontentloaded', 'networkidle', 'commit', default is 'load'
      • extractContent: Whether to intelligently extract the main content, default is true
      • maxLength: Maximum length of returned content (in characters), default is no limit
      • returnHtml: Whether to return HTML content instead of Markdown, default is false
      • waitForNavigation: Whether to wait for additional navigation after initial page load (useful for sites with anti-bot verification), default is false
      • navigationTimeout: Maximum time to wait for additional navigation in milliseconds, default is 10000 (10 seconds)
      • disableMedia: Whether to disable media resources (images, stylesheets, fonts, media), default is true
      • debug: Whether to enable debug mode (showing browser window), overrides the --debug command line flag if specified
  • fetch_urls - Batch retrieve web page content from multiple URLs in parallel
    • Uses multi-tab parallel fetching for improved performance
    • Returns combined results with clear separation between webpages
    • Supports the following parameters:
      • urls: Array of URLs to fetch (required parameter)
      • Other parameters are the same as fetch_url
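
The parameter list above can be summarized as a pair of TypeScript types plus a defaults table. This is a sketch derived from the documented names and defaults only; the type names, FETCH_DEFAULTS, and withDefaults are hypothetical and are not taken from the server's source.

```typescript
// Sketch: argument shapes for fetch_url and fetch_urls as documented above.
interface FetchUrlArgs {
  url: string;                 // required
  timeout?: number;            // milliseconds
  waitUntil?: "load" | "domcontentloaded" | "networkidle" | "commit";
  extractContent?: boolean;
  maxLength?: number;          // characters; no limit when omitted
  returnHtml?: boolean;
  waitForNavigation?: boolean;
  navigationTimeout?: number;  // milliseconds
  disableMedia?: boolean;
  debug?: boolean;             // overrides the --debug flag when given
}

// fetch_urls takes `urls` instead of `url`; the rest is the same.
type FetchUrlsArgs = Omit<FetchUrlArgs, "url"> & { urls: string[] };

// Documented defaults. maxLength and debug have no fixed default value.
const FETCH_DEFAULTS = {
  timeout: 30000,
  waitUntil: "load",
  extractContent: true,
  returnHtml: false,
  waitForNavigation: false,
  navigationTimeout: 10000,
  disableMedia: true,
} as const;

// Values supplied by the caller win over the defaults.
// Note: an explicitly passed `undefined` also replaces a default in this sketch.
function withDefaults(args: FetchUrlArgs) {
  return { ...FETCH_DEFAULTS, ...args };
}

const batch: FetchUrlsArgs = { urls: ["https://example.com", "https://example.org"] };
console.log(withDefaults({ url: "https://example.com", returnHtml: true }));
console.log(batch.urls.length);
```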

Tips

Handling Special Website Scenarios

Dealing with Anti-Crawler Mechanisms
  • Wait for Complete Loading: For websites using CAPTCHA, redirects, or other verification mechanisms, include in your prompt:
    Please wait for the page to fully load
    This will use the waitForNavigation: true parameter.
  • Increase Timeout Duration: For websites that load slowly:
    Please set the page loading timeout to 60 seconds
    This adjusts both timeout and navigationTimeout parameters accordingly.
Content Retrieval Adjustments
  • Preserve Original HTML Structure: When content extraction might fail:
    Please preserve the original HTML content
    Sets extractContent: false and returnHtml: true.
  • Fetch Complete Page Content: When extracted content is too limited:
    Please fetch the complete webpage content instead of just the main content
    Sets extractContent: false.
  • Return Content as HTML: When HTML format is needed instead of default Markdown:
    Please return the content in HTML format
    Sets returnHtml: true.
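
For clients that call the tool directly instead of going through a prompt, the scenarios above correspond to the following argument overrides. This is a sketch: the preset names and the argsFor helper are hypothetical. The slow-site preset sets only timeout, because the text does not state which navigationTimeout value is chosen.

```typescript
// Sketch: parameter overrides matching the prompt phrases listed above.
// Preset names and argsFor are illustrative, not part of fetcher-mcp.
interface FetchOverrides {
  waitForNavigation?: boolean;
  timeout?: number;
  navigationTimeout?: number;
  extractContent?: boolean;
  returnHtml?: boolean;
}

const presets = {
  // "Please wait for the page to fully load"
  antiBot: { waitForNavigation: true } as FetchOverrides,
  // "Please set the page loading timeout to 60 seconds"
  // navigationTimeout is also adjusted, but its value is not documented here.
  slowSite: { timeout: 60000 } as FetchOverrides,
  // "Please preserve the original HTML content"
  rawHtml: { extractContent: false, returnHtml: true } as FetchOverrides,
  // "Please fetch the complete webpage content instead of just the main content"
  fullPage: { extractContent: false } as FetchOverrides,
  // "Please return the content in HTML format"
  htmlOutput: { returnHtml: true } as FetchOverrides,
};

function argsFor(url: string, preset: FetchOverrides) {
  return { url, ...preset };
}

console.log(argsFor("https://example.com", presets.rawHtml));
```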

Debugging and Authentication

Enabling Debug Mode
  • Dynamic Debug Activation: To display the browser window during a specific fetch operation:
    Please enable debug mode for this fetch operation
    This sets debug: true even if the server was started without the --debug flag.
Using Custom Cookies for Authentication
  • Manual Login: To log in using your own credentials:
    Please run in debug mode so I can manually log in to the website
    Sets debug: true or uses the --debug flag, keeping the browser window open for manual login.
  • Interacting with Debug Browser: When debug mode is enabled:
    1. The browser window remains open
    2. You can manually log into the website using your credentials
    3. After login is complete, content will be fetched with your authenticated session
  • Enable Debug for Specific Requests: Even if the server is already running, you can enable debug mode for a specific request:
    Please enable debug mode for this authentication step
    Sets debug: true for this specific request only, opening the browser window for manual login.
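
One way to read the precedence described above: a debug value on the request wins when present, and the --debug command line flag applies otherwise. The function below is a sketch of that reading, assuming an explicit debug: false on a request also overrides the flag. The name resolveDebug is hypothetical and is not taken from the fetcher-mcp source.

```typescript
// Sketch: per-request debug overrides the server-wide --debug flag
// whenever the request specifies it.
function resolveDebug(requestDebug: boolean | undefined, cliDebug: boolean): boolean {
  // `??` falls back to the CLI flag only when the request left debug unset.
  return requestDebug ?? cliDebug;
}

console.log(resolveDebug(true, false));      // request turns debug on
console.log(resolveDebug(undefined, true));  // flag applies when request is silent
```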

Development

Install Dependencies

npm install

Install Playwright Browser

Install the browsers needed for Playwright:

npm run install-browser

Build the Server

npm run build

Debugging

Use MCP Inspector for debugging:

npm run inspector

You can also enable visible browser mode for debugging:

node build/index.js --debug

Related Projects

  • g-search-mcp: A powerful MCP server for Google search that enables parallel searching with multiple keywords simultaneously. Perfect for batch search operations and data collection.

License

Licensed under the MIT License


An MCP server that uses the Playwright headless browser to retrieve web page content, extract the main content, and convert it to Markdown.


