The Web-curl MCP Server is a versatile tool for fetching web content, interacting with APIs, and performing searches, usable as both a CLI and MCP server.
- **Fetch Webpage Content** — Retrieve text, HTML, and main article content from web pages with support for multi-page crawling, resource blocking, custom headers, basic authentication, and configurable timeouts.
- **Make REST API Requests** — Execute HTTP requests (GET, POST, PUT, DELETE, PATCH, HEAD, OPTIONS) to any API endpoint with custom headers, request bodies, and timeouts.
- **Google Custom Search** — Search the web using Google's API with advanced filters for language, region, site, date restrictions, and configurable result parameters (requires API key and CX ID).
- **Smart Commands** — Process free-form user instructions with automatic intent detection, fetching content from links or performing web searches with query enrichment and language detection.
Inspired by curl, Web-curl provides web request functionality for fetching content from websites and making REST API requests with custom methods, headers, and bodies.
Provides Google Custom Search functionality, allowing web searches with configurable result limits through Google's search API, requiring an API key and Custom Search Engine ID.
Leverages Puppeteer for robust web scraping capabilities, including resource blocking, content extraction, and headless browsing of websites.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type `@` followed by the MCP server name and your instructions, e.g., "@Web-curl MCP Server fetch the latest tech news from the Hacker News homepage".
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
Google Custom Search API
Google Custom Search API is free with usage limits (e.g., 100 queries per day for free, with additional queries requiring payment). For full details on quotas, pricing, and restrictions, see the official documentation.
Web-curl

Developed by Rayss
🚀 Open Source Project
🛠️ Built with Node.js & TypeScript (Node.js v18+ required)
🎬 Demo Video
Click here to watch the demo video directly in your browser.
If your platform supports it, you can also download and play demo/demo_1.mp4 directly.
Related MCP server: Fetch MCP Server
📚 Table of Contents
📝 Changelog / Update History
See CHANGELOG.md for a complete history of updates and new features.
📝 Overview
Web-curl is a powerful tool for fetching and extracting text content from web pages and APIs. Use it as a standalone CLI or as an MCP (Model Context Protocol) server. Web-curl leverages Puppeteer for robust web scraping and supports advanced features such as resource blocking, custom headers, authentication, and Google Custom Search.
✨ Features
🚀 Deep Research & Automation (v1.4.2)
Advanced Browser Automation: Full control over Chromium via Puppeteer (click, type, scroll, hover, key presses).
Always-On Session Persistence: Browser profiles are now always persistent. Login sessions, cookies, and cache are automatically saved in a local `user_data/` directory.
Multi-Tab Research: Manage up to 10 concurrent tabs with automatic rotation. Open multiple pages or perform parallel searches to gather information faster.
Token-Efficient Snapshots:
Accessibility Tree: Clean, structured snapshots instead of messy HTML.
HTML Slice Mode: Raw HTML with `startIndex`/`endIndex` for safe chunking when needed.
Viewport Filtering: Automatically filters out elements not visible on screen, saving up to 90% of context tokens on long pages.
Chrome DevTools Integration:
Network Monitoring: Capture XHR/Fetch requests to see data flowing behind the scenes.
Console Logs: Access browser console output for debugging or data extraction.
Browser Configuration: Set custom User-Agents, Proxies, and Viewport sizes.
Parallel Batch Operations:
multi_search: Run multiple Google searches at once.
batch_navigate: Open and load multiple websites in parallel.
Intelligent Resource Management:
Idle Auto-Close: Browser automatically shuts down after 15 minutes of inactivity to save RAM/CPU.
Tab Rotation: Automatically replaces the oldest tab when the 10-tab limit is reached.
Media & Documents:
Full-Page Screenshots: Capture high-quality screenshots with a 5-day auto-cleanup lifecycle and custom destination support.
Document Parsing: Extract text from PDF and DOCX files directly from URLs.
Storage & Download Details
🗂️ Error log rotation: `logs/error-log.txt` is rotated when it exceeds ~1 MB (renamed to `error-log.txt.bak`) to prevent unbounded growth.
🧹 Logs & temp cleanup: old temporary files in the `logs/` directory are cleaned up at startup.
🛑 Browser lifecycle: Puppeteer browser instances are closed in `finally` blocks to avoid Chromium temp file leaks.
🔎 Content extraction:
Returns raw text, HTML, and Readability "main article" when available. Readability attempts to extract the primary content of a webpage, removing headers, footers, sidebars, and other non-essential elements, providing a cleaner, more focused text.
Readability output is subject to `startIndex`/`maxLength`/`chunkSize` slicing when requested.
🚫 Resource blocking: `blockResources` is now always forced to `false`, so resources are never blocked and pages render fully.
⏱️ Timeout control: navigation and API request timeouts are configurable via tool arguments.
💾 Output: results can be printed to stdout or written to a file via CLI options.
⬇️ Download behavior (`download_file`):
- `destinationFolder` accepts relative paths (resolved against the project root) or absolute paths.
- The server creates `destinationFolder` if it does not exist.
- Downloads are streamed using Node streams + `pipeline` to minimize memory use and ensure robust writes.
- Filenames are derived from the URL path (e.g., `https://.../path/file.jpg` -> `file.jpg`). If no filename is present, the fallback name is `downloaded_file`.
- Overwrite semantics: by default the implementation will overwrite an existing file with the same name.
🖥️ Usage modes: CLI and MCP server (stdin/stdout transport).
🌐 REST client: `fetch_api` returns JSON/text when appropriate and base64 for binary responses.
🔍 Google Custom Search: requires `APIKEY_GOOGLE_SEARCH` and `CX_GOOGLE_SEARCH`.
🤖 Smart command:
- Auto language detection (franc-min) and optional translation (dynamic `translate` import).
- Query enrichment is heuristic-based; results depend on the detected intent.
🏗️ Architecture
This section outlines the high-level architecture of Web-curl.
CLI & MCP Server: `src/index.ts` implements both the CLI entry point and the MCP server.
Web Scraping: Uses Puppeteer for headless browsing and content extraction.
REST Client: `src/rest-client.ts` provides a flexible HTTP client for API requests.
⚙️ MCP Server Configuration Example
To integrate web-curl as an MCP server, add the following configuration to your mcp_settings.json:
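A minimal sketch of such a configuration, assuming the compiled entry point lives at `build/index.js` (adjust the path to match your build output and install location):

```json
{
  "mcpServers": {
    "web-curl": {
      "command": "node",
      "args": ["/path/to/web-curl/build/index.js"],
      "env": {
        "APIKEY_GOOGLE_SEARCH": "YOUR_GOOGLE_API_KEY",
        "CX_GOOGLE_SEARCH": "YOUR_CX_ID"
      }
    }
  }
}
```

The two `env` entries are only needed if you plan to use the Google Custom Search tools; the next section explains how to obtain the values.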
🔑 How to Obtain Google API Key and CX
Get a Google API Key:
Go to Google Cloud Console.
Create/select a project, then go to APIs & Services > Credentials.
Click Create Credentials > API key and copy it.
Get a Custom Search Engine (CX) ID:
Go to Google Custom Search Engine.
Create/select a search engine, then copy the Search engine ID (CX).
Enable Custom Search API:
In Google Cloud Console, go to APIs & Services > Library.
Search for Custom Search API and enable it.
Replace YOUR_GOOGLE_API_KEY and YOUR_CX_ID in the config above.
🛠️ Installation
Prerequisites: Ensure you have Node.js (v18+) and Git installed on your system.
Puppeteer installation notes
Windows: Just run `npm install`.
Linux / Ubuntu Server: You must install extra dependencies for Chromium to handle rendering and screenshots in a headless environment. Run:

```shell
sudo apt-get update && sudo apt-get install -y \
  fonts-liberation \
  libasound2 \
  libatk-bridge2.0-0 \
  libatk1.0-0 \
  libc6 \
  libcairo2 \
  libcups2 \
  libdbus-1-3 \
  libexpat1 \
  libfontconfig1 \
  libgbm1 \
  libgcc1 \
  libglib2.0-0 \
  libgtk-3-0 \
  libnspr4 \
  libnss3 \
  libpango-1.0-0 \
  libpangocairo-1.0-0 \
  libstdc++6 \
  libx11-6 \
  libx11-xcb1 \
  libxcb1 \
  libxcomposite1 \
  libxcursor1 \
  libxdamage1 \
  libxext6 \
  libxfixes3 \
  libxi6 \
  libxrandr2 \
  libxrender1 \
  libxss1 \
  libxtst6 \
  lsb-release \
  wget \
  xdg-utils
```
For more details, see the Puppeteer troubleshooting guide.
🚀 Usage
CLI Usage
The CLI supports fetching and extracting text content from web pages.
Command Line Options
--timeout <ms>: Set navigation timeout (default: 60000)
-o <file>: Output result to specified file
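A sketch of a typical invocation using the options above (the `build/index.js` entry-point path is an assumption — adjust to your build output):

```shell
# Fetch a page with a 30-second navigation timeout and write the result to a file
node build/index.js https://example.com --timeout 30000 -o result.txt
```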
MCP Server Usage
Web-curl can be run as an MCP server for integration with Roo Context or other MCP-compatible environments.
Exposed Tools (v1.4.2)
browser_flow: One-call workflow (optional navigate → optional actions → return snapshot/screenshot/links/console/network). Use this to avoid calling many tools.
browser_navigate: Open a URL in the active tab (includes network-idle wait + short stabilization).
browser_snapshot: TEXT snapshot (tree by default, or `mode: "html"` slices with `startIndex`/`endIndex`).
browser_action: Interact with the page (click/type/scroll/hover/press_key/waitForSelector). Best used with `ref:` from snapshot.
browser_tabs: List, create, close, or select tabs (max 10).
batch_navigate: Open many URLs (each in a new tab) and return tab indexes.
multi_search: Run multiple Google searches in parallel.
browser_network_requests: Recent network requests.
browser_console_messages: Recent console logs/warnings/errors.
browser_configure: Set proxy/user-agent/viewport (session persistence is always on via `user_data/`).
browser_links: Extract all valid links from the page.
take_screenshot: PNG screenshot to disk. Default `fullPage: true` (set `false` for faster viewport-only capture).
parse_document: Extract text from PDF/DOCX URLs.
browser_close: Close browser and tabs.
google_search: Google Custom Search (single query).
fetch_api: REST API request with response truncation (`limit`).
smart_command: Natural-language search command (auto language detection, translation, and query enrichment).
download_file: Download a file from a URL.
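As an illustration, a `fetch_api` call might look roughly like this (the `tools/call` envelope is the standard MCP shape; argument names beyond those mentioned in the tool list above are assumptions — check the schemas in `src/index.ts` for the exact fields):

```json
{
  "name": "fetch_api",
  "arguments": {
    "url": "https://api.example.com/v1/items",
    "method": "POST",
    "headers": { "Content-Type": "application/json" },
    "body": { "query": "web-curl" },
    "timeout": 30000,
    "limit": 10000
  }
}
```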
Running as MCP Server
The server will communicate via stdin/stdout and expose the tools as defined in src/index.ts.
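A typical start sequence, assuming a standard TypeScript build step and a `build/index.js` output path (both are assumptions — check `package.json` for the actual scripts):

```shell
npm install       # install dependencies, including Puppeteer
npm run build     # compile src/index.ts
node build/index.js   # start the MCP server on stdin/stdout
```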
🚦 HTML Slicing Example (Recommended for Large Pages)
Use browser_snapshot with mode: "html" when you need raw HTML but want to keep the response small.
Client request for first slice:
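A request of roughly this shape (argument names follow the tool list above; the exact call envelope depends on your MCP client):

```json
{
  "name": "browser_snapshot",
  "arguments": {
    "mode": "html",
    "startIndex": 0,
    "endIndex": 5000
  }
}
```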
Response (example):
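An illustrative response sketch — the field names here are assumptions; only the `startIndex`/`endIndex` slicing semantics are documented above:

```json
{
  "html": "<!DOCTYPE html><html><head>...",
  "startIndex": 0,
  "endIndex": 5000,
  "totalLength": 48210,
  "hasMore": true
}
```

To fetch the next slice, repeat the call with `startIndex: 5000` and continue until no content remains.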
🧩 Configuration
Session Persistence: Always enabled. Logins and cookies are automatically reused across restarts.
Timeout: Set navigation and API request timeouts.
Environment Variables: Used for Google Search API integration.
💡 Examples {#examples}
Note: destinationFolder can be either a relative path (resolved against the project root) or an absolute path. The server will create the destination folder if it does not exist.
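For example, a `download_file` call using a relative destination might look like this (the URL and folder name are illustrative):

```json
{
  "name": "download_file",
  "arguments": {
    "url": "https://example.com/images/photo.jpg",
    "destinationFolder": "downloads"
  }
}
```

Per the download behavior described earlier, this would save the file as `downloads/photo.jpg` under the project root, creating the folder if needed and overwriting any existing file of the same name.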
Note: Session persistence is always enabled. Cookies and login sessions are automatically stored in the user_data/ directory.
🛠️ Troubleshooting {#troubleshooting}
Timeout Errors: Increase the `timeout` parameter if requests are timing out.
Google Search Fails: Ensure `APIKEY_GOOGLE_SEARCH` and `CX_GOOGLE_SEARCH` are set in your environment.
Error Logs: Check the `logs/error-log.txt` file for detailed error messages.
🧠 Tips & Best Practices {#tips--best-practices}
For large pages, use `maxLength` and `startIndex` to fetch content in slices.
Always validate your tool arguments to avoid errors.
Secure your API keys and sensitive data using environment variables.
Review the MCP tool schemas in `src/index.ts` for all available options.
🤝 Contributing & Issues {#contributing--issues}
Contributions are welcome! If you want to contribute, fork this repository and submit a pull request.
If you find any issues or have suggestions, please open an issue on the repository page.
📄 License & Attribution {#license--attribution}
This project was developed by Rayss.
For questions, improvements, or contributions, please contact the author or open an issue in the repository.
Note: Google Search API is free with usage limits. For details, see: Google Custom Search API Overview