Web-curl MCP Server

by rayss868

Google Custom Search API

The Google Custom Search API has a free tier with usage limits (for example, 100 queries per day; additional queries require payment). For full details on quotas, pricing, and restrictions, see the official documentation.

Web-curl

Developed by Rayss

🚀 Open Source Project
🛠️ Built with Node.js & TypeScript (Node.js v18+ required)



🎬 Demo Video

Demo Video (MP4)



📝 Overview

Web-curl is a powerful tool for fetching and extracting text content from web pages and APIs. Use it as a standalone CLI or as an MCP (Model Context Protocol) server. Web-curl leverages Puppeteer for robust web scraping and supports advanced features such as resource blocking, custom headers, authentication, and Google Custom Search.


✨ Features

  • 🔎 Retrieve text content from any website.
  • 🚫 Block unnecessary resources (images, stylesheets, fonts) for faster loading.
  • ⏱️ Set navigation timeouts and content extraction limits.
  • 💾 Output results to stdout or save to a file.
  • 🖥️ Use as a CLI tool or as an MCP server.
  • 🌐 Make REST API requests with custom methods, headers, and bodies.
  • 🔍 Integrate Google Custom Search (requires API key and CX).
  • 🤖 Smart command parsing (auto-detects URLs and search queries).
  • 🛡️ Detailed error logging and robust error handling.
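The smart command parsing listed above can be pictured with a small sketch. This is not the logic from src/index.ts; the `classifyCommand` function, the `ParsedCommand` type, and the URL pattern are hypothetical names chosen for illustration, and the real tool may apply different rules.

```typescript
// Hypothetical sketch of URL-vs-query detection; not taken from src/index.ts.
type ParsedCommand =
  | { tool: "fetch_webpage"; url: string }
  | { tool: "google_search"; query: string };

// Matches the first http(s) URL in the input, stopping at whitespace.
const URL_PATTERN = /https?:\/\/[^\s]+/i;

function classifyCommand(input: string): ParsedCommand {
  const trimmed = input.trim();
  const match = trimmed.match(URL_PATTERN);
  if (match) {
    // A URL was found, so route the command to the page fetcher.
    return { tool: "fetch_webpage", url: match[0] };
  }
  // No URL found: treat the whole input as a search query.
  return { tool: "google_search", query: trimmed };
}

console.log(classifyCommand("summarize https://example.com/docs"));
console.log(classifyCommand("web scraping best practices"));
```

Returning a tagged union keeps the routing decision explicit, so the caller can switch on `tool` without re-inspecting the input.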

🏗️ Architecture

  • CLI & MCP Server: src/index.ts
    Implements both the CLI entry point and the MCP server, exposing tools like fetch_webpage, fetch_api, google_search, and smart_command.
  • Web Scraping: Uses Puppeteer for headless browsing, resource blocking, and content extraction.
  • REST Client: src/rest-client.ts
    Provides a flexible HTTP client for API requests, used by both CLI and MCP tools.
  • Configuration: Managed via CLI options, environment variables, and tool arguments.

⚙️ MCP Server Configuration Example

To integrate web-curl as an MCP server, add the following configuration to your mcp_settings.json:

{
  "mcpServers": {
    "web-curl": {
      "command": "node",
      "args": [
        "build/index.js"
      ],
      "disabled": false,
      "alwaysAllow": [
        "fetch_webpage",
        "fetch_api",
        "google_search",
        "smart_command"
      ],
      "env": {
        "APIKEY_GOOGLE_SEARCH": "YOUR_GOOGLE_API_KEY",
        "CX_GOOGLE_SEARCH": "YOUR_CX_ID"
      }
    }
  }
}

🔑 How to Obtain Google API Key and CX

  1. Get a Google API Key:
    • Go to Google Cloud Console.
    • Create/select a project, then go to APIs & Services > Credentials.
    • Click Create Credentials > API key and copy it.
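  2. Get a Custom Search Engine (CX) ID:
    • Create a search engine in Google's Programmable Search Engine control panel and copy its Search engine ID.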
  3. Enable Custom Search API:
    • In Google Cloud Console, go to APIs & Services > Library.
    • Search for Custom Search API and enable it.

Replace YOUR_GOOGLE_API_KEY and YOUR_CX_ID in the config above.


🛠️ Installation

# Clone the repository (git creates a directory named after the repo)
git clone https://github.com/rayss868/MCP-Web-Curl
cd MCP-Web-Curl

# Install dependencies
npm install

# Build the project
npm run build

Puppeteer installation notes

  • Windows: Just run npm install.
  • Linux: You must install extra dependencies for Chromium. Run:
    sudo apt-get install -y \
      ca-certificates fonts-liberation libappindicator3-1 libasound2 libatk-bridge2.0-0 \
      libatk1.0-0 libcups2 libdbus-1-3 libdrm2 libgbm1 libnspr4 libnss3 \
      libx11-xcb1 libxcomposite1 libxdamage1 libxrandr2 xdg-utils
    For more details, see the Puppeteer troubleshooting guide.

🚀 Usage

CLI Usage

The CLI supports fetching and extracting text content from web pages.

# Basic usage
node build/index.js https://example.com

# With options
node build/index.js --timeout 30000 --no-block-resources https://example.com

# Save output to a file
node build/index.js -o result.json https://example.com

Command Line Options
  • --timeout <ms>: Set navigation timeout (default: 60000)
  • --no-block-resources: Disable blocking of images, stylesheets, and fonts
  • -o <file>: Output result to specified file
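To make the option semantics concrete, here is a minimal sketch of how the three flags above could be parsed. The `parseCliArgs` function and `CliOptions` interface are hypothetical and are not taken from src/index.ts. The 60000 ms default comes from the option list; treating resource blocking as enabled by default is an assumption inferred from the `--no-block-resources` flag.

```typescript
// Hypothetical argument parser; the real CLI in src/index.ts may differ.
interface CliOptions {
  url?: string;
  timeout: number;         // navigation timeout in milliseconds
  blockResources: boolean; // assumed to default to true
  output?: string;         // file path given with -o
}

function parseCliArgs(argv: string[]): CliOptions {
  const options: CliOptions = { timeout: 60000, blockResources: true };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--timeout") {
      // The value is the next token; a missing value becomes NaN and is rejected.
      const value = Number(argv[++i]);
      if (!Number.isFinite(value) || value <= 0) {
        throw new Error("--timeout expects a positive number of milliseconds");
      }
      options.timeout = value;
    } else if (arg === "--no-block-resources") {
      options.blockResources = false;
    } else if (arg === "-o") {
      const file = argv[++i];
      if (file === undefined) {
        throw new Error("-o expects a file path");
      }
      options.output = file;
    } else {
      // Anything that is not a known flag is treated as the target URL.
      options.url = arg;
    }
  }
  return options;
}

console.log(parseCliArgs(["--timeout", "30000", "--no-block-resources", "https://example.com"]));
```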

MCP Server Usage

Web-curl can be run as an MCP server for integration with Roo Context or other MCP-compatible environments.

Exposed Tools
  • fetch_webpage: Retrieve text content from a web page
  • fetch_api: Make REST API requests
  • google_search: Search the web using Google Custom Search API
  • smart_command: Automatically parse and execute commands or search queries using the appropriate tool
Running as MCP Server
npm run start

The server will communicate via stdin/stdout and expose the tools as defined in src/index.ts.

MCP Tool Example (fetch_webpage)
{
  "name": "fetch_webpage",
  "arguments": {
    "url": "https://example.com",
    "blockResources": true,
    "timeout": 60000,
    "maxLength": 10000
  }
}
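On the wire, an MCP client wraps a tool example like this in a JSON-RPC 2.0 `tools/call` request and, over the stdio transport, sends it as a single line of JSON. The sketch below shows that envelope only; `buildToolCall` and `frameForStdio` are hypothetical helper names, and a real client would first complete the MCP `initialize` handshake, which is omitted here.

```typescript
// Sketch of the JSON-RPC envelope for an MCP tool call (helper names are hypothetical).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// The stdio transport is newline-delimited: one JSON message per line.
function frameForStdio(request: ToolCallRequest): string {
  return JSON.stringify(request) + "\n";
}

const fetchRequest = buildToolCall(1, "fetch_webpage", {
  url: "https://example.com",
  blockResources: true,
  timeout: 60000,
  maxLength: 10000,
});
console.log(frameForStdio(fetchRequest));
```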
Google Search Integration

Set the following environment variables for Google Custom Search:

  • APIKEY_GOOGLE_SEARCH: Your Google API key
  • CX_GOOGLE_SEARCH: Your Custom Search Engine ID
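As a rough illustration of how these two values are used, the sketch below builds a request URL for the Custom Search JSON API endpoint (`https://www.googleapis.com/customsearch/v1`) with its `key`, `cx`, `q`, and `num` parameters. The `buildSearchUrl` function is hypothetical, and whether web-curl assembles its requests this way is an assumption. The API accepts `num` values from 1 to 10, so the sketch clamps it.

```typescript
// Hypothetical helper; shows how the API key and CX map onto query parameters.
function buildSearchUrl(apiKey: string, cx: string, query: string, num = 5): string {
  if (!apiKey || !cx) {
    throw new Error("APIKEY_GOOGLE_SEARCH and CX_GOOGLE_SEARCH must both be set");
  }
  // The Custom Search JSON API returns at most 10 results per request.
  const clamped = Math.min(Math.max(Math.trunc(num), 1), 10);
  const params: [string, string][] = [
    ["key", apiKey],
    ["cx", cx],
    ["q", query],
    ["num", String(clamped)],
  ];
  const queryString = params
    .map(([k, v]) => `${k}=${encodeURIComponent(v)}`)
    .join("&");
  return `https://www.googleapis.com/customsearch/v1?${queryString}`;
}

// Placeholder credentials, matching the placeholders used in the config example.
console.log(buildSearchUrl("YOUR_GOOGLE_API_KEY", "YOUR_CX_ID", "web scraping best practices"));
```

Failing early when either value is missing gives a clearer error than a rejected API call.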

🧩 Configuration

  • Resource Blocking: Block images, stylesheets, and fonts for faster page loading.
  • Timeout: Set navigation and API request timeouts.
  • Custom Headers: Pass custom HTTP headers for advanced scenarios.
  • Authentication: Supports HTTP Basic Auth via username/password.
  • Environment Variables: Used for Google Search API integration.
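The resource-blocking option boils down to a per-request decision. The sketch below isolates that decision as a pure function so it can be read without a browser; `shouldBlockRequest` and `DEFAULT_BLOCKED_TYPES` are hypothetical names, not identifiers from src/index.ts. With Puppeteer, a function like this would typically be called from a `request` event handler after `page.setRequestInterception(true)`, choosing between `request.abort()` and `request.continue()`.

```typescript
// Hypothetical decision function; the type names match Puppeteer's resourceType() values.
const DEFAULT_BLOCKED_TYPES: readonly string[] = ["image", "stylesheet", "font"];

function shouldBlockRequest(
  resourceType: string,
  blockResources: boolean,
  blockedTypes: readonly string[] = DEFAULT_BLOCKED_TYPES
): boolean {
  // Blocking applies only when enabled and the type is on the block list.
  return blockResources && blockedTypes.indexOf(resourceType) !== -1;
}

console.log(shouldBlockRequest("image", true));
console.log(shouldBlockRequest("document", true));
console.log(shouldBlockRequest("image", false));
```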

💡 Examples

{
  "name": "fetch_webpage",
  "arguments": {
    "url": "https://en.wikipedia.org/wiki/Web_scraping",
    "blockResources": true,
    "maxLength": 5000
  }
}

{
  "name": "fetch_api",
  "arguments": {
    "url": "https://api.github.com/repos/nodejs/node",
    "method": "GET",
    "headers": {
      "Accept": "application/vnd.github.v3+json"
    }
  }
}

{
  "name": "google_search",
  "arguments": {
    "query": "web scraping best practices",
    "num": 5
  }
}

🛠️ Troubleshooting

  • Timeout Errors: Increase the timeout parameter if requests are timing out.
  • Blocked Content: If content is missing, try disabling resource blocking or adjusting resourceTypesToBlock.
  • Google Search Fails: Ensure APIKEY_GOOGLE_SEARCH and CX_GOOGLE_SEARCH are set in your environment.
  • Binary/Unknown Content: Non-text responses are base64-encoded.
  • Error Logs: Check the logs/error-log.txt file for detailed error messages.

🧠 Tips & Best Practices

  • Use resource blocking for faster and lighter scraping unless you need images or styles.
  • For large pages, use maxLength and startIndex to paginate content extraction.
  • Always validate your tool arguments to avoid errors.
  • Secure your API keys and sensitive data using environment variables.
  • Review the MCP tool schemas in src/index.ts for all available options.
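The pagination tip can be sketched as follows, assuming `startIndex` is a character offset into the extracted text and `maxLength` caps the number of characters returned. The `sliceContent` function and `ContentPage` interface are hypothetical; check the tool schemas in src/index.ts for the actual semantics.

```typescript
// Hypothetical model of startIndex/maxLength pagination over extracted text.
interface ContentPage {
  text: string;
  nextIndex: number | null; // null when there is nothing left to read
}

function sliceContent(content: string, startIndex = 0, maxLength = 5000): ContentPage {
  if (maxLength <= 0) {
    throw new Error("maxLength must be positive");
  }
  const start = Math.max(0, startIndex);
  const end = Math.min(content.length, start + maxLength);
  return {
    text: content.slice(start, end),
    nextIndex: end < content.length ? end : null,
  };
}

// Walk a document in fixed-size chunks by feeding nextIndex back in as startIndex.
const sampleText = "abcdefghij";
let cursor: number | null = 0;
while (cursor !== null) {
  const chunk: ContentPage = sliceContent(sampleText, cursor, 4);
  console.log(chunk.text);
  cursor = chunk.nextIndex;
}
```

Against the MCP server, the same loop would issue one fetch_webpage call per chunk, raising `startIndex` by `maxLength` each time.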

🤝 Contributing & Issues

Contributions are welcome! If you want to contribute, fork this repository and submit a pull request.
If you find any issues or have suggestions, please open an issue on the repository page.


📄 License & Attribution

This project was developed by Rayss.
For questions, improvements, or contributions, please contact the author or open an issue in the repository.




