# MCP Web Research Agent

A powerful MCP (Model Context Protocol) tool for automated web research, scraping, and intelligence gathering.

License: MIT · Python 3.8+ · MCP Protocol

A web research automation tool that exposes a full scraping pipeline as an MCP-compatible agent for AI workflows. Well suited to competitive intelligence, market research, and automated data collection.

## 🚀 Features

- 🔍 **Intelligent Scraping**: Recursive web crawling with configurable depth
- 🔎 **Search Integration**: Multi-engine search with result processing
- 💾 **Database Storage**: Persistent SQLite storage with advanced querying
- 📊 **Multiple Export Formats**: JSON, Markdown, and CSV exports
- 🤖 **MCP Integration**: Seamless integration with AI assistants
- ⚡ **Async Ready**: Built for concurrent operations (see the sketch after this list)
- 🔧 **Configurable**: Adjustable settings for any use case
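Because every tool is exposed as a coroutine, independent scrapes can be dispatched concurrently. A minimal sketch, assuming the `scrape_url` tool documented under Usage below:

```python
import asyncio

async def scrape_many(urls: list[str], keywords: list[str]):
    # One scrape_url call per URL, all run concurrently
    tasks = [scrape_url(url=u, keywords=keywords) for u in urls]
    return await asyncio.gather(*tasks)

results = asyncio.run(scrape_many(
    ["https://example.com", "https://example.org"],
    ["python", "automation"],
))
```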

## 🛠️ Installation

### Prerequisites

- Python 3.8+
- An MCP-compatible client (Claude Desktop, etc.)

### Quick Install

```bash
# Clone the repository
git clone https://github.com/yourusername/mcp-web-research-agent.git
cd mcp-web-research-agent

# Install dependencies
pip install -e .
```

### MCP Client Configuration

Add to your MCP client configuration:

{ "mcpServers": { "web-research-agent": { "command": "python", "args": ["/path/to/mcp-web-research-agent/server.py"] } } }

## 📖 Usage

### Available Tools

#### `scrape_url`

Scrape a single URL for specific keywords.

```python
result = await scrape_url(
    url="https://example.com",
    keywords=["python", "automation", "scraping"],
    extract_links=False,
    max_depth=1,
)
```

#### `search_and_scrape`

Search the web and automatically scrape the results.

```python
result = await search_and_scrape(
    query="web scraping best practices",
    keywords=["python", "beautifulsoup", "requests"],
    search_engine_url="https://searx.gophernuttz.us/search/",
    max_results=10,
)
```

#### `get_scraping_results`

Query the database for previous scraping results.

```python
result = await get_scraping_results(
    keyword_filter="python",
    limit=50,
)
```

#### `export_results`

Export results to any of the supported formats.

```python
result = await export_results(
    format="markdown",
    keyword_filter="python",
    output_path="/path/to/output.md",
)
```

#### `get_scraping_stats`

Get current scraping statistics and status.

```python
result = await get_scraping_stats()
```
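Together these tools cover a full research loop. A minimal sketch of a typical session, using only the signatures above (the topic, keywords, and output path are illustrative):

```python
import asyncio

async def research(topic: str, keywords: list[str]):
    # Search the web and scrape matching pages into the database
    await search_and_scrape(query=topic, keywords=keywords, max_results=10)

    # Check what was captured
    print(await get_scraping_stats())

    # Export pages matching the first keyword as Markdown notes
    await export_results(
        format="markdown",
        keyword_filter=keywords[0],
        output_path="research_notes.md",
    )

asyncio.run(research("web scraping best practices", ["python", "requests"]))
```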

## 🗃️ Database Schema

The agent uses SQLite with the following structure:

```sql
-- URLs table
CREATE TABLE urls (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT UNIQUE NOT NULL,
    title TEXT,
    content TEXT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);

-- Keywords table
CREATE TABLE keywords (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    keyword TEXT UNIQUE NOT NULL
);

-- URL-keyword relationships
CREATE TABLE url_keywords (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url_id INTEGER,
    keyword_id INTEGER,
    matches INTEGER DEFAULT 1,
    context TEXT,
    FOREIGN KEY (url_id) REFERENCES urls (id),
    FOREIGN KEY (keyword_id) REFERENCES keywords (id),
    UNIQUE(url_id, keyword_id)
);
```
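The database can also be queried directly, outside the MCP tools. A minimal sketch using Python's built-in `sqlite3` module against the schema above, assuming the default `scraper_results.db`:

```python
import sqlite3

def urls_for_keyword(keyword: str, db_path: str = "scraper_results.db"):
    """Return (url, title, matches) rows for pages that matched a keyword."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute(
            """
            SELECT u.url, u.title, uk.matches
            FROM urls u
            JOIN url_keywords uk ON uk.url_id = u.id
            JOIN keywords k ON k.id = uk.keyword_id
            WHERE k.keyword = ?
            ORDER BY uk.matches DESC
            """,
            (keyword,),
        ).fetchall()

print(urls_for_keyword("python"))
```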

## 🔧 Configuration

### Default Settings

- **Max Depth**: 3 levels of recursive crawling
- **Request Delay**: 1 second between requests
- **User Agent**: Modern Chrome browser simulation
- **Database**: `scraper_results.db` (auto-created)

### Customization

Modify settings in the `MCPWebScraper` constructor:

```python
scraper = MCPWebScraper(
    db_manager=db_manager,
    max_depth=5,  # Increase crawl depth
    delay=0.5,    # Faster requests
)
```
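Keep in mind that crawl volume grows roughly exponentially with depth: if each page yields around 20 in-scope links, a depth of 3 can already touch on the order of 20³ = 8,000 pages. Increase `max_depth` cautiously, and keep `delay` above zero to stay polite to the servers you crawl.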

## 🧪 Development

### Running Tests

```bash
python test_mcp_scraper.py
```

### Example Usage

```bash
python example_usage.py
```

### Project Structure

```
mcp-web-research-agent/
├── server.py            # MCP server implementation
├── scraper.py           # Core scraping logic
├── database.py          # Database management
├── requirements.txt     # Python dependencies
├── pyproject.toml       # Package configuration
├── test_mcp_scraper.py  # Unit tests
├── example_usage.py     # Usage examples
└── README.md            # This file
```

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🙏 Acknowledgments

- Built on the Model Context Protocol
- Inspired by modern web scraping best practices
- Thanks to the open-source community for amazing tools


Built with ❤️ for the MCP ecosystem
