SnotacusNexus

MCP Web Research Agent

An MCP (Model Context Protocol) server for automated web research, scraping, and intelligence gathering.

License: MIT | Python 3.8+ | MCP Protocol

A web research automation tool that packages an existing web scraper as an MCP-compatible agent, so AI assistants can call its scraping, search, and export functions as tools. Suited to competitive intelligence, market research, and automated data collection.

🚀 Features

  • 🔍 Intelligent Scraping: Recursive web crawling with configurable depth

  • 🔎 Search Integration: Multi-engine search with result processing

  • 💾 Database Storage: Persistent SQLite storage with keyword-filtered querying

  • 📊 Multiple Export Formats: JSON, Markdown, and CSV exports

  • 🤖 MCP Integration: Exposes its tools to MCP-compatible AI assistants

  • ⚡ Async Ready: Built for concurrent operations

  • 🔧 Configurable: Adjustable crawl depth and request delay

๐Ÿ› ๏ธ Installation

Prerequisites

  • Python 3.8+

  • MCP-compatible client (Claude Desktop, etc.)

Quick Install

# Clone the repository
git clone https://github.com/yourusername/mcp-web-research-agent.git
cd mcp-web-research-agent

# Install dependencies
pip install -e .

MCP Client Configuration

Add the server to your MCP client's configuration (for Claude Desktop, this is the claude_desktop_config.json file):

{
  "mcpServers": {
    "web-research-agent": {
      "command": "python",
      "args": ["/path/to/mcp-web-research-agent/server.py"]
    }
  }
}
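If you prefer to script this step, the sketch below merges the web-research-agent entry into an existing client config without overwriting other servers. It assumes the config file is a JSON object with a top-level mcpServers key, as in the snippet above; add_server_entry and update_config_file are hypothetical helpers, not part of this project.

```python
import json
from pathlib import Path
from typing import Any, Dict


def add_server_entry(config: Dict[str, Any], name: str, server_path: str,
                     command: str = "python") -> Dict[str, Any]:
    """Return a copy of config with one MCP server entry added or replaced."""
    merged = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    servers = merged.setdefault("mcpServers", {})
    # Other servers already listed under mcpServers are left untouched.
    servers[name] = {"command": command, "args": [server_path]}
    return merged


def update_config_file(config_path: Path, name: str, server_path: str) -> None:
    """Read the client config (if present), add the entry, and write it back."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    merged = add_server_entry(config, name, server_path)
    config_path.write_text(json.dumps(merged, indent=2))
```

For example, update_config_file(Path("claude_desktop_config.json"), "web-research-agent", "/path/to/mcp-web-research-agent/server.py") adds the entry shown above; restart the client afterwards so it picks up the change.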

📖 Usage

Available Tools

scrape_url

Scrape a single URL for specific keywords

result = await scrape_url(
    url="https://example.com",
    keywords=["python", "automation", "scraping"],
    extract_links=False,
    max_depth=1
)
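The server's actual matching logic lives in scraper.py and is not reproduced here. As a rough illustration of what a per-keyword result could contain, the sketch below counts case-insensitive, whole-word occurrences of each keyword in page text and keeps a short snippet around the first hit, mirroring the matches and context columns of the url_keywords table in the database schema. The function name match_keywords and the 40-character context window are assumptions.

```python
import re
from typing import Dict, List, Tuple


def match_keywords(text: str, keywords: List[str],
                   window: int = 40) -> Dict[str, Tuple[int, str]]:
    """Map each keyword found in text to (match count, context snippet).

    Matching is case-insensitive and on whole words, which suits
    alphanumeric keywords; keywords with no matches are omitted.
    """
    results: Dict[str, Tuple[int, str]] = {}
    for keyword in keywords:
        pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        hits = list(pattern.finditer(text))
        if not hits:
            continue
        first = hits[0]
        # Take up to `window` characters on each side of the first match.
        start = max(0, first.start() - window)
        end = min(len(text), first.end() + window)
        snippet = " ".join(text[start:end].split())  # collapse whitespace
        results[keyword] = (len(hits), snippet)
    return results
```

Whole-word matching means "python" does not count hits inside "pythonic"; a substring search would be the alternative if partial matches are wanted.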

search_and_scrape

Search the web and automatically scrape results

result = await search_and_scrape(
    query="web scraping best practices",
    keywords=["python", "beautifulsoup", "requests"],
    search_engine_url="https://searx.gophernuttz.us/search/",
    max_results=10
)
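The search_engine_url in this example appears to point at a SearXNG instance. SearXNG's search endpoint takes the query as q, accepts a pageno parameter, and returns JSON when format=json is requested and that format is enabled in the instance's settings. The sketch below builds such a request URL with the standard library; whether the agent sends exactly these parameters is an assumption, and build_search_url is a hypothetical helper.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_search_url(search_engine_url: str, query: str,
                     page: int = 1, fmt: str = "json") -> str:
    """Return a SearXNG-style search URL for query.

    Any query string already on search_engine_url is replaced.
    """
    parts = urlsplit(search_engine_url)
    params = {"q": query, "format": fmt, "pageno": page}
    # urlencode encodes spaces as "+", which SearXNG accepts.
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(params), ""))
```

In SearXNG's JSON output the hits are listed under a results key, each with a url and title; those URLs are what a search-then-scrape tool would pass on to the scraper, up to max_results.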

get_scraping_results

Query the database for previous scraping results

result = await get_scraping_results(
    keyword_filter="python",
    limit=50
)

export_results

Export stored results to JSON, Markdown, or CSV

result = await export_results(
    format="markdown",
    keyword_filter="python",
    output_path="/path/to/output.md"
)
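The exact export layout is defined by the server. As an illustration, the sketch below renders result rows as JSON, a Markdown table, or CSV using only the standard library. It assumes each row is a dict with url, title, keyword, and matches fields (a hypothetical shape loosely based on the database schema); export_rows is a hypothetical name.

```python
import csv
import io
import json
from typing import Any, Dict, List

FIELDS = ["url", "title", "keyword", "matches"]


def export_rows(rows: List[Dict[str, Any]], fmt: str) -> str:
    """Render rows as 'json', 'markdown', or 'csv' text."""
    if fmt == "json":
        return json.dumps(rows, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore",
                                lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()
    if fmt == "markdown":
        lines = ["| " + " | ".join(FIELDS) + " |",
                 "|" + "---|" * len(FIELDS)]
        for row in rows:
            # Escape pipes so cell text cannot break the table layout.
            cells = [str(row.get(f, "")).replace("|", "\\|") for f in FIELDS]
            lines.append("| " + " | ".join(cells) + " |")
        return "\n".join(lines) + "\n"
    raise ValueError(f"unsupported format: {fmt}")
```

Writing the returned string to output_path, for instance with Path(output_path).write_text(...), completes the export.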

get_scraping_stats

Get current statistics and status

result = await get_scraping_stats()
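The calls above describe each tool and its arguments; inside an MCP client such as Claude Desktop the assistant issues them for you. To call the tools from your own Python code, a client talks to server.py over the protocol. The sketch below assumes the official MCP Python SDK (the mcp package) and a stdio launch matching the client configuration; the first helper shows the JSON-RPC tools/call message a client sends. The names tools_call_request and call_tool are illustrative, and the server path is a placeholder.

```python
from typing import Any, Dict, List


def tools_call_request(name: str, arguments: Dict[str, Any],
                       request_id: int = 1) -> Dict[str, Any]:
    """Build the JSON-RPC 2.0 tools/call request an MCP client sends."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


async def call_tool(server_path: str, name: str,
                    arguments: Dict[str, Any]) -> List[str]:
    """Launch server.py over stdio, call one tool, and return its text output."""
    # Imported lazily so the helper above works without the SDK installed.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    params = StdioServerParameters(command="python", args=[server_path])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(name, arguments=arguments)
            # Tool results are a list of content items; keep the text ones.
            return [item.text for item in result.content
                    if getattr(item, "type", None) == "text"]
```

Usage: asyncio.run(call_tool("/path/to/mcp-web-research-agent/server.py", "get_scraping_stats", {})) starts the server, fetches its statistics, and shuts it down again.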

๐Ÿ—ƒ๏ธ Database Schema

The agent uses SQLite with the following structure:

-- URLs table
CREATE TABLE urls (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT UNIQUE NOT NULL,
    title TEXT,
    content TEXT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);

-- Keywords table  
CREATE TABLE keywords (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    keyword TEXT UNIQUE NOT NULL
);

-- URL-Keyword relationships
CREATE TABLE url_keywords (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url_id INTEGER,
    keyword_id INTEGER,
    matches INTEGER DEFAULT 1,
    context TEXT,
    FOREIGN KEY (url_id) REFERENCES urls (id),
    FOREIGN KEY (keyword_id) REFERENCES keywords (id),
    UNIQUE(url_id, keyword_id)
);
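To show how the three tables fit together, the sketch below creates this schema in an in-memory SQLite database, records keyword matches, runs a keyword-filtered query with a row limit (similar in spirit to get_scraping_results), and counts rows per table (similar in spirit to get_scraping_stats). The queries, the LIKE-based filter, and the helper names are assumptions, not taken from database.py. Note that SQLite enforces the FOREIGN KEY clauses only when PRAGMA foreign_keys = ON is set on the connection, and the upsert syntax used here needs SQLite 3.24 or newer.

```python
import sqlite3
from typing import Dict, List, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS urls (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT UNIQUE NOT NULL,
    title TEXT,
    content TEXT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS keywords (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    keyword TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS url_keywords (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url_id INTEGER,
    keyword_id INTEGER,
    matches INTEGER DEFAULT 1,
    context TEXT,
    FOREIGN KEY (url_id) REFERENCES urls (id),
    FOREIGN KEY (keyword_id) REFERENCES keywords (id),
    UNIQUE(url_id, keyword_id)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn


def record_match(conn: sqlite3.Connection, url: str, title: str,
                 keyword: str, matches: int, context: str) -> None:
    """Insert one URL/keyword match, updating it if the pair already exists."""
    conn.execute("INSERT OR IGNORE INTO urls (url, title) VALUES (?, ?)", (url, title))
    conn.execute("INSERT OR IGNORE INTO keywords (keyword) VALUES (?)", (keyword,))
    conn.execute(
        """INSERT INTO url_keywords (url_id, keyword_id, matches, context)
           VALUES ((SELECT id FROM urls WHERE url = ?),
                   (SELECT id FROM keywords WHERE keyword = ?), ?, ?)
           ON CONFLICT(url_id, keyword_id)
           DO UPDATE SET matches = excluded.matches, context = excluded.context""",
        (url, keyword, matches, context),
    )
    conn.commit()


def query_results(conn: sqlite3.Connection, keyword_filter: str = "",
                  limit: int = 50) -> List[Tuple[str, str, int]]:
    """Return (url, keyword, matches) rows whose keyword contains keyword_filter."""
    return conn.execute(
        """SELECT u.url, k.keyword, uk.matches
           FROM url_keywords AS uk
           JOIN urls AS u ON u.id = uk.url_id
           JOIN keywords AS k ON k.id = uk.keyword_id
           WHERE k.keyword LIKE ?
           ORDER BY uk.matches DESC
           LIMIT ?""",
        (f"%{keyword_filter}%", limit),
    ).fetchall()


def stats(conn: sqlite3.Connection) -> Dict[str, int]:
    """Row counts per table; names come from a fixed tuple, never user input."""
    return {table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
            for table in ("urls", "keywords", "url_keywords")}
```

The UNIQUE(url_id, keyword_id) constraint is what lets a repeat scrape of the same page update its match count instead of adding a duplicate row.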

🔧 Configuration

Default Settings

  • Max Depth: 3 levels of recursive crawling

  • Request Delay: 1 second between requests

  • User Agent: A current Chrome browser User-Agent string

  • Database: scraper_results.db (auto-created)

Customization

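Override the defaults by passing arguments to the MCPWebScraper constructor:

from scraper import MCPWebScraper  # assumed location: scraper.py

# db_manager: an existing database manager instance (see database.py)
scraper = MCPWebScraper(
    db_manager=db_manager,
    max_depth=5,      # crawl up to 5 levels (default: 3)
    delay=0.5         # seconds between requests (default: 1)
)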
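The max_depth and delay settings fit a breadth-first crawl: pages are visited level by level up to the depth limit, with a pause between requests. The sketch below is a simplified, synchronous illustration, not the code in scraper.py. It takes a hypothetical fetch callable (URL in, HTML out) so it runs without network access, extracts links with the standard-library html.parser, treats the start page as depth 1, and stays on the starting host; the last two choices are assumptions about how the crawl is scoped.

```python
import time
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Set
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str, fetch: Callable[[str], str],
          max_depth: int = 3, delay: float = 1.0) -> Dict[str, str]:
    """Breadth-first crawl; returns {url: html} for every page fetched.

    The start page counts as depth 1, so max_depth=1 fetches only start_url.
    """
    host = urlsplit(start_url).netloc
    pages: Dict[str, str] = {}
    seen: Set[str] = {start_url}
    queue = deque([(start_url, 1)])
    while queue:
        url, depth = queue.popleft()
        if pages:  # pause between requests, not before the first one
            time.sleep(delay)
        html = fetch(url)
        pages[url] = html
        if depth >= max_depth:
            continue  # depth limit reached: do not follow this page's links
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            link, _ = urldefrag(urljoin(url, href))
            if urlsplit(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

A real fetch would issue an HTTP GET with the configured User-Agent. With the defaults above (max_depth=3, delay=1), this visits the start page, the pages it links to, and the pages those link to, waiting one second between requests.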

🧪 Development

Running Tests

python test_mcp_scraper.py

Example Usage

python example_usage.py

Project Structure

mcp-web-research-agent/
├── server.py              # MCP server implementation
├── scraper.py             # Core scraping logic
├── database.py            # Database management
├── requirements.txt       # Python dependencies
├── pyproject.toml         # Package configuration
├── test_mcp_scraper.py    # Unit tests
├── example_usage.py       # Usage examples
└── README.md              # This file

๐Ÿค Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository

  2. Create your feature branch (git checkout -b feature/amazing-feature)

  3. Commit your changes (git commit -m 'Add some amazing feature')

  4. Push to the branch (git push origin feature/amazing-feature)

  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

  • Built on the Model Context Protocol

  • Inspired by modern web scraping best practices

  • Thanks to the open-source community for amazing tools


Built with โค๏ธ for the MCP ecosystem
