MCP Web Research Agent

A powerful MCP (Model Context Protocol) tool for automated web research, scraping, and intelligence gathering.

License: MIT | Python 3.8+ | MCP Protocol

A web research automation tool that exposes a recursive web scraper as an MCP-compatible agent for AI workflows. Well suited to competitive intelligence, market research, and automated data collection.

🚀 Features

  • 🔍 Intelligent Scraping: Recursive web crawling with configurable depth

  • 🔎 Search Integration: Multi-engine search with result processing

  • 💾 Database Storage: Persistent SQLite storage with advanced querying

  • 📊 Multiple Export Formats: JSON, Markdown, and CSV exports

  • 🤖 MCP Integration: Seamless integration with AI assistants

  • ⚡ Async Ready: Built for concurrent operations

  • 🔧 Configurable: Adjustable settings for any use case

🛠️ Installation

Prerequisites

  • Python 3.8+

  • MCP-compatible client (Claude Desktop, etc.)

Quick Install

# Clone the repository
git clone https://github.com/yourusername/mcp-web-research-agent.git
cd mcp-web-research-agent

# Install dependencies
pip install -e .
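
You can optionally start the server by hand to confirm that the dependencies resolved; an MCP server that communicates over stdio simply starts and waits for a client, so no output is expected (this assumes server.py is the entry point, as in the client configuration below).

# Optional sanity check: start the server manually (Ctrl+C to stop)
python server.py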

MCP Client Configuration

Add to your MCP client configuration:

{
  "mcpServers": {
    "web-research-agent": {
      "command": "python",
      "args": ["/path/to/mcp-web-research-agent/server.py"]
    }
  }
}
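
For Claude Desktop, this block goes in claude_desktop_config.json (typically under ~/Library/Application Support/Claude/ on macOS or %APPDATA%\Claude on Windows); restart the client after editing so the web-research-agent server is loaded. Other MCP clients keep their configuration elsewhere but generally accept the same command/args structure.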

📖 Usage

Available Tools

scrape_url

Scrape a single URL for specific keywords

result = await scrape_url(
    url="https://example.com",
    keywords=["python", "automation", "scraping"],
    extract_links=False,
    max_depth=1
)

search_and_scrape

Search the web and automatically scrape results

result = await search_and_scrape(
    query="web scraping best practices",
    keywords=["python", "beautifulsoup", "requests"],
    search_engine_url="https://searx.gophernuttz.us/search/",
    max_results=10
)

get_scraping_results

Query the database for previous scraping results

result = await get_scraping_results(
    keyword_filter="python",
    limit=50
)

export_results

Export results to various formats

result = await export_results(
    format="markdown",
    keyword_filter="python",
    output_path="/path/to/output.md"
)

get_scraping_stats

Get current statistics and status

result = await get_scraping_stats()
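
These tools compose into a simple research pipeline. The sketch below searches, scrapes, and exports in one pass; it assumes the tool functions shown above can be imported as ordinary async functions (adjust the import to match where they are defined in this repository) and uses only parameters documented in this README.

import asyncio

from server import search_and_scrape, export_results  # assumed import path

async def research(topic, keywords, outfile):
    # Search the web and scrape the top results for the given keywords.
    # Pass search_engine_url as in the example above if no default is configured.
    await search_and_scrape(
        query=topic,
        keywords=keywords,
        max_results=10,
    )
    # Export everything that matched the first keyword to Markdown.
    await export_results(
        format="markdown",
        keyword_filter=keywords[0],
        output_path=outfile,
    )

if __name__ == "__main__":
    asyncio.run(research(
        "web scraping best practices",
        ["python", "beautifulsoup", "requests"],
        "research_notes.md",
    ))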

🗃️ Database Schema

The agent uses SQLite with the following structure:

-- URLs table
CREATE TABLE urls (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT UNIQUE NOT NULL,
    title TEXT,
    content TEXT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);

-- Keywords table  
CREATE TABLE keywords (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    keyword TEXT UNIQUE NOT NULL
);

-- URL-Keyword relationships
CREATE TABLE url_keywords (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url_id INTEGER,
    keyword_id INTEGER,
    matches INTEGER DEFAULT 1,
    context TEXT,
    FOREIGN KEY (url_id) REFERENCES urls (id),
    FOREIGN KEY (keyword_id) REFERENCES keywords (id),
    UNIQUE(url_id, keyword_id)
);
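
Because storage is plain SQLite, previous results can also be inspected outside the MCP tools. A minimal sketch using only the standard library and the schema above (it assumes the default scraper_results.db file described under Configuration):

import sqlite3

# Open the database the agent maintains (scraper_results.db by default)
conn = sqlite3.connect("scraper_results.db")

# List every URL that matched a given keyword, most matches first
rows = conn.execute(
    """
    SELECT u.url, u.title, uk.matches, uk.context
    FROM url_keywords AS uk
    JOIN urls AS u ON u.id = uk.url_id
    JOIN keywords AS k ON k.id = uk.keyword_id
    WHERE k.keyword = ?
    ORDER BY uk.matches DESC
    """,
    ("python",),
).fetchall()

for url, title, matches, context in rows:
    print(f"{matches:>3}  {url}  ({title})")

conn.close()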

🔧 Configuration

Default Settings

  • Max Depth: 3 levels of recursive crawling

  • Request Delay: 1 second between requests

  • User Agent: Modern Chrome browser simulation

  • Database: scraper_results.db (auto-created)

Customization

Modify settings in the MCPWebScraper constructor:

scraper = MCPWebScraper(
    db_manager=db_manager,
    max_depth=5,      # Increase crawl depth
    delay=0.5         # Faster requests
)

🧪 Development

Running Tests

python test_mcp_scraper.py

Example Usage

python example_usage.py

Project Structure

mcp-web-research-agent/
├── server.py              # MCP server implementation
├── scraper.py             # Core scraping logic
├── database.py            # Database management
├── requirements.txt       # Python dependencies
├── pyproject.toml         # Package configuration
├── test_mcp_scraper.py    # Unit tests
├── example_usage.py       # Usage examples
└── README.md              # This file

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository

  2. Create your feature branch (git checkout -b feature/amazing-feature)

  3. Commit your changes (git commit -m 'Add some amazing feature')

  4. Push to the branch (git push origin feature/amazing-feature)

  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Built on the Model Context Protocol

  • Inspired by modern web scraping best practices

  • Thanks to the open-source community for amazing tools


Built with ❤️ for the MCP ecosystem
