MCP Web Research Agent
A powerful MCP (Model Context Protocol) tool for automated web research, scraping, and intelligence gathering.
It turns an existing scraper into an MCP-compatible agent that AI assistants can call directly. Typical uses are competitive intelligence, market research, and automated data collection.
Features
Intelligent Scraping: Recursive web crawling with configurable depth
Search Integration: Multi-engine search with result processing
Database Storage: Persistent SQLite storage with advanced querying
Multiple Export Formats: JSON, Markdown, and CSV exports
MCP Integration: Seamless integration with AI assistants
Async Ready: Built for concurrent operations
Configurable: Adjustable settings for any use case
Installation
Prerequisites
Python 3.8+
MCP-compatible client (Claude Desktop, etc.)
Quick Install
MCP Client Configuration
Add to your MCP client configuration:
Usage
Available Tools
scrape_url
Scrape a single URL for specific keywords
search_and_scrape
Search the web and automatically scrape results
get_scraping_results
Query the database for previous scraping results
export_results
Export results to various formats
get_scraping_stats
Get current statistics and status
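As a rough illustration of what exporting to the three formats named above (JSON, Markdown, CSV) can look like, here is a minimal sketch using only the Python standard library. The record shape (url, keyword, matches) and the helper names export_rows, to_json, to_csv, and to_markdown are assumptions made for this example; they are not the export_results tool's actual interface.

```python
import csv
import io
import json

# Hypothetical record shape; the real export_results output may differ.
FIELDS = ["url", "keyword", "matches"]
SAMPLE_RESULTS = [
    {"url": "https://example.com/a", "keyword": "pricing", "matches": 3},
    {"url": "https://example.com/b", "keyword": "roadmap", "matches": 1},
]


def to_json(rows):
    """Serialize the rows as a pretty-printed JSON array."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Serialize the rows as CSV with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_markdown(rows):
    """Serialize the rows as a Markdown pipe table."""
    header = "| " + " | ".join(FIELDS) + " |"
    divider = "| " + " | ".join("---" for _ in FIELDS) + " |"
    body = ["| " + " | ".join(str(row[f]) for f in FIELDS) + " |" for row in rows]
    return "\n".join([header, divider, *body])


EXPORTERS = {"json": to_json, "csv": to_csv, "markdown": to_markdown}


def export_rows(rows, fmt):
    """Dispatch to the exporter registered for fmt."""
    if fmt not in EXPORTERS:
        raise ValueError(f"unsupported format: {fmt}")
    return EXPORTERS[fmt](rows)


if __name__ == "__main__":
    print(export_rows(SAMPLE_RESULTS, "markdown"))
```

Keeping the exporters in a dictionary means a new format can be added by registering one more function, without touching the dispatch logic.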
Database Schema
The agent uses SQLite with the following structure:
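To give a feel for how scraping results might be laid out in SQLite, the sketch below creates two tables and runs a simple query. Every table name, column name, and helper function here (pages, keyword_hits, open_db, record_hit, hits_for_keyword) is hypothetical and chosen only for illustration; consult the project source for the actual schema.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions, not the
# project's real structure.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT NOT NULL UNIQUE,
    depth INTEGER NOT NULL DEFAULT 0,
    scraped_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS keyword_hits (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    page_id INTEGER NOT NULL REFERENCES pages(id),
    keyword TEXT NOT NULL,
    context TEXT
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_hit(conn, url, keyword, context, depth=0):
    """Store one keyword match, creating the page row on first sight."""
    conn.execute(
        "INSERT OR IGNORE INTO pages (url, depth) VALUES (?, ?)", (url, depth)
    )
    page_id = conn.execute(
        "SELECT id FROM pages WHERE url = ?", (url,)
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO keyword_hits (page_id, keyword, context) VALUES (?, ?, ?)",
        (page_id, keyword, context),
    )
    conn.commit()


def hits_for_keyword(conn, keyword):
    """Return (url, context) pairs for a keyword, oldest first."""
    cursor = conn.execute(
        "SELECT p.url, h.context FROM keyword_hits h "
        "JOIN pages p ON p.id = h.page_id "
        "WHERE h.keyword = ? ORDER BY h.id",
        (keyword,),
    )
    return cursor.fetchall()
```

Passing ":memory:" keeps the example self-contained; a file path such as scraper_results.db would make the data persistent.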
Configuration
Default Settings
Max Depth: 3 levels of recursive crawling
Request Delay: 1 second between requests
User Agent: Modern Chrome browser simulation
Database: scraper_results.db (auto-created)
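The defaults above can be read as parameters of a depth-limited crawl. The following sketch shows one possible interpretation, assuming the start page counts as depth 0 and links are followed until depth 3. CrawlSettings, crawl, and the injected fetch_links callable are hypothetical names for this example and do not describe the MCPWebScraper class itself.

```python
import time
from dataclasses import dataclass


@dataclass
class CrawlSettings:
    """Hypothetical settings container mirroring the documented defaults."""
    max_depth: int = 3            # levels of recursive crawling
    request_delay: float = 1.0    # seconds to wait between requests
    db_path: str = "scraper_results.db"


def crawl(start_url, fetch_links, settings=None, sleep=time.sleep):
    """Visit pages depth-first, never going deeper than settings.max_depth.

    fetch_links(url) must return the list of links found on that page; it is
    injected so the sketch can run without network access.
    """
    settings = settings or CrawlSettings()
    visited = []          # (url, depth) in visit order
    seen = {start_url}    # guards against revisiting and link cycles

    def visit(url, depth):
        visited.append((url, depth))
        if depth >= settings.max_depth:
            return  # depth limit reached, do not follow links further
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                sleep(settings.request_delay)  # be polite between requests
                visit(link, depth + 1)

    visit(start_url, 0)
    return visited


if __name__ == "__main__":
    # In-memory link graph standing in for real pages.
    graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["a"]}
    settings = CrawlSettings(max_depth=2, request_delay=0.0)
    for url, depth in crawl("a", lambda u: graph.get(u, []), settings):
        print(depth, url)
```

Injecting fetch_links and sleep keeps the traversal logic testable in isolation; a real crawler would plug in an HTTP fetcher and link extractor there.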
Customization
Modify settings in the MCPWebScraper constructor:
Development
Running Tests
Example Usage
Project Structure
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
Built on the Model Context Protocol
Inspired by modern web scraping best practices
Thanks to the open-source community for amazing tools
Built with ❤️ for the MCP ecosystem