The MCP WebAnalyzer server enables intelligent web content analysis and processing with AI-powered capabilities:
- Content Extraction & Conversion - Extract web pages to clean, structured markdown while removing ads, navigation, and clutter
- AI-Powered Q&A - Answer questions about web content using OpenAI GPT models (GPT-3.5, GPT-4, GPT-4 Turbo, GPT-5)
- Web Crawling - Discover subpages from URLs with configurable depth and page limits
- Content Summarization - Generate concise one-line summaries and RAG-optimized extracts with optional question focusing
- Smart Content Ranking - Analyze and rank content importance using custom algorithms
- IDE Integration - Works with Claude Desktop, Cursor, VS Code, and PyCharm
Uses Celery for distributed asynchronous task processing and background job management
Supports containerized deployment and orchestration of the web analyzer services
Uses environment variable configuration for server settings, security keys, and connection information
Built on FastAPI to provide high-performance web analysis endpoints and API documentation
Integrates with OpenAI's API for AI-powered web content analysis and summarization
Exposes metrics endpoints for monitoring server performance and usage statistics
Uses Redis for caching, session management, and as a message broker for task processing
🔍 Web Analyzer MCP
A powerful MCP (Model Context Protocol) server for intelligent web content analysis and summarization. Built with FastMCP, this server provides smart web scraping, content extraction, and AI-powered question-answering capabilities.
✨ Features
🎯 Core Tools
url_to_markdown
- Extracts and summarizes key web page content
- Analyzes content importance using custom algorithms
- Removes ads, navigation, and irrelevant content
- Keeps only essential information (tables, images, key text)
- Outputs structured markdown optimized for analysis
web_content_qna
- AI-powered Q&A about web content
- Extracts relevant content sections from web pages
- Uses intelligent chunking and relevance matching
- Answers questions using OpenAI GPT models
🚀 Key Features
- Smart Content Ranking: Algorithm-based content importance scoring
- Essential Content Only: Removes clutter, keeps what matters
- Multi-IDE Support: Works with Claude Desktop, Cursor, VS Code, PyCharm
- Flexible Models: Choose from GPT-3.5, GPT-4, GPT-4 Turbo, or GPT-5
📦 Installation
Prerequisites
- uv (Python package manager)
- Chrome/Chromium browser (for Selenium)
- OpenAI API key (for Q&A functionality)
🚀 Quick Start with uv (Recommended)
Installing via Smithery
To install web-analyzer-mcp for Claude Desktop automatically via Smithery:
IDE/Editor Integration
Add to your claude_desktop_config.json file. See the Claude Desktop MCP documentation for more details.
Add the server using Claude Code CLI:
Add to your Cursor settings (File > Preferences > Settings > Extensions > MCP):
See JetBrains AI Assistant Documentation for more details.
- In JetBrains IDEs go to Settings → Tools → AI Assistant → Model Context Protocol (MCP)
- Click + Add
- Click on Command in the top-left corner of the dialog and select the As JSON option from the list
- Add this configuration and click OK:
🎛️ Tool Descriptions
url_to_markdown
Converts web pages to clean markdown format with essential content extraction.
Parameters:
- url (string): The web page URL to analyze
Returns: Clean markdown content with structured data preservation
web_content_qna
Answers questions about web page content using intelligent content analysis.
Parameters:
- url (string): The web page URL to analyze
- question (string): Question about the page content
Returns: AI-generated answer based on page content
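As an illustration of how a client might invoke these two tools, the sketch below builds the JSON-RPC 2.0 `tools/call` requests that MCP clients send to a server. The tool names and parameter names come from the descriptions above; the helper `build_tool_call`, the request IDs, the example URL, and the question are placeholders chosen for this example, not part of the project.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request for an MCP server.

    `build_tool_call` is a hypothetical helper for illustration only.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# The URL and question below are placeholders.
markdown_request = build_tool_call(
    1, "url_to_markdown", {"url": "https://example.com"}
)
qna_request = build_tool_call(
    2,
    "web_content_qna",
    {"url": "https://example.com", "question": "What is this page about?"},
)

print(json.dumps(qna_request, indent=2))
```

In practice an MCP-aware IDE or client library constructs and sends these messages for you; the sketch only shows which arguments each tool expects.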
🏗️ Architecture
Content Extraction Pipeline
- URL Validation - Ensures proper URL format
- HTML Fetching - Uses Selenium for dynamic content
- Content Parsing - BeautifulSoup for HTML processing
- Element Scoring - Custom algorithm ranks content importance
- Content Filtering - Removes duplicates and low-value content
- Markdown Conversion - Structured output generation
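The stages above can be sketched roughly as follows. This is a simplified illustration, not the project's actual algorithm: it skips the Selenium fetch and BeautifulSoup parsing and starts from already-extracted (tag, text) pairs. The tag weights, the score threshold, and all function names are assumptions made for this example.

```python
from urllib.parse import urlparse

# Hypothetical weights: headings and tables count more than navigation.
TAG_WEIGHTS = {"h1": 3.0, "h2": 2.5, "table": 2.0, "p": 1.0, "nav": 0.1, "aside": 0.2}


def is_valid_url(url):
    """URL validation: accept only http(s) URLs that include a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def score_block(tag, text):
    """Element scoring: tag weight times word count (capped at 50 words)."""
    words = len(text.split())
    return TAG_WEIGHTS.get(tag, 0.5) * min(words, 50)


def filter_blocks(blocks, min_score=3.0):
    """Content filtering: drop exact duplicates and low-scoring blocks."""
    seen = set()
    kept = []
    for tag, text in blocks:
        key = " ".join(text.lower().split())  # normalize case and whitespace
        if key in seen or score_block(tag, text) < min_score:
            continue
        seen.add(key)
        kept.append((tag, text))
    return kept


def to_markdown(blocks):
    """Markdown conversion: headings get '#' prefixes, other text stays plain."""
    prefixes = {"h1": "# ", "h2": "## "}
    return "\n\n".join(prefixes.get(tag, "") + text for tag, text in blocks)


blocks = [
    ("h1", "Pricing overview"),
    ("nav", "Home About Contact"),
    ("p", "Our plans start at ten dollars per month."),
    ("p", "Our plans start at ten dollars per month."),
]
markdown = to_markdown(filter_blocks(blocks))
print(markdown)
```

With this sample input, the navigation block falls below the threshold and the repeated paragraph is dropped as a duplicate, leaving the heading and one paragraph.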
Q&A Processing Pipeline
- Content Chunking - Intelligent text segmentation
- Relevance Scoring - Matches content to questions
- Context Selection - Picks most relevant chunks
- Answer Generation - OpenAI GPT integration
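A minimal sketch of the first three Q&A stages is shown below, assuming sentence-based chunking and simple keyword overlap as the relevance measure. The real server may use a different chunker and scoring method; the function names, the `max_words` and `top_k` parameters, and the prompt wording are hypothetical. The final OpenAI call is left out, and the sketch stops at assembling the prompt.

```python
import re


def chunk_text(text, max_words=40):
    """Content chunking: group sentences into chunks of roughly max_words words.

    A single sentence longer than max_words is kept whole.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for sentence in sentences:
        words = sentence.split()
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(chunk, question):
    """Relevance scoring: fraction of question tokens that appear in the chunk."""
    question_tokens = tokenize(question)
    if not question_tokens:
        return 0.0
    return len(question_tokens & tokenize(chunk)) / len(question_tokens)


def select_context(chunks, question, top_k=2):
    """Context selection: keep the top_k chunks with the highest relevance."""
    ranked = sorted(chunks, key=lambda c: relevance(c, question), reverse=True)
    return ranked[:top_k]


def build_prompt(context_chunks, question):
    """Assemble the text that would be sent to the language model."""
    context = "\n\n".join(context_chunks)
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"


page_text = (
    "The basic plan costs ten dollars per month. "
    "Support is available by email. "
    "The office is located in Berlin."
)
question = "How much does the basic plan cost?"
chunks = chunk_text(page_text, max_words=8)
selected = select_context(chunks, question, top_k=1)
prompt = build_prompt(selected, question)
print(prompt)
```

Keyword overlap is used here only because it needs no external dependencies; an embedding-based similarity would be a natural substitute in the same `relevance` slot.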
🏗️ Project Structure
🛠️ Development
Modern Development with uv
Alternative: Traditional Python Development
🤝 Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
📋 Roadmap
- Support for more content types (PDFs, videos)
- Multi-language content extraction
- Custom extraction rules
- Caching for frequently accessed content
- Webhook support for real-time updates
⚠️ Limitations
- Requires Chrome/Chromium for JavaScript-heavy sites
- OpenAI API key needed for Q&A functionality
- Rate limited to prevent abuse
- Some sites may block automated access
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙋‍♂️ Support
- Create an issue for bug reports or feature requests
- Contribute to discussions in the GitHub repository
- Check the documentation for detailed guides
🌟 Acknowledgments
- Built with FastMCP framework
- Inspired by HTMLRAG techniques for web content processing
- Thanks to the MCP community for feedback and contributions
Made with ❤️ for the MCP community
Hybrid server: the server can run both locally and remotely, depending on the configuration or use case.
An enterprise-grade Model Context Protocol server for high-performance web analysis that discovers subpages, provides AI-based page summaries, and extracts structured content for RAG using FastMCP and FastAPI.