🔍 Web Analyzer MCP

A powerful MCP (Model Context Protocol) server for intelligent web content analysis and summarization. Built with FastMCP, this server provides smart web scraping, content extraction, and AI-powered question-answering capabilities.

✨ Features

🎯 Core Tools

  1. url_to_markdown - Extract and summarize key web page content

    • Analyzes content importance using custom algorithms

    • Removes ads, navigation, and irrelevant content

    • Keeps only essential information (tables, images, key text)

    • Outputs structured markdown optimized for analysis

  2. web_content_qna - AI-powered Q&A about web content

    • Extracts relevant content sections from web pages

    • Uses intelligent chunking and relevance matching

    • Answers questions using OpenAI GPT models

🚀 Key Features

  • Smart Content Ranking: Algorithm-based content importance scoring

  • Essential Content Only: Removes clutter, keeps what matters

  • Multi-IDE Support: Works with Claude Desktop, Cursor, VS Code, PyCharm

  • Flexible Models: Choose from GPT-3.5, GPT-4, GPT-4 Turbo, or GPT-5
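The model choice is driven by the `OPENAI_MODEL` environment variable, which the IDE configurations in this README pass through their `env` block. A minimal sketch of how a server might read those settings, with a fallback default when the variable is unset; the helper name `get_openai_settings` is illustrative, not the project's actual API:

```python
import os

# Illustrative helper: read the OpenAI settings that the IDE configs
# pass via their "env" block. The variable names OPENAI_API_KEY and
# OPENAI_MODEL come from this README; the function itself is hypothetical.
def get_openai_settings(default_model: str = "gpt-4") -> dict:
    return {
        "api_key": os.environ.get("OPENAI_API_KEY", ""),
        "model": os.environ.get("OPENAI_MODEL", default_model),
    }

os.environ["OPENAI_MODEL"] = "gpt-4-turbo"
settings = get_openai_settings()
print(settings["model"])  # gpt-4-turbo
```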

📦 Installation

Prerequisites

  • uv (Python package manager)

  • Chrome/Chromium browser (for Selenium)

  • OpenAI API key (for Q&A functionality)

# Clone the repository
git clone https://github.com/kimdonghwi94/web-analyzer-mcp.git
cd web-analyzer-mcp

# Run directly with uv (auto-installs dependencies)
uv run mcp-webanalyzer

Installing via Smithery

To install web-analyzer-mcp for Claude Desktop automatically via Smithery:

npx -y @smithery/cli install @kimdonghwi94/web-analyzer-mcp --client claude

IDE/Editor Integration

Add to your claude_desktop_config.json file. See the Claude Desktop MCP documentation for more details.

{
  "mcpServers": {
    "web-analyzer": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/web-analyzer-mcp",
        "run", 
        "mcp-webanalyzer"
      ],
      "env": {
        "OPENAI_API_KEY": "your_openai_api_key_here",
        "OPENAI_MODEL": "gpt-4"
      }
    }
  }
}

Add the server using Claude Code CLI:

claude mcp add web-analyzer -e OPENAI_API_KEY=your_api_key_here -e OPENAI_MODEL=gpt-4 -- uv --directory /path/to/web-analyzer-mcp run mcp-webanalyzer

Add to your Cursor settings (File > Preferences > Settings > Extensions > MCP):

{
  "mcpServers": {
    "web-analyzer": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/web-analyzer-mcp",
        "run", 
        "mcp-webanalyzer"
      ],
      "env": {
        "OPENAI_API_KEY": "your_openai_api_key_here",
        "OPENAI_MODEL": "gpt-4"
      }
    }
  }
}

See the JetBrains AI Assistant documentation for more details.

  1. In JetBrains IDEs, go to Settings → Tools → AI Assistant → Model Context Protocol (MCP)

  2. Click + Add

  3. Click on Command in the top-left corner of the dialog and select the As JSON option from the list

  4. Add this configuration and click OK:

{
  "mcpServers": {
    "web-analyzer": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/web-analyzer-mcp",
        "run", 
        "mcp-webanalyzer"
      ],
      "env": {
        "OPENAI_API_KEY": "your_openai_api_key_here",
        "OPENAI_MODEL": "gpt-4"
      }
    }
  }
}

🎛️ Tool Descriptions

url_to_markdown

Converts web pages to clean markdown format with essential content extraction.

Parameters:

  • url (string): The web page URL to analyze

Returns: Clean markdown content with structured data preservation

web_content_qna

Answers questions about web page content using intelligent content analysis.

Parameters:

  • url (string): The web page URL to analyze

  • question (string): Question about the page content

Returns: AI-generated answer based on page content
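Under the hood, an MCP client invokes these tools with a JSON-RPC 2.0 `tools/call` request, passing the parameters above as `arguments`. A sketch of the payloads the two tools would receive; the tool names come from this README, while the URL and question values are placeholders:

```python
import json

# MCP tool invocations are JSON-RPC 2.0 "tools/call" requests.
# Tool names match this README; URL/question values are placeholders.
def make_tool_call(request_id: int, name: str, arguments: dict) -> str:
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })

md_req = make_tool_call(1, "url_to_markdown", {"url": "https://example.com"})
qna_req = make_tool_call(2, "web_content_qna", {
    "url": "https://example.com",
    "question": "What is this page about?",
})
print(md_req)
```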

🏗️ Architecture

Content Extraction Pipeline

  1. URL Validation - Ensures proper URL format

  2. HTML Fetching - Uses Selenium for dynamic content

  3. Content Parsing - BeautifulSoup for HTML processing

  4. Element Scoring - Custom algorithm ranks content importance

  5. Content Filtering - Removes duplicates and low-value content

  6. Markdown Conversion - Structured output generation
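The scoring and filtering steps (4–5) can be sketched with a simple link-density heuristic: blocks that are mostly link text (navigation bars, ad units) score low, while long, unique body text scores high. The actual algorithm lives in web_extractor.py and is more involved; this is only an illustration:

```python
# Illustrative content-scoring sketch (steps 4-5 above). The real
# algorithm in web_extractor.py is more sophisticated; this shows the
# general idea: penalize link-heavy blocks, then drop duplicates.
def score_element(text: str, link_text_len: int) -> float:
    if not text:
        return 0.0
    link_density = link_text_len / len(text)   # share of text inside links
    return len(text) * (1.0 - link_density)    # long, low-link text wins

def filter_elements(elements: list[tuple[str, int]], threshold: float = 20.0) -> list[str]:
    seen, kept = set(), []
    for text, link_len in elements:
        if text in seen:                       # step 5: remove duplicates
            continue
        seen.add(text)
        if score_element(text, link_len) >= threshold:
            kept.append(text)
    return kept

elements = [
    ("Home | About | Contact | Login", 30),    # nav bar: pure link text
    ("This article explains how the extraction pipeline ranks content.", 0),
    ("This article explains how the extraction pipeline ranks content.", 0),  # duplicate
]
print(filter_elements(elements))
```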

Q&A Processing Pipeline

  1. Content Chunking - Intelligent text segmentation

  2. Relevance Scoring - Matches content to questions

  3. Context Selection - Picks most relevant chunks

  4. Answer Generation - OpenAI GPT integration
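Steps 1–3 above can be sketched with word-overlap relevance scoring: split the page into fixed-size chunks, count shared words with the question, and keep the top-ranked chunks as context. The real rag_processor.py presumably uses richer matching, so treat this as a minimal illustration:

```python
import re

# Minimal sketch of steps 1-3: chunk the text, score each chunk by
# word overlap with the question, keep the best chunks as context.
# The actual rag_processor.py logic is more advanced; this is illustrative.
def chunk_text(text: str, chunk_words: int = 50) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + chunk_words])
            for i in range(0, len(words), chunk_words)]

def relevance(chunk: str, question: str) -> int:
    tokenize = lambda s: set(re.findall(r"[a-z]+", s.lower()))
    return len(tokenize(chunk) & tokenize(question))

def select_context(text: str, question: str, top_k: int = 2) -> list[str]:
    chunks = chunk_text(text)
    return sorted(chunks, key=lambda c: relevance(c, question), reverse=True)[:top_k]

page = ("The server extracts content with Selenium. " * 10 +
        "Questions are answered with OpenAI GPT models. " * 10)
context = select_context(page, "Which models answer questions?")
# The selected chunks would then be sent to the OpenAI API (step 4).
print(len(context))  # 2
```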

🏗️ Project Structure

web-analyzer-mcp/
├── web_analyzer_mcp/          # Main Python package
│   ├── __init__.py           # Package initialization
│   ├── server.py             # FastMCP server with tools
│   ├── web_extractor.py      # Web content extraction engine
│   └── rag_processor.py      # RAG-based Q&A processor
├── scripts/                   # Build and utility scripts
│   └── build.js              # Node.js build script
├── README.md                 # English documentation
├── README.ko.md              # Korean documentation
├── package.json              # npm configuration and scripts
├── pyproject.toml            # Python package configuration
├── .env.example              # Environment variables template
└── dist-info.json            # Build information (generated)

🛠️ Development

Modern Development with uv

# Clone repository
git clone https://github.com/kimdonghwi94/web-analyzer-mcp.git
cd web-analyzer-mcp

# Development commands
uv run mcp-webanalyzer     # Start development server
uv run python -m pytest   # Run tests
uv run ruff check .        # Lint code
uv run ruff format .       # Format code
uv sync                    # Sync dependencies

# Install development dependencies
uv add --dev pytest ruff mypy

# Create production build
npm run build

Alternative: Traditional Python Development

# Setup Python environment (if not using uv)
pip install -e ".[dev]"

# Development commands
python -m web_analyzer_mcp.server  # Start server
python -m pytest tests/            # Run tests
python -m ruff check .             # Lint code
python -m ruff format .            # Format code
python -m mypy web_analyzer_mcp/   # Type checking

🤝 Contributing

  1. Fork the repository

  2. Create a feature branch (git checkout -b feature/amazing-feature)

  3. Commit your changes (git commit -m 'Add amazing feature')

  4. Push to the branch (git push origin feature/amazing-feature)

  5. Open a Pull Request

📋 Roadmap

  • Support for more content types (PDFs, videos)

  • Multi-language content extraction

  • Custom extraction rules

  • Caching for frequently accessed content

  • Webhook support for real-time updates

⚠️ Limitations

  • Requires Chrome/Chromium for JavaScript-heavy sites

  • OpenAI API key needed for Q&A functionality

  • Rate limited to prevent abuse

  • Some sites may block automated access

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙋‍♂️ Support

  • Create an issue for bug reports or feature requests

  • Contribute to discussions in the GitHub repository

  • Check the documentation for detailed guides

🌟 Acknowledgments

  • Built with FastMCP framework

  • Inspired by HTMLRAG techniques for web content processing

  • Thanks to the MCP community for feedback and contributions


Made with ❤️ for the MCP community
