# 🔍 Web Analyzer MCP

<a href="https://glama.ai/mcp/servers/@kimdonghwi94/web-analyzer-mcp">
  <img width="380" height="200" src="https://glama.ai/mcp/servers/@kimdonghwi94/web-analyzer-mcp/badge" alt="WebAnalyzer MCP server" />
</a>

A powerful MCP (Model Context Protocol) server for intelligent web content analysis and summarization. Built with FastMCP, this server provides smart web scraping, content extraction, and AI-powered question answering.

## ✨ Features

### 🎯 Core Tools

1. **`url_to_markdown`** - Extracts and summarizes key web page content
   - Analyzes content importance using custom algorithms
   - Removes ads, navigation, and irrelevant content
   - Keeps only essential information (tables, images, key text)
   - Outputs structured markdown optimized for analysis
2. **`web_content_qna`** - AI-powered Q&A about web content
   - Extracts relevant content sections from web pages
   - Uses intelligent chunking and relevance matching
   - Answers questions using OpenAI GPT models

### 🚀 Key Features

- **Smart Content Ranking**: Algorithm-based content importance scoring
- **Essential Content Only**: Removes clutter, keeps what matters
- **Multi-IDE Support**: Works with Claude Desktop, Cursor, VS Code, and PyCharm
- **Flexible Models**: Choose from GPT-3.5, GPT-4, GPT-4 Turbo, or GPT-5

## 📦 Installation

### Prerequisites

- [uv](https://docs.astral.sh/uv/getting-started/installation/) (Python package manager)
- Chrome/Chromium browser (for Selenium)
- OpenAI API key (for Q&A functionality)

### 🚀 Quick Start with uv (Recommended)

```bash
# Clone the repository
git clone https://github.com/kimdonghwi94/web-analyzer-mcp.git
cd web-analyzer-mcp

# Run directly with uv (auto-installs dependencies)
uv run mcp-webanalyzer
```

### Installing via Smithery

To install web-analyzer-mcp for Claude Desktop automatically via [Smithery](https://smithery.ai/server/@kimdonghwi94/web-analyzer-mcp):

```bash
npx -y @smithery/cli install @kimdonghwi94/web-analyzer-mcp --client claude
```

## IDE/Editor Integration

<details>
<summary><b>Install in Claude Desktop</b></summary>

Add to your `claude_desktop_config.json` file. See the [Claude Desktop MCP documentation](https://modelcontextprotocol.io/quickstart/user) for more details.

```json
{
  "mcpServers": {
    "web-analyzer": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/web-analyzer-mcp",
        "run",
        "mcp-webanalyzer"
      ],
      "env": {
        "OPENAI_API_KEY": "your_openai_api_key_here",
        "OPENAI_MODEL": "gpt-4"
      }
    }
  }
}
```

</details>

<details>
<summary><b>Install in Claude Code (VS Code Extension)</b></summary>

Add the server using the Claude Code CLI:

```bash
claude mcp add web-analyzer -e OPENAI_API_KEY=your_api_key_here -e OPENAI_MODEL=gpt-4 -- uv --directory /path/to/web-analyzer-mcp run mcp-webanalyzer
```

</details>

<details>
<summary><b>Install in Cursor IDE</b></summary>

Add to your Cursor settings (`File > Preferences > Settings > Extensions > MCP`):

```json
{
  "mcpServers": {
    "web-analyzer": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/web-analyzer-mcp",
        "run",
        "mcp-webanalyzer"
      ],
      "env": {
        "OPENAI_API_KEY": "your_openai_api_key_here",
        "OPENAI_MODEL": "gpt-4"
      }
    }
  }
}
```

</details>

<details>
<summary><b>Install in JetBrains AI Assistant</b></summary>

See the [JetBrains AI Assistant documentation](https://www.jetbrains.com/help/idea/ai-assistant.html) for more details.

1. In JetBrains IDEs, go to **Settings** → **Tools** → **AI Assistant** → **Model Context Protocol (MCP)**
2. Click **+ Add**
3. Click on **Command** in the top-left corner of the dialog and select the **As JSON** option from the list
4. Add this configuration and click **OK**:

```json
{
  "mcpServers": {
    "web-analyzer": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/web-analyzer-mcp",
        "run",
        "mcp-webanalyzer"
      ],
      "env": {
        "OPENAI_API_KEY": "your_openai_api_key_here",
        "OPENAI_MODEL": "gpt-4"
      }
    }
  }
}
```

</details>

## 🎛️ Tool Descriptions

### `url_to_markdown`

Converts web pages to clean markdown format with essential content extraction.
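For illustration, an MCP client invokes this tool through a standard `tools/call` request. The envelope below follows the MCP JSON-RPC specification; the tool name and `url` argument come from this README, while the example URL is hypothetical:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "url_to_markdown",
    "arguments": {
      "url": "https://example.com/article"
    }
  }
}
```

In practice, your MCP client (Claude Desktop, Cursor, etc.) constructs this request for you; you simply ask the assistant to analyze a URL.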
**Parameters:**

- `url` (string): The web page URL to analyze

**Returns:** Clean markdown content with structured data preserved

### `web_content_qna`

Answers questions about web page content using intelligent content analysis.

**Parameters:**

- `url` (string): The web page URL to analyze
- `question` (string): Question about the page content

**Returns:** AI-generated answer based on the page content

## 🏗️ Architecture

### Content Extraction Pipeline

1. **URL Validation** - Ensures proper URL format
2. **HTML Fetching** - Uses Selenium for dynamic content
3. **Content Parsing** - BeautifulSoup for HTML processing
4. **Element Scoring** - Custom algorithm ranks content importance
5. **Content Filtering** - Removes duplicates and low-value content
6. **Markdown Conversion** - Structured output generation

### Q&A Processing Pipeline

1. **Content Chunking** - Intelligent text segmentation
2. **Relevance Scoring** - Matches content to questions
3. **Context Selection** - Picks the most relevant chunks
4. **Answer Generation** - OpenAI GPT integration

## 📁 Project Structure

```
web-analyzer-mcp/
├── web_analyzer_mcp/      # Main Python package
│   ├── __init__.py        # Package initialization
│   ├── server.py          # FastMCP server with tools
│   ├── web_extractor.py   # Web content extraction engine
│   └── rag_processor.py   # RAG-based Q&A processor
├── scripts/               # Build and utility scripts
│   └── build.js           # Node.js build script
├── README.md              # English documentation
├── README.ko.md           # Korean documentation
├── package.json           # npm configuration and scripts
├── pyproject.toml         # Python package configuration
├── .env.example           # Environment variables template
└── dist-info.json         # Build information (generated)
```

## 🛠️ Development

### Modern Development with uv

```bash
# Clone repository
git clone https://github.com/kimdonghwi94/web-analyzer-mcp.git
cd web-analyzer-mcp

# Development commands
uv run mcp-webanalyzer    # Start development server
uv run python -m pytest   # Run tests
uv run ruff check .       # Lint code
uv run ruff format .      # Format code
uv sync                   # Sync dependencies

# Install development dependencies
uv add --dev pytest ruff mypy

# Create production build
npm run build
```

### Alternative: Traditional Python Development

```bash
# Set up Python environment (if not using uv)
pip install -e .[dev]

# Development commands
python -m web_analyzer_mcp.server   # Start server
python -m pytest tests/             # Run tests
python -m ruff check .              # Lint code
python -m ruff format .             # Format code
python -m mypy web_analyzer_mcp/    # Type checking
```

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## 📋 Roadmap

- [ ] Support for more content types (PDFs, videos)
- [ ] Multi-language content extraction
- [ ] Custom extraction rules
- [ ] Caching for frequently accessed content
- [ ] Webhook support for real-time updates

## ⚠️ Limitations

- Requires Chrome/Chromium for JavaScript-heavy sites
- An OpenAI API key is needed for Q&A functionality
- Rate limited to prevent abuse
- Some sites may block automated access

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙋‍♂️ Support

- Create an issue for bug reports or feature requests
- Contribute to discussions in the GitHub repository
- Check the [documentation](https://github.com/kimdonghwi94/web-analyzer-mcp) for detailed guides

## 🌟 Acknowledgments

- Built with the [FastMCP](https://github.com/jlowin/fastmcp) framework
- Inspired by [HTMLRAG](https://github.com/plageon/HtmlRAG) techniques for web content processing
- Thanks to the MCP community for feedback and contributions

---

**Made with ❤️ for the MCP community**
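To make the chunk-and-rank idea from the Q&A Processing Pipeline concrete, here is a minimal, self-contained sketch using only the Python standard library. It is a hypothetical simplification for illustration; the project's actual `rag_processor.py` may segment and score content differently:

```python
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 10) -> list[str]:
    """Content Chunking: split text into fixed-size word chunks
    (a stand-in for smarter segmentation)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def relevance(chunk: str, question: str) -> int:
    """Relevance Scoring: count how often the question's words occur in the chunk."""
    tokenize = lambda s: re.findall(r"[a-z0-9]+", s.lower())
    chunk_counts = Counter(tokenize(chunk))
    return sum(chunk_counts[w] for w in set(tokenize(question)))


def select_context(text: str, question: str, top_k: int = 2) -> list[str]:
    """Context Selection: pick the top-k most relevant chunks to send to the LLM."""
    chunks = chunk_text(text)
    return sorted(chunks, key=lambda c: relevance(c, question), reverse=True)[:top_k]


page = ("The server extracts content with Selenium and BeautifulSoup. "
        "Pricing starts at ten dollars per month for the basic plan. "
        "The basic plan includes one seat and email support.")
context = select_context(page, "How much does the basic plan cost?", top_k=1)
print(context[0])  # the chunk mentioning "ten dollars per month"
```

The final step (Answer Generation) would pass `context` plus the question to an OpenAI model; that call is omitted here to keep the sketch runnable without an API key.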
