Skip to main content
Glama
README.mdβ€’15.8 kB
# 🌐 Website to Markdown MCP Server <div align="center"> **Language**: [English](README.md) | [繁體中文](README.zh-TW.md) </div> > A powerful Model Context Protocol (MCP) server designed for fetching website content and converting it to Markdown format, making it easier for AI to understand and process website information. ## ✨ Key Features <div align="center"> | 🌟 Enhanced Processing | πŸ“Š OpenAPI Support | βš™οΈ Smart Analysis | 🎯 Advanced Extraction | |:--------:|:-------------:|:----------:|:----------:| | AI-powered content cleanup | OpenAPI 3.x/Swagger 2.0 | Reading time calculation | Main content detection | | Auto ad removal | Professional validation | Word count statistics | Language detection | | Content summarization | Structured API parsing | Smart retry mechanism | Multi-format support | </div> --- ## πŸ†• What's New in v1.2.0 <div align="center"> ### πŸš€ Major Enhancements </div> | Feature | Status | Description | |:-----|:----------:|:-----| | 🧠 **Enhanced Content Processor** | βœ… | AI-powered content cleaning and extraction | | πŸ“Š **Smart Analytics** | βœ… | Word count, reading time, content summary | | 🌍 **Language Detection** | βœ… | Automatic language identification | | 🎯 **Intelligent Retry** | βœ… | Smart retry mechanism with exponential backoff | | πŸ” **Stealth Browser** | βœ… | Anti-detection browsing capabilities | | ⚑ **Rate Limiting** | βœ… | Built-in rate limiting and concurrency control | | 🧹 **Content Cleanup** | βœ… | Remove ads, navigation, and irrelevant content | | πŸ“ **Enhanced Markdown** | βœ… | Support for strikethrough, underline, highlights | --- ## πŸš€ Quick Start ### 🎯 Method 1: NPX Installation (🌟 Recommended) > πŸ’‘ **Easiest way**: No local installation needed! #### **Step 1**: Create Configuration File πŸ“„ Create a `my-websites.json` file: ```json { "websites": [ { "name": "your_website", "url": "https://your-website.com", "description": "Your Project Website" }, { "name": "api_docs", "url": "https://api.example.com/openapi.json", "description": "Your API Specification" } ] } ``` #### **Step 2**: Configure MCP Server βš™οΈ Add to `.cursor/mcp.json`: ```json { "mcpServers": { "website-to-markdown": { "command": "npx", "args": ["-y", "website-to-markdown-mcp"], "disabled": false, "env": { "WEBSITES_CONFIG_PATH": "./my-websites.json" } } } } ``` #### **Step 3**: Restart and Test πŸ”„ 1. **Restart Cursor** 2. **Open Chat and use Agent mode** 3. **Test command**: `Please list all configured websites` <div align="center"> **πŸŽ‰ Done! No installation required!** </div> --- ### 🎯 Method 2: Local Installation > πŸ’‘ **Best Practice**: Use this method for development or customization! #### **Step 1**: Clone and Build ```bash git clone https://github.com/your-username/website-to-markdown-mcp.git cd website-to-markdown-mcp npm install npm run build ``` #### **Step 2**: Configure MCP Server Add to `.cursor/mcp.json`: ```json { "mcpServers": { "website-to-markdown": { "command": "cmd", "args": ["/c", "node", "./website-to-markdown-mcp/dist/index.js"], "disabled": false, "env": { "WEBSITES_CONFIG_PATH": "./my-websites.json" } } } } ``` --- ## πŸ”₯ Enhanced Output Features ### πŸ“Š Rich Content Analysis Every fetched content now includes: - **πŸ“ Content Summary**: AI-generated summary of the main content - **⏱️ Reading Time**: Estimated reading time based on content length - **πŸ”’ Word Count**: Accurate word count for both English and Chinese - **🌍 Language Detection**: Automatic language identification - **🎯 Content Quality Score**: Assessment of content relevance ### πŸ“‹ Enhanced Markdown Output ```markdown # πŸš€ Example Website **Source**: https://example.com **Website**: example_site - Example Website **πŸ“Š Reading Time**: 5 minutes **πŸ”’ Word Count**: 1,250 words **🌍 Language**: English **πŸ“ Summary**: This article discusses the latest developments in web technology... --- [Enhanced Markdown content with better formatting...] ``` --- ## πŸ†• Complete OpenAPI/Swagger Support <div align="center"> ### πŸ”₯ Professional API Documentation </div> | Feature | OpenAPI 3.x | Swagger 2.0 | Description | |:-----|:----------:|:-----------:|:-----| | πŸ” **Auto Detection** | βœ… | βœ… | Support JSON/YAML formats | | βœ… **Professional Validation** | βœ… | βœ… | Using `@readme/openapi-parser` | | πŸ“‹ **Structured Parsing** | βœ… | βœ… | Endpoints, parameters, responses | | πŸ”— **Reference Resolution** | βœ… | βœ… | Auto handle `$ref` references | | πŸ“Š **Smart Summary** | βœ… | βœ… | Generate API overview | | πŸ“ **Formatted Output** | βœ… | βœ… | Readable Markdown | ### 🌟 Pre-configured Example Websites ```json { "websites": [ { "name": "petstore_openapi", "url": "https://petstore3.swagger.io/api/v3/openapi.json", "description": "πŸ• Swagger Petstore OpenAPI 3.0 Spec (Demo)" }, { "name": "petstore_swagger", "url": "https://petstore.swagger.io/v2/swagger.json", "description": "🐱 Swagger Petstore Swagger 2.0 Spec (Demo)" }, { "name": "github_api", "url": "https://raw.githubusercontent.com/github/rest-api-description/main/descriptions/api.github.com/api.github.com.json", "description": "πŸ™ GitHub REST API OpenAPI Spec" } ] } ``` --- ## πŸ“¦ Installation & Setup ### πŸ› οΈ System Requirements - **Node.js** 20.18.1+ (Recommended: v22.15.0 LTS) - **npm** 10.0.0+ or **yarn** - **Cursor** Editor > ⚠️ **Important**: Some dependencies require Node.js v20.18.1 or higher. Please update your Node.js version if you encounter engine compatibility warnings. ### ⚑ NPM Package Installation ```bash # Global installation npm install -g website-to-markdown-mcp # Or use directly with npx (recommended) npx website-to-markdown-mcp ``` ### πŸ”§ Development Setup ```bash # 1. Clone repository git clone https://github.com/your-username/website-to-markdown-mcp.git cd website-to-markdown-mcp # 2. Install dependencies npm install # 3. Build project npm run build ``` ### πŸŽ›οΈ Advanced Configuration Options <div align="center"> #### Configuration Priority Order </div> ```mermaid graph TD A[πŸ” Check Environment Variable<br/>WEBSITES_CONFIG_PATH] --> B{File exists?} B -->|Yes| C[βœ… Load External Config File] B -->|No| D[πŸ” Check Environment Variable<br/>WEBSITES_CONFIG] D --> E{Valid JSON?} E -->|Yes| F[βœ… Load Embedded Config] E -->|No| G[πŸ” Check config.json] G --> H{File exists?} H -->|Yes| I[βœ… Load Local Config] H -->|No| J[πŸ”§ Use Default Config] ``` --- ## 🎨 Configuration Method Details ### πŸ“‹ Method 1: External Configuration File (🌟 Recommended) > πŸ’‘ **Advantages**: Easy to edit, syntax highlighting, version control friendly <details> <summary><b>πŸ”§ Detailed Setup Steps</b></summary> 1. **Create Configuration File** ```bash # Can be placed anywhere touch my-api-configs.json ``` 2. **Edit Configuration Content** ```json { "websites": [ { "name": "my_docs", "url": "https://docs.example.com", "description": "πŸ“š My Documentation Website" } ] } ``` 3. **Set Environment Variable** ```json { "env": { "WEBSITES_CONFIG_PATH": "./my-api-configs.json" } } ``` </details> ### πŸ“‹ Method 2: Embedded JSON (Backward Compatible) <details> <summary><b>πŸ”§ Configuration Example</b></summary> ```json { "mcpServers": { "website-to-markdown": { "command": "cmd", "args": ["/c", "node", "./website-to-markdown-mcp/dist/index.js"], "disabled": false, "env": { "WEBSITES_CONFIG": "{\"websites\":[{\"name\":\"example\",\"url\":\"https://example.com\",\"description\":\"Example Website\"}]}" } } } } ``` </details> ### πŸ“‹ Method 3: Local config.json <details> <summary><b>πŸ”§ Local Configuration</b></summary> Directly edit `config.json` in the project root directory: ```json { "websites": [ { "name": "local_site", "url": "https://local.example.com", "description": "🏠 Local Test Website" } ] } ``` </details> --- ## πŸ”§ Available Tools ### 🌐 General Tools | Tool Name | Function | Parameters | Example | |:--------|:-----|:-----|:-----| | `fetch_website` | Fetch any website | `url`: Website URL | Fetch OpenAPI spec files | | `list_configured_websites` | List configured websites | None | View all available websites | ### 🎯 Dedicated Tools Each configured website automatically generates corresponding dedicated tools: - `fetch_petstore_openapi` - Fetch Petstore OpenAPI 3.0 spec - `fetch_petstore_swagger` - Fetch Petstore Swagger 2.0 spec - `fetch_github_api` - Fetch GitHub API spec - `fetch_tailwind_css` - Fetch Tailwind CSS documentation --- ## πŸ“Š Enhanced Output Format Examples ### 🌐 General Website Content with Analytics ```markdown # Website Title **Source**: https://example.com **Website**: example_site - Example Website **πŸ“Š Reading Time**: 3 minutes **πŸ”’ Word Count**: 650 words **🌍 Language**: English **πŸ“ Summary**: This article provides a comprehensive overview of modern web development practices, covering frontend frameworks, backend technologies, and deployment strategies. --- [Enhanced cleaned Markdown content with ads removed and main content extracted...] ``` ### πŸ“‹ OpenAPI 3.x Specification File ```markdown # πŸš€ Example API (v2.1.0) **Source**: https://api.example.com/openapi.json **OpenAPI Version**: 3.0.3 **Validation Status**: βœ… Valid **πŸ“Š Processing Time**: 1.2 seconds **πŸ”’ Endpoints**: 25 endpoints **🌍 Server Locations**: 3 servers --- ## πŸ“‹ API Basic Information - **API Name**: Example API - **Version**: 2.1.0 - **OpenAPI Version**: 3.0.3 - **Description**: A powerful example API for modern applications ## 🌐 Servers 1. **https://api.example.com** - 🏒 Production server 2. **https://staging-api.example.com** - πŸ§ͺ Testing server ## πŸ› οΈ API Endpoints Total of **25** endpoints: ### πŸ‘₯ `/users` - **GET**: Get user list - **POST**: Create new user ### πŸ” `/users/{id}` - **GET**: Get specific user - **PUT**: Update user information - **DELETE**: Delete user ## 🧩 Components - **Schemas**: 12 data models - **Parameters**: 8 reusable parameters - **Responses**: 15 reusable responses - **Security Schemes**: 3 security mechanisms ``` --- ## 🎯 Usage Examples ### πŸ’» Basic Usage ``` Please fetch the content from https://docs.example.com and convert to markdown ``` ### πŸ” OpenAPI Specification Fetching ``` Please use the fetch_petstore_openapi tool to fetch Petstore OpenAPI specification ``` ### πŸ“š Documentation Website Fetching ``` Please fetch React official documentation content ``` --- ## 🚨 Troubleshooting > πŸ“‹ **Complete Troubleshooting Guide**: See [TROUBLESHOOTING.md](TROUBLESHOOTING.md) for detailed solutions to common issues. ### ❓ Quick Solutions <details> <summary><b>πŸ”§ Node.js Version Issues</b></summary> **Error**: `npm WARN EBADENGINE Unsupported engine` - **Solution**: Update Node.js to v20.18.1 or higher - **Download**: [Node.js Official Website](https://nodejs.org/) - **Verify**: `node --version` </details> <details> <summary><b>🌐 Module Not Found Issues</b></summary> **Error**: `Cannot find module './db.json'` - **Solution 1**: Clear npm cache: `npm cache clean --force` - **Solution 2**: Update Node.js version - **Solution 3**: Use local installation instead of npx </details> <details> <summary><b>βš™οΈ Configuration Issues</b></summary> **Q: Configuration changes not taking effect?** - βœ… Confirm JSON format is correct - βœ… Restart Cursor - βœ… Check environment variable names **Q: JSON format errors?** - πŸ› οΈ Use [JSON Validator](https://jsonlint.com/) - πŸ› οΈ Confirm using double quotes - πŸ› οΈ Check for extra commas </details> ### πŸ” Debug Mode Detailed logs are output to stderr at startup: ```bash # View debug messages npm run dev 2> debug.log ``` --- ## πŸ“ˆ Performance & Optimization ### ⚑ Performance Features - πŸš€ **Smart Retry**: Intelligent retry with exponential backoff - πŸ’Ύ **Rate Limiting**: Built-in rate limiting to prevent overload - 🎯 **Content Filtering**: Remove irrelevant content for faster processing - 🧹 **Ad Removal**: Automatic ad and popup removal - πŸ“Š **Stealth Mode**: Anti-detection browsing capabilities ### πŸ›‘οΈ Security Considerations - πŸ”’ HTTPS websites only (recommended) - πŸ› οΈ Auto filter malicious scripts - πŸ“ Limit output content length - πŸ” Stealth browsing to avoid detection --- ## πŸ“¦ Dependencies <div align="center"> | Package | Version | Purpose | |:-----|:----:|:-----| | `@modelcontextprotocol/sdk` | ^1.0.0 | MCP Core Framework | | `@readme/openapi-parser` | ^4.1.0 | Professional OpenAPI Parsing | | `axios` | ^1.6.0 | HTTP Request Handling | | `cheerio` | ^1.0.0 | HTML Parsing Engine | | `turndown` | ^7.1.2 | HTML to Markdown | | `yaml` | ^2.8.0 | YAML Format Support | | `zod` | ^3.22.0 | Data Validation Framework | | `playwright` | ^1.40.0 | Browser automation | </div> --- ## πŸ“ Changelog ### πŸŽ‰ v1.2.0 (Latest) <div align="center"> **πŸš€ Major Feature Updates** </div> - ✨ **Added** Enhanced content processing with AI-powered cleanup - ✨ **Added** Smart analytics: word count, reading time, content summary - ✨ **Added** Language detection and multi-language support - ✨ **Added** Stealth browser capabilities for anti-detection - ✨ **Added** Built-in rate limiting and retry mechanisms - ✨ **Added** Advanced content filtering and ad removal - πŸ”§ **Enhanced** Markdown processing with more HTML element support - πŸ“Š **Improved** Output format with rich metadata - 🎯 **Fixed** Various technical issues and dependencies ### 🎯 v1.1.0 (Previous) <div align="center"> **πŸš€ Major Feature Updates** </div> - ✨ **Added** Full OpenAPI 3.x/Swagger 2.0 support - ✨ **Added** JSON/YAML format auto-detection - ✨ **Added** Professional-grade spec validation and reference resolution - ✨ **Added** Version auto-adaptation mechanism - ✨ **Added** Structured API documentation summary - πŸ”§ **Pre-configured** Multiple OpenAPI/Swagger examples - πŸ“¦ **Added** NPM package distribution with npx support - 🎯 **Enhanced** Installation methods for better user experience ### 🎯 v1.0.0 (Stable) - πŸŽ‰ **Initial Release** - 🌐 **Basic Functions** Website content fetching - πŸ“ **Core Functions** Markdown conversion - βš™οΈ **Configuration Support** Multi-website management --- ## 🀝 Contributing ### πŸ’‘ How to Contribute 1. **🍴 Fork** this project 2. **🌟 Create** feature branch (`git checkout -b feature/AmazingFeature`) 3. **πŸ“ Commit** changes (`git commit -m 'Add some AmazingFeature'`) 4. **πŸ“€ Push** to branch (`git push origin feature/AmazingFeature`) 5. **πŸ”„ Open** Pull Request ### πŸ› Issue Reporting Report issues on the [Issues](https://github.com/your-repo/issues) page, please include: - πŸ” **Issue Description** - πŸ”„ **Reproduction Steps** - πŸ’» **Environment Information** - πŸ“Έ **Screenshots or Logs** --- ## πŸ“„ License This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. --- <div align="center"> ### 🌟 If this project helps you, please give it a Star! **πŸ’¬ Have questions or suggestions? Feel free to open an Issue!** --- **Made by Sun** ❀️ **for the Developer Community** </div>

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/SunZhi-Will/website-to-markdown-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server