# Website to Markdown MCP Server
<div align="center">
**Language**: [English](README.md) | [繁體中文](README.zh-TW.md)
</div>
> A Model Context Protocol (MCP) server that fetches website content and converts it to Markdown, making it easier for AI assistants to understand and process website information.
## Key Features
<div align="center">
| Enhanced Processing | OpenAPI Support | Smart Analysis | Advanced Extraction |
|:--------:|:-------------:|:----------:|:----------:|
| AI-powered content cleanup | OpenAPI 3.x/Swagger 2.0 | Reading time calculation | Main content detection |
| Auto ad removal | Professional validation | Word count statistics | Language detection |
| Content summarization | Structured API parsing | Smart retry mechanism | Multi-format support |
</div>
---
## What's New in v1.2.0
<div align="center">
### Major Enhancements
</div>
| Feature | Status | Description |
|:-----|:----------:|:-----|
| **Enhanced Content Processor** | ✅ | AI-powered content cleaning and extraction |
| **Smart Analytics** | ✅ | Word count, reading time, content summary |
| **Language Detection** | ✅ | Automatic language identification |
| **Intelligent Retry** | ✅ | Smart retry mechanism with exponential backoff |
| **Stealth Browser** | ✅ | Anti-detection browsing capabilities |
| **Rate Limiting** | ✅ | Built-in rate limiting and concurrency control |
| **Content Cleanup** | ✅ | Remove ads, navigation, and irrelevant content |
| **Enhanced Markdown** | ✅ | Support for strikethrough, underline, highlights |
---
## Quick Start
### Method 1: NPX Installation (Recommended)
> **Easiest way**: No local installation needed!
#### **Step 1**: Create Configuration File
Create a `my-websites.json` file:
```json
{
  "websites": [
    {
      "name": "your_website",
      "url": "https://your-website.com",
      "description": "Your Project Website"
    },
    {
      "name": "api_docs",
      "url": "https://api.example.com/openapi.json",
      "description": "Your API Specification"
    }
  ]
}
```
#### **Step 2**: Configure MCP Server
Add to `.cursor/mcp.json`:
```json
{
  "mcpServers": {
    "website-to-markdown": {
      "command": "npx",
      "args": ["-y", "website-to-markdown-mcp"],
      "disabled": false,
      "env": {
        "WEBSITES_CONFIG_PATH": "./my-websites.json"
      }
    }
  }
}
```
#### **Step 3**: Restart and Test
1. **Restart Cursor**
2. **Open Chat and use Agent mode**
3. **Test command**: `Please list all configured websites`
<div align="center">
**Done! No installation required!**
</div>
---
### Method 2: Local Installation
> **Best Practice**: Use this method for development or customization!
#### **Step 1**: Clone and Build
```bash
git clone https://github.com/your-username/website-to-markdown-mcp.git
cd website-to-markdown-mcp
npm install
npm run build
```
#### **Step 2**: Configure MCP Server
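Add to `.cursor/mcp.json`. The `cmd /c` wrapper shown here is Windows-specific; on macOS or Linux, set `"command"` to `"node"` and pass only the script path in `"args"`: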
```json
{
  "mcpServers": {
    "website-to-markdown": {
      "command": "cmd",
      "args": ["/c", "node", "./website-to-markdown-mcp/dist/index.js"],
      "disabled": false,
      "env": {
        "WEBSITES_CONFIG_PATH": "./my-websites.json"
      }
    }
  }
}
```
---
## Enhanced Output Features
### Rich Content Analysis
Every fetched page now includes:
- **Content Summary**: AI-generated summary of the main content
- **Reading Time**: Estimated reading time based on content length
- **Word Count**: Accurate word count for both English and Chinese
- **Language Detection**: Automatic language identification
- **Content Quality Score**: Assessment of content relevance
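The word count and reading time figures can be approximated with very little code. The sketch below is not the server's actual implementation: `countWords` and `estimateReadingMinutes` are hypothetical names, counting each CJK character as one word is an assumption, and the default of 250 words per minute is chosen only because it reproduces the sample figures in this README (1,250 words in 5 minutes, 650 words in 3 minutes).

```typescript
// Hypothetical helpers; not taken from the server's source code.

// Matches CJK ideographs. Counting each one as a "word" is an assumption.
const CJK_CHAR = /[\u4e00-\u9fff\u3400-\u4dbf]/g;

function countWords(text: string): number {
  const cjkCount = (text.match(CJK_CHAR) ?? []).length;
  // Replace CJK characters with spaces so they do not glue Latin words together,
  // then keep only tokens that contain at least one letter or digit.
  const latinWords = text
    .replace(CJK_CHAR, " ")
    .split(/\s+/)
    .filter((token) => /[A-Za-z0-9]/.test(token));
  return cjkCount + latinWords.length;
}

// Rounds up and never reports less than one minute.
// The 250 words-per-minute default is an assumption.
function estimateReadingMinutes(wordCount: number, wordsPerMinute = 250): number {
  return Math.max(1, Math.ceil(wordCount / wordsPerMinute));
}

console.log(countWords("Hello world"), estimateReadingMinutes(1250));
```

A production implementation would likely use a proper segmenter for Chinese text; the per-character count here is only a rough stand-in.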
### Enhanced Markdown Output
```markdown
# Example Website
**Source**: https://example.com
**Website**: example_site - Example Website
**Reading Time**: 5 minutes
**Word Count**: 1,250 words
**Language**: English
**Summary**: This article discusses the latest developments in web technology...
---
[Enhanced Markdown content with better formatting...]
```
---
## Complete OpenAPI/Swagger Support
<div align="center">
### Professional API Documentation
</div>
| Feature | OpenAPI 3.x | Swagger 2.0 | Description |
|:-----|:----------:|:-----------:|:-----|
| **Auto Detection** | ✅ | ✅ | Support JSON/YAML formats |
| **Professional Validation** | ✅ | ✅ | Using `@readme/openapi-parser` |
| **Structured Parsing** | ✅ | ✅ | Endpoints, parameters, responses |
| **Reference Resolution** | ✅ | ✅ | Auto handle `$ref` references |
| **Smart Summary** | ✅ | ✅ | Generate API overview |
| **Formatted Output** | ✅ | ✅ | Readable Markdown |
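To illustrate what auto detection and the endpoint summary involve, here is a minimal hand-rolled sketch. It does not use `@readme/openapi-parser` and is not how the server does it; `detectSpecKind` and `countOperations` are hypothetical names. It relies only on two facts from the specifications: OpenAPI 3.x documents carry a top-level `openapi` version string, and Swagger 2.0 documents carry `swagger: "2.0"`.

```typescript
// Hypothetical helpers for illustration; not the server's real code.
type SpecKind = "openapi-3.x" | "swagger-2.0" | "unknown";

function detectSpecKind(doc: Record<string, unknown>): SpecKind {
  const openapi = doc["openapi"];
  if (typeof openapi === "string" && /^3\./.test(openapi)) return "openapi-3.x";
  if (doc["swagger"] === "2.0") return "swagger-2.0";
  return "unknown";
}

// Keys of a path item that represent operations (other keys, such as
// "parameters" or "summary", are not endpoints).
const HTTP_METHODS = ["get", "put", "post", "delete", "options", "head", "patch", "trace"];

function countOperations(doc: Record<string, unknown>): number {
  const paths = doc["paths"];
  if (typeof paths !== "object" || paths === null) return 0;
  let total = 0;
  for (const path of Object.keys(paths)) {
    const item = (paths as Record<string, unknown>)[path];
    if (typeof item !== "object" || item === null) continue;
    total += Object.keys(item).filter(
      (key) => HTTP_METHODS.indexOf(key.toLowerCase()) !== -1
    ).length;
  }
  return total;
}

const example = {
  openapi: "3.0.3",
  paths: {
    "/users": { get: {}, post: {} },
    "/users/{id}": { get: {}, put: {}, delete: {} },
  },
};
console.log(detectSpecKind(example), countOperations(example));
```

A real validator also resolves `$ref` references and checks the document against the schema, which this sketch deliberately skips.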
### Pre-configured Example Websites
```json
{
  "websites": [
    {
      "name": "petstore_openapi",
      "url": "https://petstore3.swagger.io/api/v3/openapi.json",
      "description": "Swagger Petstore OpenAPI 3.0 Spec (Demo)"
    },
    {
      "name": "petstore_swagger",
      "url": "https://petstore.swagger.io/v2/swagger.json",
      "description": "Swagger Petstore Swagger 2.0 Spec (Demo)"
    },
    {
      "name": "github_api",
      "url": "https://raw.githubusercontent.com/github/rest-api-description/main/descriptions/api.github.com/api.github.com.json",
      "description": "GitHub REST API OpenAPI Spec"
    }
  ]
}
```
---
## Installation & Setup
### System Requirements
- **Node.js** 20.18.1+ (Recommended: v22.15.0 LTS)
- **npm** 10.0.0+ or **yarn**
- **Cursor** Editor
> **Important**: Some dependencies require Node.js v20.18.1 or higher. Please update your Node.js version if you encounter engine compatibility warnings.
### NPM Package Installation
```bash
# Global installation
npm install -g website-to-markdown-mcp
# Or use directly with npx (recommended)
npx website-to-markdown-mcp
```
### Development Setup
```bash
# 1. Clone repository
git clone https://github.com/your-username/website-to-markdown-mcp.git
cd website-to-markdown-mcp
# 2. Install dependencies
npm install
# 3. Build project
npm run build
```
### Advanced Configuration Options
<div align="center">
#### Configuration Priority Order
</div>
```mermaid
graph TD
    A[Check Environment Variable<br/>WEBSITES_CONFIG_PATH] --> B{File exists?}
    B -->|Yes| C[✅ Load External Config File]
    B -->|No| D[Check Environment Variable<br/>WEBSITES_CONFIG]
    D --> E{Valid JSON?}
    E -->|Yes| F[✅ Load Embedded Config]
    E -->|No| G[Check config.json]
    G --> H{File exists?}
    H -->|Yes| I[✅ Load Local Config]
    H -->|No| J[Use Default Config]
```
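The priority order in the diagram can be expressed as a short fallback chain. This is a sketch under stated assumptions, not the server's actual loader: `resolveConfig`, `tryParse`, and the injected `readFile` callback are hypothetical, and falling through to the next source when a file exists but contains invalid JSON is an assumption the diagram does not cover.

```typescript
// Hypothetical types and helpers; only the environment variable names
// and the config.json file name come from this README.
interface WebsiteEntry { name: string; url: string; description: string; }
interface WebsitesConfig { websites: WebsiteEntry[]; }

type Env = Record<string, string | undefined>;
// Returns the file contents, or null when the file does not exist.
type ReadFile = (path: string) => string | null;

function tryParse(json: string | null | undefined): WebsitesConfig | null {
  if (!json) return null;
  try {
    const parsed = JSON.parse(json);
    return Array.isArray(parsed?.websites) ? (parsed as WebsitesConfig) : null;
  } catch {
    return null; // invalid JSON is treated like a missing source (assumption)
  }
}

function resolveConfig(
  env: Env,
  readFile: ReadFile,
  defaults: WebsitesConfig
): { source: string; config: WebsitesConfig } {
  // 1. External file named by WEBSITES_CONFIG_PATH
  const externalPath = env["WEBSITES_CONFIG_PATH"];
  if (externalPath) {
    const external = tryParse(readFile(externalPath));
    if (external) return { source: "WEBSITES_CONFIG_PATH", config: external };
  }
  // 2. JSON embedded directly in WEBSITES_CONFIG
  const embedded = tryParse(env["WEBSITES_CONFIG"]);
  if (embedded) return { source: "WEBSITES_CONFIG", config: embedded };
  // 3. config.json in the project root
  const local = tryParse(readFile("config.json"));
  if (local) return { source: "config.json", config: local };
  // 4. Built-in defaults
  return { source: "default", config: defaults };
}
```

Passing `env` and `readFile` as parameters keeps the function free of file-system access, so each branch of the diagram can be exercised in a unit test with plain objects.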
---
## Configuration Method Details
### Method 1: External Configuration File (Recommended)
> **Advantages**: Easy to edit, syntax highlighting, version control friendly
<details>
<summary><b>Detailed Setup Steps</b></summary>
1. **Create Configuration File**
```bash
# Can be placed anywhere
touch my-api-configs.json
```
2. **Edit Configuration Content**
```json
{
  "websites": [
    {
      "name": "my_docs",
      "url": "https://docs.example.com",
      "description": "My Documentation Website"
    }
  ]
}
```
3. **Set Environment Variable**
```json
{
  "env": {
    "WEBSITES_CONFIG_PATH": "./my-api-configs.json"
  }
}
```
</details>
### Method 2: Embedded JSON (Backward Compatible)
<details>
<summary><b>Configuration Example</b></summary>
```json
{
  "mcpServers": {
    "website-to-markdown": {
      "command": "cmd",
      "args": ["/c", "node", "./website-to-markdown-mcp/dist/index.js"],
      "disabled": false,
      "env": {
        "WEBSITES_CONFIG": "{\"websites\":[{\"name\":\"example\",\"url\":\"https://example.com\",\"description\":\"Example Website\"}]}"
      }
    }
  }
}
```
</details>
### Method 3: Local config.json
<details>
<summary><b>Local Configuration</b></summary>
Directly edit `config.json` in the project root directory:
```json
{
  "websites": [
    {
      "name": "local_site",
      "url": "https://local.example.com",
      "description": "Local Test Website"
    }
  ]
}
```
</details>
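All three methods share the same `websites` shape, so a small check before restarting Cursor can catch most mistakes. The sketch below is a hypothetical standalone validator, not part of the package (the server lists `zod` as its validation dependency, which this example deliberately avoids so it runs without installing anything). The rule that names may contain only letters, digits, and underscores is an assumption based on the example names in this README.

```typescript
// Hypothetical validator for the `websites` config shape shown above.
// The naming rule is an assumption, not a documented constraint.
const NAME_PATTERN = /^[A-Za-z0-9_]+$/;

function isHttpUrl(value: string): boolean {
  try {
    const protocol = new URL(value).protocol;
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false; // new URL() throws on strings it cannot parse
  }
}

// Returns a list of human-readable problems; an empty list means the config looks fine.
function validateWebsitesConfig(rawJson: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawJson);
  } catch (error) {
    return [`Invalid JSON: ${(error as Error).message}`];
  }
  if (typeof parsed !== "object" || parsed === null) {
    return ["Top level must be a JSON object"];
  }
  const websites = (parsed as { websites?: unknown }).websites;
  if (!Array.isArray(websites)) {
    return ['Missing "websites" array'];
  }
  const errors: string[] = [];
  websites.forEach((entry: unknown, index: number) => {
    const site = (entry ?? {}) as Record<string, unknown>;
    const name = site["name"];
    const url = site["url"];
    if (typeof name !== "string" || !NAME_PATTERN.test(name)) {
      errors.push(`websites[${index}].name must contain only letters, digits, or underscores`);
    }
    if (typeof url !== "string" || !isHttpUrl(url)) {
      errors.push(`websites[${index}].url must be an http(s) URL`);
    }
    if (typeof site["description"] !== "string") {
      errors.push(`websites[${index}].description must be a string`);
    }
  });
  return errors;
}
```

Because `JSON.parse` rejects trailing commas and single-quoted strings, the first branch also covers the most common hand-editing mistakes.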
---
## Available Tools
### General Tools
| Tool Name | Function | Parameters | Example |
|:--------|:-----|:-----|:-----|
| `fetch_website` | Fetch any website | `url`: Website URL | Fetch OpenAPI spec files |
| `list_configured_websites` | List configured websites | None | View all available websites |
### Dedicated Tools
Each configured website automatically gets a dedicated tool named `fetch_<name>`:
- `fetch_petstore_openapi` - Fetch Petstore OpenAPI 3.0 spec
- `fetch_petstore_swagger` - Fetch Petstore Swagger 2.0 spec
- `fetch_github_api` - Fetch GitHub API spec
- `fetch_tailwind_css` - Fetch Tailwind CSS documentation
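The naming pattern above (`petstore_openapi` becomes `fetch_petstore_openapi`) can be sketched as a simple mapping over the configured websites. The `ToolDescriptor` type and `buildToolList` function are hypothetical and do not reflect the MCP SDK's registration API; only the tool names come from this README.

```typescript
// Hypothetical sketch of deriving tool names from the website config.
interface WebsiteEntry { name: string; url: string; description: string; }
interface ToolDescriptor { name: string; description: string; }

// The two general tools listed in the table above.
const GENERAL_TOOLS: ToolDescriptor[] = [
  { name: "fetch_website", description: "Fetch any website" },
  { name: "list_configured_websites", description: "List configured websites" },
];

function buildToolList(websites: WebsiteEntry[]): ToolDescriptor[] {
  // One dedicated tool per configured website, prefixed with "fetch_".
  const dedicated = websites.map((site) => ({
    name: `fetch_${site.name}`,
    description: `Fetch ${site.description} (${site.url})`,
  }));
  return GENERAL_TOOLS.concat(dedicated);
}

const tools = buildToolList([
  {
    name: "petstore_openapi",
    url: "https://petstore3.swagger.io/api/v3/openapi.json",
    description: "Swagger Petstore OpenAPI 3.0 Spec (Demo)",
  },
]);
console.log(tools.map((tool) => tool.name));
```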
---
## Enhanced Output Format Examples
### General Website Content with Analytics
```markdown
# Website Title
**Source**: https://example.com
**Website**: example_site - Example Website
**Reading Time**: 3 minutes
**Word Count**: 650 words
**Language**: English
**Summary**: This article provides a comprehensive overview of modern web development practices, covering frontend frameworks, backend technologies, and deployment strategies.
---
[Enhanced cleaned Markdown content with ads removed and main content extracted...]
```
### OpenAPI 3.x Specification File
```markdown
# Example API (v2.1.0)
**Source**: https://api.example.com/openapi.json
**OpenAPI Version**: 3.0.3
**Validation Status**: ✅ Valid
**Processing Time**: 1.2 seconds
**Endpoints**: 25 endpoints
**Server Locations**: 3 servers
---
## API Basic Information
- **API Name**: Example API
- **Version**: 2.1.0
- **OpenAPI Version**: 3.0.3
- **Description**: A powerful example API for modern applications
## Servers
1. **https://api.example.com**
   - Production server
2. **https://staging-api.example.com**
   - Testing server
## API Endpoints
Total of **25** endpoints:
### `/users`
- **GET**: Get user list
- **POST**: Create new user
### `/users/{id}`
- **GET**: Get specific user
- **PUT**: Update user information
- **DELETE**: Delete user
## Components
- **Schemas**: 12 data models
- **Parameters**: 8 reusable parameters
- **Responses**: 15 reusable responses
- **Security Schemes**: 3 security mechanisms
```
---
## Usage Examples
### Basic Usage
```
Please fetch the content from https://docs.example.com and convert to markdown
```
### OpenAPI Specification Fetching
```
Please use the fetch_petstore_openapi tool to fetch Petstore OpenAPI specification
```
### Documentation Website Fetching
```
Please fetch React official documentation content
```
---
## Troubleshooting
> **Complete Troubleshooting Guide**: See [TROUBLESHOOTING.md](TROUBLESHOOTING.md) for detailed solutions to common issues.
### Quick Solutions
<details>
<summary><b>Node.js Version Issues</b></summary>
**Error**: `npm WARN EBADENGINE Unsupported engine`
- **Solution**: Update Node.js to v20.18.1 or higher
- **Download**: [Node.js Official Website](https://nodejs.org/)
- **Verify**: `node --version`
</details>
<details>
<summary><b>Module Not Found Issues</b></summary>
**Error**: `Cannot find module './db.json'`
- **Solution 1**: Clear npm cache: `npm cache clean --force`
- **Solution 2**: Update Node.js version
- **Solution 3**: Use local installation instead of npx
</details>
<details>
<summary><b>Configuration Issues</b></summary>
**Q: Configuration changes not taking effect?**
- ✅ Confirm JSON format is correct
- ✅ Restart Cursor
- ✅ Check environment variable names
**Q: JSON format errors?**
- Use [JSON Validator](https://jsonlint.com/)
- Confirm that all keys and strings use double quotes
- Check for trailing commas
</details>
### Debug Mode
Detailed logs are output to stderr at startup:
```bash
# View debug messages
npm run dev 2> debug.log
```
---
## Performance & Optimization
### Performance Features
- **Smart Retry**: Retries failed requests with exponential backoff
- **Rate Limiting**: Built-in rate limiting to prevent overload
- **Content Filtering**: Remove irrelevant content for faster processing
- **Ad Removal**: Automatic ad and popup removal
- **Stealth Mode**: Anti-detection browsing capabilities
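Exponential backoff means the wait between attempts doubles each time, up to a cap. The following is a minimal sketch of that idea and not the server's retry code: `retryWithBackoff`, `backoffDelay`, and every option name and value are assumptions made for illustration.

```typescript
// Hypothetical retry helper; option names and defaults are assumptions.
interface RetryOptions {
  maxAttempts: number; // total tries, including the first one
  baseDelayMs: number; // delay before the second try
  maxDelayMs: number;  // upper bound on any single delay
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

// attempt is 0-based: 0 -> base, 1 -> 2 * base, 2 -> 4 * base, capped at maxDelayMs.
function backoffDelay(attempt: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
}

async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  options: RetryOptions
): Promise<T> {
  const sleep =
    options.sleep ??
    ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let lastError: unknown;
  for (let attempt = 0; attempt < options.maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      const isLastAttempt = attempt === options.maxAttempts - 1;
      if (isLastAttempt) break; // no point sleeping after the final failure
      await sleep(backoffDelay(attempt, options.baseDelayMs, options.maxDelayMs));
    }
  }
  throw lastError;
}
```

With `baseDelayMs: 500` and `maxDelayMs: 8000`, the waits would be 500, 1000, 2000, 4000, and then 8000 ms for every later attempt. Real implementations often add random jitter so that many clients do not retry in lockstep.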
### Security Considerations
- HTTPS websites recommended
- Automatic filtering of malicious scripts
- Output content length is limited
- Stealth browsing to avoid detection
---
## Dependencies
<div align="center">
| Package | Version | Purpose |
|:-----|:----:|:-----|
| `@modelcontextprotocol/sdk` | ^1.0.0 | MCP Core Framework |
| `@readme/openapi-parser` | ^4.1.0 | Professional OpenAPI Parsing |
| `axios` | ^1.6.0 | HTTP Request Handling |
| `cheerio` | ^1.0.0 | HTML Parsing Engine |
| `turndown` | ^7.1.2 | HTML to Markdown |
| `yaml` | ^2.8.0 | YAML Format Support |
| `zod` | ^3.22.0 | Data Validation Framework |
| `playwright` | ^1.40.0 | Browser automation |
</div>
---
## Changelog
### v1.2.0 (Latest)
<div align="center">
**Major Feature Updates**
</div>
- **Added** Enhanced content processing with AI-powered cleanup
- **Added** Smart analytics: word count, reading time, content summary
- **Added** Language detection and multi-language support
- **Added** Stealth browser capabilities for anti-detection
- **Added** Built-in rate limiting and retry mechanisms
- **Added** Advanced content filtering and ad removal
- **Enhanced** Markdown processing with more HTML element support
- **Improved** Output format with rich metadata
- **Fixed** Various technical issues and dependencies
### v1.1.0 (Previous)
<div align="center">
**Major Feature Updates**
</div>
- **Added** Full OpenAPI 3.x/Swagger 2.0 support
- **Added** JSON/YAML format auto-detection
- **Added** Professional-grade spec validation and reference resolution
- **Added** Version auto-adaptation mechanism
- **Added** Structured API documentation summary
- **Pre-configured** Multiple OpenAPI/Swagger examples
- **Added** NPM package distribution with npx support
- **Enhanced** Installation methods for better user experience
### v1.0.0 (Stable)
- **Initial Release**
- **Basic Functions** Website content fetching
- **Core Functions** Markdown conversion
- **Configuration Support** Multi-website management
---
## Contributing
### How to Contribute
1. **Fork** this project
2. **Create** a feature branch (`git checkout -b feature/AmazingFeature`)
3. **Commit** your changes (`git commit -m 'Add some AmazingFeature'`)
4. **Push** to the branch (`git push origin feature/AmazingFeature`)
5. **Open** a Pull Request
### Issue Reporting
Report issues on the [Issues](https://github.com/your-repo/issues) page. Please include:
- **Issue Description**
- **Reproduction Steps**
- **Environment Information**
- **Screenshots or Logs**
---
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
---
<div align="center">
### If this project helps you, please give it a Star!
**Have questions or suggestions? Feel free to open an Issue!**
---
**Made by Sun** ❤️ **for the Developer Community**
</div>