Markdown MCP Server
A Model Context Protocol (MCP) server that extracts clean markdown content from web pages using Playwright. This server provides a get_page_markdown tool that can extract the main content from any URL while filtering out navigation, headers, footers, and other non-content elements.
Features
🎯 Smart Content Extraction: Automatically identifies and extracts main content from web pages
🧹 Clean Output: Filters out navigation, headers, footers, sidebars, and advertisements
🎨 Rich Markdown: Preserves formatting including headings, bold, italic, code blocks, lists, and tables
🖼️ Image Support: Optionally includes image references in markdown
🔗 Link Support: Optionally includes hyperlinks in markdown
⚡ Fast & Reliable: Uses Playwright for robust web scraping
🔄 Dynamic Content: Handles JavaScript-heavy sites and dynamic content loading
🛡️ Error Handling: Robust error handling with fallback extraction methods
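How the server separates main content from page chrome is not described in this README. As a rough illustration only, one common approach is to remove elements that match a list of non-content selectors before converting the page. The selector list and the `stripNonContent` helper below are hypothetical and are not taken from the server's source.

```javascript
// Hypothetical selector list: elements that usually hold page chrome, not content.
const NON_CONTENT_SELECTORS = ["nav", "header", "footer", "aside", ".advertisement"];

// Remove every element matching a non-content selector from a DOM root.
// `root` is anything exposing querySelectorAll(), such as `document` in a browser page.
// Returns the number of elements removed.
function stripNonContent(root, selectors = NON_CONTENT_SELECTORS) {
  let removed = 0;
  for (const selector of selectors) {
    for (const element of root.querySelectorAll(selector)) {
      element.remove();
      removed += 1;
    }
  }
  return removed;
}
```

A function like this has to run inside the page itself, for example through Playwright's `page.evaluate`, because it works on the live DOM.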
Installation
Clone or download this repository:
git clone <repository-url>
cd markdown-mcp
Install dependencies:
npm install
Install Playwright browsers:
npx playwright install chromium
Make the script executable (optional):
chmod +x markdown-mcp.js
Usage
As an MCP Server
Start the server:
node markdown-mcp.js
The server provides one tool: get_page_markdown
Tool Parameters
url (required): The URL to extract markdown from
includeImages (optional, default: true): Whether to include image references in markdown
includeLinks (optional, default: true): Whether to include hyperlinks in markdown
waitForSelector (optional): CSS selector to wait for before extracting content (useful for dynamic content)
timeout (optional, default: 30000): Navigation timeout in milliseconds
Example Usage
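As a minimal sketch, an MCP client invokes a tool by sending a JSON-RPC `tools/call` request. The helper below builds such a request for get_page_markdown using only the parameters documented above. The `buildMarkdownRequest` name, the request `id`, and the target URL are placeholders chosen for illustration.

```javascript
// Build a JSON-RPC 2.0 "tools/call" request for the get_page_markdown tool.
// Only `url` is required; the other arguments start from the documented defaults
// and can be overridden through `options`.
function buildMarkdownRequest(id, url, options = {}) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "get_page_markdown",
      arguments: {
        url,
        includeImages: true,
        includeLinks: true,
        timeout: 30000,
        ...options,
      },
    },
  };
}

const basicRequest = buildMarkdownRequest(1, "https://example.com");
console.log(JSON.stringify(basicRequest, null, 2));
```

In practice the AI client builds and sends this request for you; the sketch only shows what travels over the wire.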
Advanced Usage Examples
Extract content from a specific section:
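Of the documented parameters, `waitForSelector` is the one that targets a particular part of the page: it delays extraction until the matching element appears. A possible set of tool arguments is sketched below; the URL and the `main article` selector are placeholders.

```javascript
// Arguments for get_page_markdown that wait for a specific element before extracting.
// "main article" is a placeholder selector; use one that matches the target page.
const sectionArgs = {
  url: "https://example.com/docs",
  waitForSelector: "main article",
  includeImages: false,
};
console.log(JSON.stringify(sectionArgs, null, 2));
```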
Extract content with custom timeout:
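For slow pages the navigation timeout can be raised above its 30000 ms default. The arguments below are a sketch; the URL is a placeholder and 60000 ms is an arbitrary choice.

```javascript
// Arguments for get_page_markdown with a longer navigation timeout.
// 60000 ms doubles the documented 30000 ms default, for slow-loading pages.
const slowPageArgs = {
  url: "https://example.com/slow-page",
  timeout: 60000,
};
console.log(JSON.stringify(slowPageArgs, null, 2));
```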
File Structure
This project includes two MCP server files optimized for different clients:
markdown-mcp.js - Optimized for Claude Desktop
markdown-mcp-gemini.js - Optimized for Gemini CLI
Both files provide the same get_page_markdown tool but are configured differently for optimal performance with each client.
Adding to AI Clients
This MCP server can be used with multiple AI clients that support the Model Context Protocol. Below are instructions for the most popular clients.
Claude Desktop Integration
To use this MCP server with Claude Desktop, you need to add it to your Claude Desktop configuration file.
Step 1: Locate Claude Desktop Configuration
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/claude/claude_desktop_config.json
Step 2: Edit Configuration File
Open the configuration file in a text editor
Add the markdown-mcp server to the mcpServers section
Update the path to point to your markdown-mcp.js file
Step 3: Configuration Examples
macOS Configuration
Windows Configuration
Linux Configuration
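The three platform entries share one shape and differ only in the script path. The sketch below prints such an entry for each platform, assuming the server is launched with `node`. The paths are placeholders, `buildClaudeConfig` is a hypothetical helper, and the `markdown-mcp` key is simply the server name used elsewhere in this README.

```javascript
// Build a Claude Desktop configuration object that registers the server.
// `scriptPath` must be the absolute path to markdown-mcp.js on your machine.
function buildClaudeConfig(scriptPath) {
  return {
    mcpServers: {
      "markdown-mcp": {
        command: "node",
        args: [scriptPath],
      },
    },
  };
}

// Placeholder paths, one per platform; replace them with your own.
const examplePaths = {
  macOS: "/Users/yourusername/path/to/markdown-mcp/markdown-mcp.js",
  Windows: "C:\\Users\\yourusername\\path\\to\\markdown-mcp\\markdown-mcp.js",
  Linux: "/home/yourusername/path/to/markdown-mcp/markdown-mcp.js",
};

for (const [platform, scriptPath] of Object.entries(examplePaths)) {
  console.log(`${platform}:`);
  console.log(JSON.stringify(buildClaudeConfig(scriptPath), null, 2));
}
```

Note that the Windows path is printed with doubled backslashes, which is how backslashes must be written inside a JSON file. If the configuration file already lists other servers, add the `markdown-mcp` key next to them rather than replacing the whole `mcpServers` object.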
Step 4: Restart Claude Desktop
After updating the configuration file, restart Claude Desktop for the changes to take effect.
Step 5: Verify Installation
Open Claude Desktop
Start a new conversation
Try asking Claude to extract content from a webpage using the markdown-mcp tool
Example: "Use markdown-mcp to extract content from https://example.com"
Troubleshooting
If the MCP server doesn't work:
Check the file path - Make sure the path to markdown-mcp.js is correct and the file exists
Verify Node.js - Ensure Node.js is installed and accessible from the command line
Check permissions - Make sure the script has execute permissions
Test manually - Try running node markdown-mcp.js in the terminal to see if there are any errors
Check Claude Desktop logs - Look for error messages in Claude Desktop's developer console
Common Issues:
Path not found: Double-check the file path in the configuration
Node.js not found: Make sure Node.js is installed and in your PATH
Permission denied: Run chmod +x markdown-mcp.js to make the script executable
Dependencies missing: Run npm install in the markdown-mcp directory
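Several of the checks above can be automated. The following is a minimal sketch, not part of the project: `preflight` is a hypothetical helper, and it assumes dependencies are installed in a node_modules directory next to the script.

```javascript
const fs = require("fs");
const path = require("path");

// Return a list of human-readable problems found for the given script path.
// An empty list means the basic checks passed.
function preflight(scriptPath, minNodeMajor = 18) {
  const problems = [];

  // Node.js version: this README requires version 18 or higher.
  const nodeMajor = Number(process.versions.node.split(".")[0]);
  if (nodeMajor < minNodeMajor) {
    problems.push(`Node.js ${process.versions.node} is older than ${minNodeMajor}`);
  }

  // File path: the script must exist and be readable.
  if (!fs.existsSync(scriptPath)) {
    problems.push(`Path not found: ${scriptPath}`);
    return problems;
  }
  try {
    fs.accessSync(scriptPath, fs.constants.R_OK);
  } catch {
    problems.push(`Permission denied: ${scriptPath}`);
  }

  // Dependencies (assumption): npm install creates node_modules beside the script.
  const scriptDir = path.dirname(path.resolve(scriptPath));
  if (!fs.existsSync(path.join(scriptDir, "node_modules"))) {
    problems.push(`Dependencies missing: no node_modules in ${scriptDir}`);
  }
  return problems;
}

const problems = preflight(process.argv[2] || "markdown-mcp.js");
console.log(problems.length === 0 ? "All checks passed" : problems.join("\n"));
```

Pass the path to the server script as the first argument; with no argument the sketch looks for markdown-mcp.js in the current directory.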
Gemini CLI Integration
To use this MCP server with Gemini CLI, follow these steps:
Step 1: Install Gemini CLI
If you haven't already installed Gemini CLI:
Verify the installation:
Step 2: Add MCP Server to Gemini CLI
Add your markdown-mcp server to Gemini CLI:
Important: Replace /Users/yourusername/path/to/markdown-mcp/markdown-mcp-gemini.js with the actual path to your markdown-mcp-gemini.js file.
Step 3: Verify Integration
List all configured MCP servers to verify the integration:
You should see markdown-mcp listed among the servers.
Step 4: Test the Integration
Test the markdown-mcp server with Gemini CLI:
Or you can use the tool directly:
Step 5: Complete Example - Extract and Save Markdown
Here's a complete example that extracts markdown content and saves it to a file:
This command will:
Use the get_page_markdown tool to extract clean markdown content from the Confluent blog post
Save the extracted markdown content to a file named result.md in your current directory
Provide you with a clean, readable markdown version of the webpage content
Additional Examples:
Gemini CLI Troubleshooting
If the MCP server doesn't work with Gemini CLI:
Check the file path - Ensure the path to markdown-mcp-gemini.js is correct and absolute
Verify Node.js - Make sure Node.js is accessible from the command line
Check permissions - Ensure the script has execute permissions (chmod +x markdown-mcp-gemini.js)
Test the server manually - Run node markdown-mcp-gemini.js to check for errors
Check Gemini CLI logs - Look for error messages in the Gemini CLI output
Common Gemini CLI Issues:
Path not found: Use absolute paths when adding the MCP server
Permission denied: Run chmod +x markdown-mcp-gemini.js to make the script executable
Node.js not found: Ensure Node.js is installed and in your PATH
Server not responding: Check if the server starts correctly with node markdown-mcp-gemini.js
Using with Multiple AI Clients
You can use the same markdown-mcp server with multiple AI clients simultaneously. The MCP server is designed to handle multiple concurrent requests efficiently.
Benefits of Multi-Client Setup
Flexibility: Use the same tool with different AI models
Efficiency: Share the same server instance across clients
Consistency: Get the same extraction quality regardless of the AI client
Resource optimization: No need to run multiple server instances
Setup for Multiple Clients
Set up Claude Desktop using markdown-mcp.js (as described above)
Set up Gemini CLI using markdown-mcp-gemini.js (as described above)
Each client then uses its own server file, optimized for that client
Usage Examples
With Claude Desktop:
With Gemini CLI:
Performance Considerations
The server handles multiple concurrent requests efficiently
Each request uses a fresh browser context for security
Memory usage scales with the number of concurrent requests
Typical response time: 5-15 seconds per request
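The per-request isolation described above can be sketched with Playwright's `browser.newContext()` and `context.close()`. This is an illustration under the assumption that the server keeps one browser and opens a context per request; `withFreshContext` is a hypothetical helper, not the server's actual code.

```javascript
// Run `task` against a page in a brand-new browser context, then dispose of the context.
// `browser` is expected to be a Playwright Browser (for example from chromium.launch()).
async function withFreshContext(browser, task) {
  const context = await browser.newContext(); // isolated cookies and storage
  try {
    const page = await context.newPage();
    return await task(page);
  } finally {
    await context.close(); // closes the context's pages and discards its state
  }
}

// Usage sketch (requires the playwright package, so it is left as a comment):
//   const { chromium } = require("playwright");
//   const browser = await chromium.launch({ headless: true });
//   const title = await withFreshContext(browser, async (page) => {
//     await page.goto("https://example.com", { timeout: 30000 });
//     return page.title();
//   });
//   await browser.close();
```

Closing the context in `finally` means its memory is released even when navigation fails or times out, which matters because memory use grows with the number of requests in flight.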
Testing
The server has been tested and verified to work correctly with various websites including:
✅ Documentation sites (Confluent, GitHub, etc.)
✅ News articles and blog posts
✅ Technical documentation with code examples
✅ E-commerce pages and product descriptions
✅ JavaScript-heavy sites with dynamic content
Tested Features
✅ Extracts headings, paragraphs, and text content
✅ Preserves bold and italic formatting
✅ Handles code blocks and inline code
✅ Processes lists (ordered and unordered)
✅ Extracts tables with proper formatting
✅ Filters out navigation and footer content
✅ Handles images and links (when enabled)
✅ Responds to MCP protocol requests
✅ Works with dynamic content and JavaScript-heavy sites
Manual Testing
You can test the server manually by running:
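One way to exercise the server by hand, assuming it speaks MCP over stdio (newline-delimited JSON-RPC, the usual transport for servers launched through a `command` and `args` entry), is to pipe a short handshake into it. The script below only prints those messages; the client name and version are made up, and `frameMessages` is a hypothetical helper.

```javascript
// Serialize JSON-RPC messages as newline-delimited JSON,
// the framing used by MCP's stdio transport.
function frameMessages(messages) {
  return messages.map((message) => JSON.stringify(message) + "\n").join("");
}

const handshake = [
  {
    jsonrpc: "2.0",
    id: 1,
    method: "initialize",
    params: {
      protocolVersion: "2024-11-05",
      capabilities: {},
      clientInfo: { name: "manual-test", version: "0.0.1" }, // hypothetical client identity
    },
  },
  { jsonrpc: "2.0", method: "notifications/initialized" },
  { jsonrpc: "2.0", id: 2, method: "tools/list" },
];

process.stdout.write(frameMessages(handshake));
```

Saved under a name of your choice, the script's output can be piped into node markdown-mcp.js. If the server is healthy, the reply to the tools/list request should mention get_page_markdown.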
Supported Websites
This MCP server works well with:
Documentation sites: Confluent, GitHub, GitLab, etc.
News and blogs: Most major news sites and blogs
Technical content: Stack Overflow, Medium, Dev.to
E-commerce: Product pages and descriptions
Academic content: Research papers and articles
Social media: Twitter threads, LinkedIn articles
Performance
Typical extraction time: 5-15 seconds depending on page complexity
Memory usage: ~50-100MB per extraction
Supported content size: Up to several MB of text content
Concurrent requests: Handles multiple requests efficiently
Requirements
Node.js: Version 18 or higher
Playwright: Chromium browser (installed with npx playwright install chromium)
Memory: At least 512MB available RAM
Disk space: ~200MB for Playwright browser
Security Considerations
The server runs in headless mode for security
No cookies or other persistent data are stored
Each request uses a fresh browser context
Network requests are limited by timeout settings
No sensitive data is logged or stored
Contributing
Fork the repository
Create a feature branch
Make your changes
Test thoroughly
Submit a pull request
Support
If you encounter issues:
Check the troubleshooting section above
Verify all requirements are met
Test with a simple URL first
Check the Claude Desktop logs or the Gemini CLI output for error messages
Open an issue with detailed error information