Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type
@followed by the MCP server name and your instructions, e.g., "@Puppeteer Vision MCP Serverscrape the latest tech news from hackernews.com"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
Puppeteer Vision MCP Server
This Model Context Protocol (MCP) server provides a tool for scraping webpages and converting them to markdown format using Puppeteer, Readability, and Turndown. It features AI-driven interaction capabilities to handle cookies, captchas, and other interactive elements automatically.
Now easily runnable via npx.
Features
Scrapes webpages using Puppeteer with stealth mode
Uses AI-powered interaction to automatically handle:
Cookie consent banners
CAPTCHAs
Newsletter or subscription prompts
Paywalls and login walls
Age verification prompts
Interstitial ads
Any other interactive elements blocking content
Extracts main content with Mozilla's Readability
Converts HTML to well-formatted Markdown
Special handling for code blocks, tables, and other structured content
Accessible via the Model Context Protocol
Option to view browser interaction in real-time by disabling headless mode
Easily consumable as an npx package
Quick Start with NPX
The recommended way to use this server is via npx, which ensures you're running the latest version without needing to clone or manually install.
Prerequisites: Ensure you have Node.js and npm installed.
Environment Setup: The server requires an OPENAI_API_KEY. You can provide this and other optional configurations in two ways:
.env file: Create a .env file in the directory where you will run the npx command.
Shell Environment Variables: Export the variables in your terminal session.
Example .env file:

# Required
OPENAI_API_KEY=your_api_key_here

# Optional (defaults shown)
# VISION_MODEL=gpt-4.1
# API_BASE_URL=https://api.openai.com/v1   # Uncomment to override
# TRANSPORT_TYPE=stdio                     # Options: stdio, sse, http
# USE_SSE=true                             # Deprecated: use TRANSPORT_TYPE=sse instead
# PORT=3001                                # Only used in sse/http modes
# DISABLE_HEADLESS=true                    # Uncomment to see the browser in action

Run the Server: Open your terminal and run:

npx -y puppeteer-vision-mcp-server

The -y flag automatically confirms any prompts from npx. This command will download (if not already cached) and execute the server.
By default, it starts in stdio mode. Set TRANSPORT_TYPE=sse or TRANSPORT_TYPE=http for HTTP server modes.
Using as an MCP Tool with NPX
This server is designed to be integrated as a tool within an MCP-compatible LLM orchestrator. Here's an example configuration snippet:
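A minimal illustration of what such a snippet typically looks like for orchestrators that use the common mcpServers JSON layout (the server name, key placement, and env values here are placeholders; adapt them to your client's configuration format):

```json
{
  "mcpServers": {
    "puppeteer-vision": {
      "command": "npx",
      "args": ["-y", "puppeteer-vision-mcp-server"],
      "env": {
        "OPENAI_API_KEY": "your_api_key_here"
      }
    }
  }
}
```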
When configured this way, the MCP orchestrator will manage the lifecycle of the puppeteer-vision-mcp-server process.
Environment Configuration Details
Regardless of how you run the server (NPX or local development), it uses the following environment variables:
OPENAI_API_KEY: (Required) Your API key for accessing the vision model.
VISION_MODEL: (Optional) The model to use for vision analysis. Default: gpt-4.1. Can be any model with vision capabilities.
API_BASE_URL: (Optional) Custom API endpoint URL. Use this to connect to alternative OpenAI-compatible providers (e.g., Together.ai, Groq, Anthropic, local deployments).
TRANSPORT_TYPE: (Optional) The transport protocol to use. Options: stdio (default), sse, http.
  stdio: Direct process communication (recommended for most use cases)
  sse: Server-Sent Events over HTTP (legacy mode)
  http: Streamable HTTP transport with session management
USE_SSE: (Optional, deprecated) Set to true to enable SSE mode over HTTP. Use TRANSPORT_TYPE=sse instead.
PORT: (Optional) The port for the HTTP server in SSE or HTTP mode. Default: 3001.
DISABLE_HEADLESS: (Optional) Set to true to run the browser in visible mode. Default: false (browser runs in headless mode).
Communication Modes
The server supports three communication modes:
stdio (Default): Communicates via standard input/output.
Perfect for direct integration with LLM tools that manage processes.
Ideal for command-line usage and scripting.
No HTTP server is started. This is the default mode.
SSE mode: Communicates via Server-Sent Events over HTTP.
Enable by setting TRANSPORT_TYPE=sse in your environment.
Starts an HTTP server on the specified PORT (default: 3001).
Use when you need to connect to the tool over a network.
Connect to: http://localhost:3001/sse
HTTP mode: Communicates via Streamable HTTP transport with session management.
Enable by setting TRANSPORT_TYPE=http in your environment.
Starts an HTTP server on the specified PORT (default: 3001).
Supports full session management and resumable connections.
Connect to: http://localhost:3001/mcp
Tool Usage (MCP Invocation)
The server provides a scrape-webpage tool.
Tool Parameters:
url (string, required): The URL of the webpage to scrape.
autoInteract (boolean, optional, default: true): Whether to automatically handle interactive elements.
maxInteractionAttempts (number, optional, default: 3): Maximum number of AI interaction attempts.
waitForNetworkIdle (boolean, optional, default: true): Whether to wait for the network to be idle before processing.
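As an illustration, here is a minimal sketch of invoking the tool from a TypeScript MCP client over the Streamable HTTP transport. It assumes the server was started with TRANSPORT_TYPE=http on the default port and that @modelcontextprotocol/sdk is installed; the client name and target URL are placeholders.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

// Connect to a server started with TRANSPORT_TYPE=http (default port 3001)
const client = new Client({ name: "example-client", version: "1.0.0" });
const transport = new StreamableHTTPClientTransport(new URL("http://localhost:3001/mcp"));
await client.connect(transport);

// Invoke the scrape-webpage tool with the parameters described above
const result = await client.callTool({
  name: "scrape-webpage",
  arguments: {
    url: "https://example.com",
    autoInteract: true,
    maxInteractionAttempts: 3,
    waitForNetworkIdle: true,
  },
});

console.log(result);
await client.close();
```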
Response Format:
The tool returns its result in a structured format:
content: An array containing a single text object with the raw markdown of the scraped webpage.
metadata: Contains additional information:
  message: Status message.
  success: Boolean indicating success.
  contentSize: Size of the content in characters (on success).
Example Success Response:
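The following is illustrative only; the status message wording and markdown payload will differ in practice.

```json
{
  "content": [
    {
      "type": "text",
      "text": "# Example Page\n\nThe scraped page content, converted to markdown..."
    }
  ],
  "metadata": {
    "message": "Scraping completed successfully",
    "success": true,
    "contentSize": 12345
  }
}
```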
Example Error Response:
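Again illustrative; the actual error text depends on what went wrong, and contentSize is only present on success.

```json
{
  "content": [
    {
      "type": "text",
      "text": "Failed to scrape the page."
    }
  ],
  "metadata": {
    "message": "Error: navigation timeout",
    "success": false
  }
}
```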
How It Works
AI-Driven Interaction
The system uses vision-capable AI models (configurable via VISION_MODEL and API_BASE_URL) to analyze screenshots of web pages and decide on actions such as clicking, typing, or scrolling to bypass overlays and consent forms. This process repeats for up to maxInteractionAttempts rounds.
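Conceptually, each interaction round involves a vision call along the lines of the sketch below. This is not the project's actual prompt or action schema; the prompt text, function name, and variables are illustrative.

```typescript
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: process.env.API_BASE_URL, // optional, for OpenAI-compatible providers
});

// screenshotBase64: a base64-encoded PNG screenshot captured by Puppeteer
async function analyzeScreenshot(screenshotBase64: string) {
  const response = await openai.chat.completions.create({
    model: process.env.VISION_MODEL ?? "gpt-4.1",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "Is anything blocking the main content (cookie banner, paywall, CAPTCHA)? Suggest one action: click, type, scroll, or none.",
          },
          {
            type: "image_url",
            image_url: { url: `data:image/png;base64,${screenshotBase64}` },
          },
        ],
      },
    ],
  });
  return response.choices[0].message.content;
}
```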
Content Extraction
After interactions, Mozilla's Readability extracts the main content, which is then sanitized and converted to Markdown using Turndown with custom rules for code blocks and tables.
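A condensed sketch of that pipeline using the libraries the project depends on (the options shown are illustrative, not necessarily the ones the server uses):

```typescript
import { Readability } from "@mozilla/readability";
import { JSDOM } from "jsdom";
import TurndownService from "turndown";
import sanitizeHtml from "sanitize-html";

function htmlToMarkdown(html: string, url: string): string {
  // Extract the main article content with Readability
  const dom = new JSDOM(html, { url });
  const article = new Readability(dom.window.document).parse();

  // Sanitize the extracted HTML before conversion
  const clean = sanitizeHtml(article?.content ?? "");

  // Convert to Markdown; custom rules (e.g., for code blocks and tables) would be added here
  const turndown = new TurndownService({ codeBlockStyle: "fenced" });
  return turndown.turndown(clean);
}
```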
Installation & Development (for Modifying the Code)
If you wish to contribute, modify the server, or run a local development version:
Clone the Repository:
git clone https://github.com/djannot/puppeteer-vision-mcp.git
cd puppeteer-vision-mcp

Install Dependencies:

npm install

Build the Project:

npm run build

Set Up Environment: Create a .env file in the project's root directory with your OPENAI_API_KEY and any other desired configurations (see "Environment Configuration Details" above).

Run for Development:

npm start  # Starts the server using the local build

Or, for automatic rebuilding on changes:
npm run dev
Customization (for Developers)
You can modify the behavior of the scraper by editing:
src/ai/vision-analyzer.ts (analyzePageWithAI function): Customize the AI prompt.
src/ai/page-interactions.ts (executeAction function): Add new action types.
src/scrapers/webpage-scraper.ts (visitWebPage function): Change Puppeteer options.
src/utils/markdown-formatters.ts: Adjust Turndown rules for Markdown conversion (see the sketch below).
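For example, a custom Turndown rule of the kind markdown-formatters.ts might define. This is only a sketch; the rule name, filter, and data-language attribute are hypothetical and the project's actual rules may differ.

```typescript
import TurndownService from "turndown";

const turndown = new TurndownService({ codeBlockStyle: "fenced" });

// Illustrative custom rule: render <pre> blocks as fenced code with a language hint
turndown.addRule("fencedCodeWithLanguage", {
  filter: (node) => node.nodeName === "PRE",
  replacement: (_content, node) => {
    const code = node.textContent ?? "";
    // data-language is a hypothetical attribute; real pages vary
    const language = (node as HTMLElement).getAttribute("data-language") ?? "";
    return `\n\`\`\`${language}\n${code}\n\`\`\`\n`;
  },
});
```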
Dependencies
Key dependencies include:
@modelcontextprotocol/sdk
puppeteer, puppeteer-extra
@mozilla/readability, jsdom
turndown, sanitize-html
openai (or compatible API for vision models)
express (for SSE/HTTP modes)
zod