🔍 Prysm MCP Server

The Prysm MCP (Model Context Protocol) server lets AI assistants such as Claude scrape web content with high accuracy and flexibility.

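✨ Features

🎯 Multiple scraping modes: choose focused (speed), balanced (default), or deep (thorough)
🧠 Content analysis: analyzes URLs to determine the best scraping approach
📄 Format flexibility: format results as markdown, HTML, or JSON
🖼️ Image support: optionally extract and even download images
🔍 Smart scrolling: configure scrolling behavior for single-page applications
📱 Responsive: adapts to different website layouts and structures
💾 File output: save formatted results to your preferred directory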

🚀 Quick Start

Installation

# Recommended: Install the LLM-optimized version
npm install -g @pinkpixel/prysm-mcp

# Or install the standard version
npm install -g prysm-mcp

# Or clone and build
git clone https://github.com/pinkpixel-dev/prysm-mcp.git
cd prysm-mcp
npm install
npm run build

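Integration Guides

Detailed integration guides are available for popular MCP-compatible applications.

Usage

There are several ways to set up the Prysm MCP server.

Using an mcp.json configuration

Create an mcp.json file in the location your MCP client expects, following the integration guide for that client.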

{
  "mcpServers": {
    "prysm-scraper": {
      "description": "Prysm web scraper with custom output directories",
      "command": "npx",
      "args": [
        "-y",
        "@pinkpixel/prysm-mcp"
      ],
      "env": {
        "PRYSM_OUTPUT_DIR": "${workspaceFolder}/scrape_results",
        "PRYSM_IMAGE_OUTPUT_DIR": "${workspaceFolder}/scrape_results/images"
      }
    }
  }
}
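
The same configuration can be generated from a script, for example when several workspaces need their own output directories. The following is a minimal sketch and not part of Prysm itself: `buildPrysmConfig`, `McpServerEntry`, and `McpConfig` are hypothetical names. Only the server key, the command, its arguments, and the two environment variable names are taken from the example above.

```typescript
// Hypothetical types describing the mcp.json structure shown above.
interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Hypothetical helper: builds the Prysm entry for a given output directory.
function buildPrysmConfig(outputDir: string, imageDir?: string): McpConfig {
  return {
    mcpServers: {
      "prysm-scraper": {
        command: "npx",
        args: ["-y", "@pinkpixel/prysm-mcp"],
        env: {
          PRYSM_OUTPUT_DIR: outputDir,
          // When no image directory is given, use an "images" subfolder of the output directory.
          PRYSM_IMAGE_OUTPUT_DIR: imageDir ?? `${outputDir}/images`,
        },
      },
    },
  };
}

// Serialize with two-space indentation, ready to be written to mcp.json.
const configJson: string = JSON.stringify(
  buildPrysmConfig("${workspaceFolder}/scrape_results"),
  null,
  2,
);
console.log(configJson);
```

Note that `"${workspaceFolder}/scrape_results"` is an ordinary string here, so the placeholder is written to the file unchanged for the MCP client to expand.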

🛠️ Tools

The server provides the following tools:

`scrapeFocused`

Fast web scraping optimized for speed (fewer scrolls, main content only).

Please scrape https://example.com using the focused mode

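Available parameters:

url (required): the URL to scrape
maxScrolls (optional): maximum number of scroll attempts (default: 5)
scrollDelay (optional): delay between scrolls in milliseconds (default: 1000)
scrapeImages (optional): whether to include images in the results
downloadImages (optional): whether to download images locally
maxImages (optional): maximum number of images to extract
output (optional): output directory for downloaded images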
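
An MCP client turns a prompt like the one above into a `tools/call` JSON-RPC request. The sketch below shows what such a request could look like for `scrapeFocused`. The names `ScrapeArgs`, `ToolCallRequest`, and `buildToolCall` are illustrative, the argument values are made up, and the argument types are an assumption inferred from the parameter list rather than taken from the server's schema.

```typescript
// Argument names follow the parameter list above; the types are assumed.
interface ScrapeArgs {
  url: string;
  maxScrolls?: number;
  scrollDelay?: number; // milliseconds
  scrapeImages?: boolean;
  downloadImages?: boolean;
  maxImages?: number;
  output?: string;
}

// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: ScrapeArgs };
}

// Hypothetical helper that wraps tool arguments in a tools/call request.
function buildToolCall(id: number, name: string, args: ScrapeArgs): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const request = buildToolCall(1, "scrapeFocused", {
  url: "https://example.com",
  maxScrolls: 3,
  scrapeImages: false,
});
console.log(JSON.stringify(request, null, 2));
```

In normal use the assistant's MCP client builds this request for you, so writing it by hand is mainly useful for testing the server directly.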

`scrapeBalanced`

A balanced scraping approach that offers good coverage at a reasonable speed.

Please scrape https://example.com using the balanced mode

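Available parameters:

Same as scrapeFocused, but with different defaults
maxScrolls default: 10
scrollDelay default: 2000
Adds a timeout parameter that limits the total scraping time (default: 30000 ms)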

`scrapeDeep`

Web scraping with maximum extraction (slower but thorough).

Please scrape https://example.com using the deep mode with maximum scrolls

Available parameters:

Same as scrapeFocused, but with different defaults
maxScrolls default: 20
scrollDelay default: 3000
maxImages default: 100
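
The three modes differ mainly in their defaults, which the sketch below collects in one place. `MODE_DEFAULTS`, `withDefaults`, and `maxScrollWaitMs` are hypothetical names used for illustration only. The numbers come from the parameter lists above, and the timing estimate assumes the server waits `scrollDelay` after every scroll, which this document does not confirm.

```typescript
type ScrapeMode = "focused" | "balanced" | "deep";

interface ModeDefaults {
  maxScrolls: number;
  scrollDelay: number; // milliseconds between scrolls
  timeout?: number;    // milliseconds; documented for balanced mode only
  maxImages?: number;  // documented for deep mode only
}

// Values copied from the parameter lists above.
const MODE_DEFAULTS: Record<ScrapeMode, ModeDefaults> = {
  focused: { maxScrolls: 5, scrollDelay: 1000 },
  balanced: { maxScrolls: 10, scrollDelay: 2000, timeout: 30000 },
  deep: { maxScrolls: 20, scrollDelay: 3000, maxImages: 100 },
};

// Hypothetical helper: caller-supplied options win over the mode defaults.
function withDefaults(mode: ScrapeMode, overrides: Partial<ModeDefaults> = {}): ModeDefaults {
  return { ...MODE_DEFAULTS[mode], ...overrides };
}

// Upper bound on time spent waiting between scrolls (excludes page load and extraction).
function maxScrollWaitMs(options: ModeDefaults): number {
  return options.maxScrolls * options.scrollDelay;
}

console.log(maxScrollWaitMs(withDefaults("focused")));  // 5000
console.log(maxScrollWaitMs(withDefaults("balanced"))); // 20000
console.log(maxScrollWaitMs(withDefaults("deep")));     // 60000
```

Under that assumption, deep mode can spend up to a minute on scroll delays alone, which is worth keeping in mind when choosing a mode for a long page.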

`formatResult`

Formats scraped data into a structured output format (markdown, HTML, or JSON).

Format the scraped data as markdown

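Available parameters:

data (required): the scraped data to format
format (required): output format, one of "markdown", "html", or "json"
includeImages (optional): whether to include images in the output (default: true)
output (optional): file path where the formatted result is saved

You can also save the formatted result to a file by specifying an output path: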

Format the scraped data as markdown and save it to "my-results/output.md"
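
The parameter rules for `formatResult` can be expressed as a small validation step. This is a sketch under assumptions, not the server's implementation: `normalizeFormatArgs` and `isOutputFormat` are hypothetical names, and the shape of `data` is left as `unknown` because it is not described here.

```typescript
type OutputFormat = "markdown" | "html" | "json";

interface FormatResultArgs {
  data: unknown;           // scraped data; its exact shape is not specified here
  format: OutputFormat;
  includeImages: boolean;
  output?: string;
}

const FORMATS: readonly string[] = ["markdown", "html", "json"];

function isOutputFormat(value: unknown): value is OutputFormat {
  return typeof value === "string" && FORMATS.indexOf(value) !== -1;
}

// Hypothetical validation: checks the required fields and applies the documented default.
function normalizeFormatArgs(raw: Record<string, unknown>): FormatResultArgs {
  const data = raw["data"];
  if (data === undefined || data === null) {
    throw new Error("data is required");
  }
  const format = raw["format"];
  if (!isOutputFormat(format)) {
    throw new Error('format must be "markdown", "html", or "json"');
  }
  const includeImagesRaw = raw["includeImages"];
  const result: FormatResultArgs = {
    data,
    format,
    // includeImages defaults to true when omitted.
    includeImages: typeof includeImagesRaw === "boolean" ? includeImagesRaw : true,
  };
  const output = raw["output"];
  if (typeof output === "string") {
    result.output = output;
  }
  return result;
}

const args = normalizeFormatArgs({
  data: { title: "Example Domain" },
  format: "markdown",
  output: "my-results/output.md",
});
console.log(args.includeImages); // true
```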

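⚙️ Configuration

Output directory

By default, formatted results are saved to ~/prysm-mcp/output/. You can customize this in three ways:

Environment variables: set the environment variables to your preferred directories: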

# Linux/macOS
export PRYSM_OUTPUT_DIR="/path/to/custom/directory"
export PRYSM_IMAGE_OUTPUT_DIR="/path/to/custom/image/directory"

# Windows (Command Prompt)
set PRYSM_OUTPUT_DIR=C:\path\to\custom\directory
set PRYSM_IMAGE_OUTPUT_DIR=C:\path\to\custom\image\directory

# Windows (PowerShell)
$env:PRYSM_OUTPUT_DIR="C:\path\to\custom\directory"
$env:PRYSM_IMAGE_OUTPUT_DIR="C:\path\to\custom\image\directory"

Tool parameter: specify the output path directly when calling a tool:

# For general results
Format the scraped data as markdown and save it to "/absolute/path/to/file.md"

# For image downloads when scraping
Please scrape https://example.com and download images to "/absolute/path/to/images"

MCP configuration: in your MCP configuration file (for example, .cursor/mcp.json), you can set these environment variables:

{
  "mcpServers": {
    "prysm-scraper": {
      "command": "npx",
      "args": ["-y", "@pinkpixel/prysm-mcp"],
      "env": {
        "PRYSM_OUTPUT_DIR": "${workspaceFolder}/scrape_results",
        "PRYSM_IMAGE_OUTPUT_DIR": "${workspaceFolder}/scrape_results/images"
      }
    }
  }
}

If PRYSM_IMAGE_OUTPUT_DIR is not specified, it defaults to a subfolder named images inside PRYSM_OUTPUT_DIR.
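
The fallback order for the two directories can be sketched as follows. `resolveOutputDirs` is a hypothetical helper rather than the server's code, and it takes the environment and home directory as arguments and assumes POSIX-style path separators to keep the example self-contained.

```typescript
interface OutputDirs {
  outputDir: string;
  imageDir: string;
}

// Hypothetical helper mirroring the documented defaults (POSIX-style separators assumed).
function resolveOutputDirs(
  env: Record<string, string | undefined>,
  homeDir: string,
): OutputDirs {
  // Documented default: ~/prysm-mcp/output
  const outputDir = env["PRYSM_OUTPUT_DIR"] || `${homeDir}/prysm-mcp/output`;
  // Documented default: an "images" subfolder inside the output directory.
  const imageDir = env["PRYSM_IMAGE_OUTPUT_DIR"] || `${outputDir}/images`;
  return { outputDir, imageDir };
}

console.log(resolveOutputDirs({}, "/home/user"));
// { outputDir: "/home/user/prysm-mcp/output", imageDir: "/home/user/prysm-mcp/output/images" }

console.log(resolveOutputDirs({ PRYSM_OUTPUT_DIR: "/data/scrapes" }, "/home/user"));
// { outputDir: "/data/scrapes", imageDir: "/data/scrapes/images" }
```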

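If you provide only a relative path or a filename, the file is saved relative to the configured output directory.

Path handling rules

The formatResult tool handles paths as follows:

Absolute paths: used exactly as provided (/home/user/file.md)
Relative paths: saved relative to the configured output directory (subfolder/file.md)
Filename only: saved in the configured output directory (output.md)
Directory paths: if the path points to a directory, a filename is generated automatically from the content and a timestamp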
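
The four rules above can be sketched as a single function. This is a simplified illustration, not the server's implementation: `resolveOutputPath` is a hypothetical name, only POSIX-style paths are handled, a trailing "/" stands in for "the path points to a directory", and filename generation is delegated to a caller-supplied `autoName` callback because the real naming scheme is not specified here.

```typescript
// Simplified sketch of the path rules (POSIX-style paths only).
function resolveOutputPath(
  requested: string,
  outputDir: string,
  autoName: () => string, // hypothetical generator for automatic filenames
): string {
  const isAbsolute = requested.charAt(0) === "/";
  // Strip trailing slashes so joining never produces "//".
  const root = outputDir.replace(/\/+$/, "");
  // Absolute paths are used as given; everything else is placed under the output directory.
  const resolved = isAbsolute ? requested : `${root}/${requested}`;
  // Assumption: a trailing "/" marks a directory, so a filename must be generated.
  if (resolved.charAt(resolved.length - 1) === "/") {
    return `${resolved}${autoName()}`;
  }
  return resolved;
}

const fixedName = () => "page-20240101.md"; // placeholder name for the example
console.log(resolveOutputPath("/home/user/file.md", "/out", fixedName)); // /home/user/file.md
console.log(resolveOutputPath("subfolder/file.md", "/out", fixedName));  // /out/subfolder/file.md
console.log(resolveOutputPath("output.md", "/out", fixedName));          // /out/output.md
console.log(resolveOutputPath("reports/", "/out", fixedName));           // /out/reports/page-20240101.md
```

A real implementation would more likely check the filesystem to decide whether a path is a directory and would use a platform-aware path library.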

🏗️ Development

# Install dependencies
npm install

# Build the project
npm run build

# Run the server locally
node bin/prysm-mcp

# Debug MCP communication
DEBUG=mcp:* node bin/prysm-mcp

# Set custom output directories
PRYSM_OUTPUT_DIR=./my-output PRYSM_IMAGE_OUTPUT_DIR=./my-output/images node bin/prysm-mcp

Running via npx

You can run the server directly with npx, without installing it:

# Run with default settings
npx @pinkpixel/prysm-mcp

# Run with custom output directories
PRYSM_OUTPUT_DIR=./my-output PRYSM_IMAGE_OUTPUT_DIR=./my-output/images npx @pinkpixel/prysm-mcp

📋 License

MIT

🙏 Credits

Developed by Pink Pixel

Powered by the Model Context Protocol and Puppeteer
