This server enables AI assistants to scrape, index, search, and export Salesforce documentation — including content behind Shadow DOMs and bot protection — for offline and RAG purposes.
- Scrape a single page (`scrape_single_page`): Extract clean Markdown from any Salesforce documentation URL, with automatic handling of headless browsing, Shadow DOM piercing, iframes, and dynamic SPA content.
- Bulk spider and index a guide (`mass_extract_guide`): Provide a Table of Contents or root URL and the server discovers all linked pages, scrapes them concurrently (up to 100 pages), and stores chunked content in a local SQLite database.
- Search locally indexed docs (`search_local_docs`): Run natural language or keyword queries against the local SQLite database to instantly retrieve relevant pre-scraped documentation chunks without re-running a browser.
- Read a full local document (`read_local_document`): Instantly retrieve the full Markdown content of any previously indexed page directly from the local database, bypassing re-scraping.
- Export local documents (`export_local_documents`): Compile an entire guide or multiple guides from the local database into a single concatenated Markdown file for offline reading or sharing.
Additional capabilities include bypassing bot protection (e.g., Akamai Bot Manager), maintaining a persistent offline knowledge base syncable via Git across teams, and integrating with AI assistants like Cursor and Claude Desktop via the Model Context Protocol (MCP).
Provides tools for scraping, searching, and extracting content from modern and legacy Salesforce documentation, including the ability to handle deeply nested Shadow DOMs and Lightning Web Components (LWC).
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type `@` followed by the MCP server name and your instructions, e.g., "@Unified Salesforce Documentation MCP Server search local docs for LWC component lifecycle hooks"
That's it! The server will respond to your query, and you can continue using it as needed.
Unified Salesforce Documentation MCP Server
A powerful Model Context Protocol (MCP) server that empowers LLMs to scrape, digest, and search through modern and legacy Salesforce documentation. It elegantly handles deeply nested Shadow DOMs, typical of Lightning Web Components (LWC), and legacy iframe-based documentation structures.
Features
- Deep Shadow DOM Piercing: Bypasses 400KB+ of SPA boilerplate on help.salesforce.com and developer.salesforce.com to extract only the pure article Markdown.
- Bot-Protection Bypass: Includes a stealth architecture that transparently evades Akamai Bot Manager and other WAFs while fully executing Lightning Web Components to hydrate SPAs before extraction.
- Hierarchical Spidering: Automatically queues and scrapes all related pages linked from a central guide using `mass_extract_guide`.
- Offline RAG Capabilities: Chunks and indexes scraped Markdown into a local SQLite database (`docs.db`), allowing instantaneous local search using `search_local_docs`.
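The README does not describe the exact chunking strategy. As a rough illustration only, scraped Markdown could be split at heading boundaries, with oversized sections further broken on paragraph boundaries, before the chunks are inserted into SQLite; this is a hypothetical sketch, not the server's actual implementation:

```typescript
// Hypothetical chunker: split Markdown at headings, then break long
// sections on paragraph boundaries so each chunk stays search-friendly.
function chunkMarkdown(markdown: string, maxChars = 1500): string[] {
  const sections = markdown.split(/\n(?=#{1,6} )/); // split before headings
  const chunks: string[] = [];
  for (const section of sections) {
    if (section.length <= maxChars) {
      if (section.trim()) chunks.push(section.trim());
      continue;
    }
    // Long sections: accumulate paragraphs until the budget is reached.
    let current = "";
    for (const para of section.split(/\n\n+/)) {
      if (current && current.length + para.length + 2 > maxChars) {
        chunks.push(current.trim());
        current = "";
      }
      current += (current ? "\n\n" : "") + para;
    }
    if (current.trim()) chunks.push(current.trim());
  }
  return chunks;
}
```

Each resulting chunk would then be stored as one row, which is what makes `search_local_docs` fast: queries scan small text chunks instead of whole pages.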
Available Tools
- `scrape_single_page`: Provide a Salesforce documentation URL. The server uses a headless browser (Puppeteer) to load the page, wait for dynamic content, pierce all shadow DOMs, and return clean Markdown.
- `mass_extract_guide`: Provide a "Table of Contents" or central guide URL. The server extracts the parent page, finds all hierarchical child links, scrapes them concurrently, chunks their content, and saves them to a local SQLite database for offline querying.
- `search_local_docs`: Provide a natural language query (e.g., `LWC lifecycle hooks`). The server queries the SQLite database using fuzzy SQL search to instantly return the best-matching pre-scraped chunks of documentation.
- `read_local_document`: Retrieves the full Markdown content of a documentation page that has already been indexed locally, returning it instantly without re-running headless Chromium to bypass CDNs.
- `export_local_documents`: Compiles an entire guide (or multiple guides) stored in the offline SQLite database into a single concatenated Markdown file written directly to your local file system, without saturating LLM context windows or writing complex CLI scripts.
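Once connected, an assistant invokes these tools through standard MCP `tools/call` JSON-RPC requests. A sketch of what a `scrape_single_page` call might look like on the wire (the `url` argument name and example URL are assumptions; check the tool schema the server advertises via `tools/list`):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "scrape_single_page",
    "arguments": {
      "url": "https://developer.salesforce.com/docs/component-library/documentation/en/lwc"
    }
  }
}
```

Your AI assistant constructs these requests for you; you only need the plain-language prompts shown in the Quick Start below.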
Quick Start (Using with AI Assistants)
MCP servers act as a bridge between an LLM and local tools. To actually use this server, you need to plug it into an AI coding assistant like Cursor or Claude Desktop.
The absolute easiest way to do this is to use npx, which will automatically download and run the latest version of the server from NPM.
1. Cursor (Recommended)
Open Cursor Settings -> Features -> MCP
Click + Add new MCP server
Configure the settings:
- Type: `command`
- Name: `unified-sf-docs`
- Command: `npx -y unified-sf-docs-mcp`
Click Save. Cursor will instantly download the package and surface the 5 new tools to the Cursor Agent.
2. Claude Desktop
Open the Claude Desktop configuration file:
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`

Add the following entry to your `mcpServers` object:
```json
{
  "mcpServers": {
    "unified-sf-docs": {
      "command": "npx",
      "args": [
        "-y",
        "unified-sf-docs-mcp"
      ],
      "env": {
        "SF_DOCS_DB_DIR": "/absolute/path/to/your/private/github/repository"
      }
    }
  }
}
```

Restart Claude Desktop. The tools will now be available when talking to Claude!
3. Git-Backed Persistence (Team Sharing & Vector Syncing)
By default, the SQLite database is stored entirely offline at `~/.unified-sf-docs-mcp/salesforce-docs.db`.

If you want to sync your scraped AI knowledge base across multiple computers, or share a pre-scraped `docs.db` database with a private engineering team:
1. Create a private Git repository (e.g. on GitHub). Note: keep it private to avoid distributing Salesforce's copyrighted material publicly.
2. Clone it to your local machine (e.g. `/Users/todd/my-private-sf-kb`).
3. Add the `SF_DOCS_DB_DIR` environment variable to your Cursor or Claude Desktop MCP settings, pointing to your cloned folder.
4. Run `mass_extract_guide` via your AI to scrape the documentation. The 100MB+ `salesforce-docs.db` will be created inside your Git repository, ready to be committed and pushed!
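The workflow above hinges on how the server resolves its database location. A minimal sketch of the presumed resolution logic, inferred from the default path and the `SF_DOCS_DB_DIR` override documented here (the real server's code may differ in detail):

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Resolve the SQLite database path: SF_DOCS_DB_DIR (if set) overrides
// the default ~/.unified-sf-docs-mcp directory. This mirrors the
// documented behavior; it is not the server's actual source code.
function resolveDbPath(env: Record<string, string | undefined>): string {
  const dir =
    env.SF_DOCS_DB_DIR && env.SF_DOCS_DB_DIR.length > 0
      ? env.SF_DOCS_DB_DIR
      : path.join(os.homedir(), ".unified-sf-docs-mcp");
  return path.join(dir, "salesforce-docs.db");
}
```

Because the override points at a Git working tree, syncing is just a commit and push. Be aware that GitHub rejects individual files larger than 100MB, so a 100MB+ `salesforce-docs.db` may require Git LFS.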
Local Development & Testing
If you want to modify the source code yourself, you can point your AI assistant to a local installation instead of using npx:
Clone the repository:

```shell
git clone https://github.com/tmtrevisan/unified-sf-docs-mcp.git
cd unified-sf-docs-mcp
```

Install and build:

```shell
npm install && npm run build
```

(Note: The server runs from the compiled `dist/` output.)
You can use the provided test scripts to verify the core functionality or the scraper against different Salesforce URL layouts:
```shell
# Test the database, chunking, and search functionality
npx tsx tests/test-core.js

# Test the robust Shadow DOM scraper against 4 different URL permutations
npx tsx tests/test-all.js
```

Update your MCP config:
- Type: `command`
- Command: `node /ABSOLUTE/PATH/TO/unified-sf-docs-mcp/dist/index.js`