SFCC Development MCP Server

sfcc-dev-mcp
.github
skills
salesforce-developer-site-scraper

SKILL.md•3.98 KiB

--- name: salesforce-developer-site-scraper description: 'Scrape Salesforce Developer documentation into clean Markdown using headless Chromium, Readability, and the docs content API fallback. Use when content is async or blocked by OneTrust cookie banners.' license: Forward Proprietary compatibility: VS Code 1.x+, Node.js 18+ --- # Salesforce Developer Site Scraper Use this skill to capture Salesforce Developer documentation pages as clean Markdown even when content loads asynchronously or is blocked by OneTrust cookie banners. ## When to Use This Skill - A Salesforce Developer doc page renders key content after async requests. - A OneTrust cookie banner hides content until consent is accepted. - You need a readable Markdown snapshot for Apex, LWC, or platform docs. - NOT for: high-volume crawling or scraping behind access restrictions. ## Prerequisites - Node.js 18+ - npm dependencies for the script (see below) ## How to Use ### 1) Install script dependencies (if not done already) ```bash npm install playwright @mozilla/readability jsdom turndown ``` ### 2) Run the script ```bash node skills/salesforce-developer-site-scraper/scripts/scrape-to-markdown.js \ --url "https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/apex_intro.htm" \ --out "./artifacts/online-research/apex_intro.md" \ --consent-selector "#onetrust-accept-btn-handler" \ --wait 2000 ``` ## Script Options | Option | Required | Description | | --- | --- | --- | | `--url` | Yes | Target URL to fetch and extract. | | `--out` | Yes | Output Markdown file path. | | `--consent-selector` | No | CSS or text selector for a cookie banner accept button. | | `--wait` | No | Milliseconds to wait after navigation or consent click. | | `--content-selector` | No | Extract only this element instead of Readability parsing. | | `--remove-selectors` | No | Comma-separated selectors to remove before extraction. | | `--cookie` | No | Consent/session cookie, e.g. `name=value;domain=example.com;path=/`. | | `--storage-state` | No | Playwright storage state JSON file to reuse consent/session. | | `--timeout` | No | Navigation timeout in ms (default 45000). | | `--no-default-removals` | No | Disable default cookie/consent element removals. | ## Compliance Notes - Respect robots.txt and site terms before scraping. - Use consent cookies or storage state only when you have permission. - Avoid collecting personal data unless you have a legal basis. ## Examples ### Example: Reuse a consent state ```bash node skills/salesforce-developer-site-scraper/scripts/scrape-to-markdown.js \ --url "https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/apex_intro.htm" \ --out "./artifacts/online-research/apex_intro.md" \ --storage-state "./artifacts/online-research/consent-state.json" ``` ### Example: Extract a specific content container ```bash node skills/salesforce-developer-site-scraper/scripts/scrape-to-markdown.js \ --url "https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/apex_intro.htm" \ --out "./artifacts/online-research/apex_intro.md" \ --content-selector "main article" ``` ## Troubleshooting ### Issue: Output is empty or too short **Solution**: Add `--wait` or provide a `--content-selector` for the primary content node. ### Issue: Cookie banner blocks content **Solution**: Provide `--consent-selector` (OneTrust) or reuse a `--storage-state` with consent already saved. ## References - Playwright Docs: https://playwright.dev/docs/intro - Playwright Cookies: https://playwright.dev/docs/api/class-browsercontext#browser-context-add-cookies - Playwright Storage State: https://playwright.dev/docs/auth#reuse-authentication-state - Readability (Mozilla): https://github.com/mozilla/readability - DOMParser (MDN): https://developer.mozilla.org/en-US/docs/Web/API/DOMParser - Robots Exclusion Protocol (RFC 9309): https://www.rfc-editor.org/rfc/rfc9309 - GDPR: https://eur-lex.europa.eu/eli/reg/2016/679/oj - ePrivacy Directive: https://eur-lex.europa.eu/eli/dir/2002/58/oj

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/taurgis/sfcc-dev-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

SKILL.md•3.98 KiB