silkworm-mcp
This is a full-featured MCP server for building scrapers with:
silkworm-rs: async crawling, fetching, follow links, and spider execution
scraper-rs: fast Rust-backed HTML parsing with CSS and XPath selectors
It is designed for LLM-assisted scraper development, so the server exposes both low-level page inspection tools and higher-level workflow helpers for validating selector plans and generating starter spider code.
An example project: https://github.com/BitingSnakes/silkworm-example
Features
Fetch pages through silkworm's regular HTTP client or CDP renderer.
Query selectors directly against a CDP-rendered DOM snapshot.
Analyze inline and linked CSS with tinycss2, then optionally map selectors back onto HTML.
Extract structured records from live rendered pages before committing to a full crawl.
Cache HTML in a local document store and reuse it via document_handle.
Bound the document cache with max-document, max-bytes, and idle-TTL controls.
Inspect pages with summaries, parsed DOM trees, prettified HTML, CSS/XPath queries, selector comparisons, and link extraction.
Run ad hoc crawls from a structured CrawlBlueprint.
Generate reusable silkworm spider templates from the same blueprint and statically validate them, including pattern-specific variants for list-only, list+detail, sitemap/XML, and CDP-heavy crawls.
Expose MCP diagnostics plus HTTP /healthz and /readyz routes for production monitoring.
Publish MCP resources and prompts so clients can discover workflows, Silkworm idioms, and blueprint schemas.
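The three document-cache bounds (maximum document count, maximum total bytes, idle TTL) can be pictured with a small in-memory sketch. This is not the server's implementation: the class name BoundedDocumentCache, its methods, and the least-recently-used eviction order are assumptions chosen for illustration.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class BoundedDocumentCache:
    """Hypothetical LRU cache keyed by document handle, bounded by count, bytes, and idle TTL."""

    def __init__(self, max_count: int, max_total_bytes: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_count = max_count
        self.max_total_bytes = max_total_bytes
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        # handle -> (html, last_access_time); dict order doubles as LRU order (oldest first)
        self._docs: "OrderedDict[str, tuple[str, float]]" = OrderedDict()
        self._total_bytes = 0

    @staticmethod
    def _size(html: str) -> int:
        return len(html.encode("utf-8"))

    def _drop(self, handle: str) -> None:
        html, _ = self._docs.pop(handle)
        self._total_bytes -= self._size(html)

    def _expire_idle(self) -> None:
        # Remove every document that has not been touched within the idle TTL.
        now = self._clock()
        expired = [h for h, (_, seen) in self._docs.items() if now - seen > self.ttl_seconds]
        for handle in expired:
            self._drop(handle)

    def put(self, handle: str, html: str) -> None:
        self._expire_idle()
        if handle in self._docs:
            self._drop(handle)
        self._docs[handle] = (html, self._clock())
        self._total_bytes += self._size(html)
        # Evict least recently used documents until both limits hold.
        # A single document larger than max_total_bytes ends up evicted as well.
        while self._docs and (len(self._docs) > self.max_count
                              or self._total_bytes > self.max_total_bytes):
            self._drop(next(iter(self._docs)))

    def get(self, handle: str) -> Optional[str]:
        self._expire_idle()
        entry = self._docs.get(handle)
        if entry is None:
            return None
        html, _ = entry
        # Refresh the idle timer and mark the document as most recently used.
        self._docs[handle] = (html, self._clock())
        self._docs.move_to_end(handle)
        return html

    def __len__(self) -> int:
        return len(self._docs)
```

Injecting the clock keeps the TTL logic testable without sleeping; a real store would also need persistence (the README mentions a configurable store path) and locking for concurrent tool calls.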
Tools
store_html_document
list_documents
delete_document
clear_documents
server_status
inspect_document
parse_html_document
parse_html_fragment
prettify_document
query_selector
analyze_css_selectors
find_selectors_by_text
compare_selectors
extract_links
silkworm_fetch
silkworm_fetch_cdp
query_selector_cdp
extract_structured_data_cdp
run_crawl_blueprint
generate_spider_template
validate_spider_code
Run
Install dependencies:
uv sync
Run over stdio for a desktop MCP client:
uv run python mcp_server.py --transport stdio
Run over HTTP:
uv run python mcp_server.py --transport http --host 127.0.0.1 --port 8000
HTTP deployments also expose:
GET /healthz: process liveness
GET /readyz: readiness, optionally including a CDP browser probe
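A deployment script can poll /readyz until the server reports ready. A minimal sketch using only the Python standard library, assuming readiness is signalled by an HTTP 200 status (the response body is not documented here, so the sketch ignores it); the function name and the timeout and interval values are illustrative.

```python
import time
import urllib.request
from typing import Callable


def wait_until_ready(base_url: str, timeout: float = 30.0, interval: float = 1.0,
                     opener: Callable = urllib.request.urlopen,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll GET {base_url}/readyz until it returns HTTP 200 or the timeout elapses."""
    url = base_url.rstrip("/") + "/readyz"
    deadline = clock() + timeout
    while True:
        try:
            with opener(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError and HTTPError (raised for non-2xx, e.g. 503) are OSError
            # subclasses, as are connection refusals and socket timeouts.
            pass
        if clock() >= deadline:
            return False
        sleep(interval)
```

For example, wait_until_ready("http://127.0.0.1:8000") returns True once the HTTP deployment above is serving a successful readiness response.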
The project also exposes a console entrypoint:
uv run silkworm-mcp --transport stdio
Docker
Build the image:
docker build -t silkworm-mcp .
Run the container over HTTP on port 8000:
docker run --rm -it -p 8000:8000 silkworm-mcp
The container entrypoint starts two processes by default:
the MCP server over HTTP on 0.0.0.0:8000
a bundled Lightpanda browser on 127.0.0.1:9222 for CDP-backed tools such as silkworm_fetch_cdp, query_selector_cdp, and extract_structured_data_cdp
Useful container environment variables:
MCP_TRANSPORT (default: http)
MCP_HOST (default: 0.0.0.0)
MCP_PORT (default: 8000)
MCP_PATH
LIGHTPANDA_ENABLED (default: 1)
LIGHTPANDA_HOST (default: 127.0.0.1)
LIGHTPANDA_PORT (default: 9222)
LIGHTPANDA_ADVERTISE_HOST (default: unset, falls back to LIGHTPANDA_HOST)
LIGHTPANDA_LOG_FORMAT (default: pretty)
LIGHTPANDA_LOG_LEVEL (default: info)
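The defaults listed above, including the fallback of LIGHTPANDA_ADVERTISE_HOST to LIGHTPANDA_HOST, can be summarised as a small resolver. This is an illustrative sketch, not the container's entrypoint logic; the function name and the returned dictionary are assumptions, and MCP_PATH stays None when unset because no default is listed for it.

```python
import os
from typing import Mapping, Optional

# Defaults as listed in this README; MCP_PATH has no documented default.
CONTAINER_DEFAULTS = {
    "MCP_TRANSPORT": "http",
    "MCP_HOST": "0.0.0.0",
    "MCP_PORT": "8000",
    "LIGHTPANDA_ENABLED": "1",
    "LIGHTPANDA_HOST": "127.0.0.1",
    "LIGHTPANDA_PORT": "9222",
    "LIGHTPANDA_LOG_FORMAT": "pretty",
    "LIGHTPANDA_LOG_LEVEL": "info",
}


def resolve_container_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: overlay the given environment on the documented defaults."""
    env = os.environ if env is None else env
    resolved = {key: env.get(key, default) for key, default in CONTAINER_DEFAULTS.items()}
    resolved["MCP_PATH"] = env.get("MCP_PATH")  # None when unset
    # LIGHTPANDA_ADVERTISE_HOST falls back to LIGHTPANDA_HOST; treating an empty
    # value as unset is an assumption of this sketch.
    resolved["LIGHTPANDA_ADVERTISE_HOST"] = (
        env.get("LIGHTPANDA_ADVERTISE_HOST") or resolved["LIGHTPANDA_HOST"]
    )
    return resolved
```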
When Lightpanda binds to 0.0.0.0 inside a container, set LIGHTPANDA_ADVERTISE_HOST to a reachable hostname such as the container DNS name. Otherwise /json/version can advertise ws://0.0.0.0:9222/, which remote CDP clients cannot use.
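A client can detect this failure mode by reading the WebSocket URL from /json/version and checking whether its host is 0.0.0.0. A minimal sketch, assuming the response carries a webSocketDebuggerUrl field as the Chrome DevTools Protocol HTTP endpoints do (verify what your Lightpanda version returns); the helper name and the replacement host are hypothetical, and setting LIGHTPANDA_ADVERTISE_HOST on the server remains the documented fix.

```python
import json
from urllib.parse import urlsplit, urlunsplit


def reachable_ws_url(version_json: str, fallback_host: str) -> str:
    """Return the advertised CDP WebSocket URL, swapping an unroutable 0.0.0.0 host."""
    ws_url = json.loads(version_json)["webSocketDebuggerUrl"]
    parts = urlsplit(ws_url)
    if parts.hostname != "0.0.0.0":
        return ws_url
    # Keep the advertised port and path; only the host is replaced.
    netloc = f"{fallback_host}:{parts.port}" if parts.port else fallback_host
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```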
Example with custom document-cache limits:
docker run --rm -it \
-p 8000:8000 \
-e SILKWORM_MCP_DOCUMENT_MAX_COUNT=256 \
-e SILKWORM_MCP_DOCUMENT_MAX_TOTAL_BYTES=64000000 \
-e SILKWORM_MCP_DOCUMENT_TTL_SECONDS=7200 \
silkworm-mcp
For local development, compose.yml provides the same setup with health checks and a restart policy:
docker compose up --build
Then verify that the container is ready:
curl http://127.0.0.1:8000/readyz
Key runtime environment variables:
SILKWORM_MCP_DOCUMENT_MAX_COUNT
SILKWORM_MCP_DOCUMENT_MAX_TOTAL_BYTES
SILKWORM_MCP_DOCUMENT_TTL_SECONDS
SILKWORM_MCP_DOCUMENT_STORE_PATH
SILKWORM_MCP_LOG_LEVEL
SILKWORM_MCP_READINESS_REQUIRE_CDP
SILKWORM_MCP_READINESS_CDP_WS_ENDPOINT
Example Workflow
Call silkworm_fetch for the target page.
Use the returned document_handle with inspect_document.
Use parse_html_document or parse_html_fragment when you need exact parser structure, node types, or parser errors.
Use find_selectors_by_text to derive candidates from visible text, then iterate on query_selector, compare_selectors, and analyze_css_selectors when stylesheet structure or hidden elements matter.
For JS-heavy pages, use query_selector_cdp or extract_structured_data_cdp against the rendered DOM.
Use extract_links to verify pagination or detail pages.
Feed the stable plan into run_crawl_blueprint.
Convert the same blueprint into code with generate_spider_template, then check it with validate_spider_code.
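Each workflow step is an MCP tools/call request. The sketch below builds the JSON-RPC 2.0 messages for the first steps so that the document_handle from the fetch can be threaded into later calls. The message envelope follows the MCP specification, but the argument names (url, document_handle, selector) and the handle value are assumptions; check each tool's input schema via tools/list before relying on them. Transport and session setup are left to your MCP client.

```python
import itertools
import json
from typing import Any


class ToolCallBuilder:
    """Builds JSON-RPC 2.0 `tools/call` requests with increasing ids."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)

    def call(self, name: str, arguments: dict[str, Any]) -> dict[str, Any]:
        return {
            "jsonrpc": "2.0",
            "id": next(self._ids),
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }


builder = ToolCallBuilder()
fetch_req = builder.call("silkworm_fetch", {"url": "https://news.ycombinator.com"})

# Hypothetical handle: a real client reads it from the silkworm_fetch result.
handle = "doc-123"
inspect_req = builder.call("inspect_document", {"document_handle": handle})
query_req = builder.call("query_selector", {"document_handle": handle, "selector": ".titleline > a"})

print(json.dumps(query_req, indent=2))
```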
Useful built-in MCP references:
silkworm://reference/overview
silkworm://reference/silkworm-cheatsheet
silkworm://reference/silkworm-playbook
silkworm://reference/template-variants
silkworm://reference/scraper-rs-cheatsheet
silkworm://reference/crawl-blueprint-schema
Use transport: "cdp" when pages require JavaScript rendering. run_crawl_blueprint will connect to the configured CDP endpoint, and generate_spider_template will emit a starter spider that runs through CDPClient instead of the default HTTP client.
Both run_crawl_blueprint and generate_spider_template accept a variant override. When omitted, they infer a crawl style from the blueprint:
list_only: listing pages emit items directly, with optional pagination
list_detail: listing pages schedule detail requests handled by a separate parse_detail
sitemap_xml: sitemap/XML entrypoints are fetched with meta={"allow_non_html": True} and parsed before scheduling page requests
cdp_heavy: rendered-page crawls keep the CDP execution path and a general-purpose parse/follow flow
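The inference rules themselves are not spelled out here, so the following is only a hypothetical heuristic showing how an explicit variant override can take precedence over inference from blueprint fields. The field names start_urls and detail_selector, and the order of the checks, are assumptions (only transport appears in this README); consult silkworm://reference/crawl-blueprint-schema for the real schema.

```python
from typing import Any, Mapping, Optional

VARIANTS = ("list_only", "list_detail", "sitemap_xml", "cdp_heavy")


def resolve_variant(blueprint: Mapping[str, Any], variant: Optional[str] = None) -> str:
    """Hypothetical: pick a crawl variant, honouring an explicit override first."""
    if variant is not None:
        if variant not in VARIANTS:
            raise ValueError(f"unknown variant: {variant!r}")
        return variant
    if blueprint.get("transport") == "cdp":
        return "cdp_heavy"
    start_urls = blueprint.get("start_urls", [])
    if any(url.lower().endswith(".xml") for url in start_urls):
        return "sitemap_xml"
    if blueprint.get("detail_selector"):
        return "list_detail"
    return "list_only"
```

Returning the resolved name, as run_crawl_blueprint and generate_spider_template do with execution_variant and template_variant, lets a client confirm which branch an ambiguous blueprint actually took.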
run_crawl_blueprint returns the resolved execution_variant, and generate_spider_template returns the resolved template_variant, so clients can see which crawl shape was actually used.
Testing
Run the automated test suite with:
just test
Acknowledgement
This project builds on the excellent work behind FastMCP, silkworm-rs, and scraper-rs. Together they provide the MCP server framework, crawling runtime, and HTML parsing foundations that make this project possible.