# lilFetch: Webpage to README Scraper MCP Server

> **Under Construction**

lilFetch is an MCP (Model Context Protocol) server that scrapes webpages and converts them to clean Markdown, suitable for creating READMEs, documentation, or workflows from web content. It uses crawl4ai under the hood for robust web scraping.
## Features

- Scrapes multiple URLs to structured Markdown.
- Handles dynamic content with browser automation (via Playwright).
- Outputs timestamped filenames with domain and description.
- Integrates easily with VS Code via MCP.
## Quick Start (For JS/Node Developers)

No Python knowledge required! Just clone, install via npm, and configure in VS Code. The recommended global install makes lilFetch usable in any workspace without path management.
### Option 1: Global Install (Recommended; Effortless, Cross-Workspace Use)

Install once globally, and `npx lilfetch` works anywhere. The Python backend (`.venv` and dependencies) stays in the cloned repo.
1. **Clone the repo** (do this once; keep the folder for the backend):

   ```bash
   git clone https://github.com/yourusername/webpage-to-readme-scraper.git lilfetch-install
   cd lilfetch-install
   ```

2. **Install Node dependencies** (sets up the Python backend automatically):

   ```bash
   npm install
   ```

   This runs `postinstall` to create `.venv`, install `crawl4ai`/Playwright, and download browsers. Requires Node.js 14+ and Python 3.8+ (auto-detected; install via python.org or `brew install python` on macOS if missing). The first run takes 1-2 minutes.
3. **Install globally** (one-time; enables `npx lilfetch` in any terminal/workspace):

   ```bash
   npm run global-install
   ```

   Or:

   ```bash
   npm install -g .
   ```

   **macOS/Linux note:** On a permission error, configure user-owned globals (one-time):

   ```bash
   mkdir ~/.npm-global
   npm config set prefix '~/.npm-global'
   export PATH=~/.npm-global/bin:$PATH  # Add to ~/.zshrc or ~/.bash_profile
   ```

   Then rerun without `sudo`. On Windows, use an admin prompt if needed.
4. **Configure in any VS Code workspace** (add to `.vscode/mcp.json` or your global MCP settings):

   ```json
   {
     "servers": {
       "lilFetch": {
         "type": "stdio",
         "command": "npx",
         "args": ["lilfetch"]
       }
     }
   }
   ```

   No paths or variables needed! Reload the window (Cmd+Shift+P > "Developer: Reload Window") to activate.
5. **Test it:**

   - Manual: In any terminal, run `npx lilfetch` (starts the server; send MCP JSON to stdin, or press Ctrl+C to stop).
   - In Copilot Chat (any workspace): "Use lilFetch to scrape https://example.com to Markdown." This should output JSON with the scraped Markdown.
   - If you hit errors, check the console for Python/browser issues (see Troubleshooting).
### Option 2: Local Tarball (Fallback; Per-Project Isolation)

If you want to avoid global installs (e.g., in a restricted environment), use the tarball method:

1. Follow steps 1-2 from Option 1.

2. Pack:

   ```bash
   npm run pack
   ```

   This creates `lilfetch-1.0.0.tgz`.

3. Configure in the target workspace (use the absolute path to the tarball):

   ```json
   {
     "servers": {
       "lilFetch": {
         "type": "stdio",
         "command": "npx",
         "args": ["/absolute/path/to/lilfetch-install/lilfetch-1.0.0.tgz"]
       }
     }
   }
   ```

   Test as above; repack after changes.
## Tool Usage

The server exposes one tool: `scrape_to_markdown`.

Parameters:

- `urls`: Array of strings (required). URLs to scrape.
- `description`: String (optional, default `"scrape"`). Label for output files.

Output: A JSON array with scraped Markdown, success status, and filename suggestions.

Example call (in MCP context):
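A request might look like the following sketch of an MCP `tools/call` message (the exact envelope and response shape depend on your MCP client; the argument values here are made up for illustration):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "scrape_to_markdown",
    "arguments": {
      "urls": ["https://example.com"],
      "description": "example-homepage"
    }
  }
}
```

In Copilot Chat this message is issued for you; you only supply the URLs and an optional description.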
## Development

- Edit `mcp_server.py` for Python logic.
- Update `bin/lilfetch.js` for wrapper changes.
- Bump the version in `package.json`, then run `npm run pack`.
- For global testing: `npm install -g .`, then `npx lilfetch`.
## Requirements

- Node.js >= 14
- Python 3.8+ (with pip)
- ~200 MB of disk space for browsers (Playwright)
## Troubleshooting

- **Python not found:** Install Python 3.8+ and make sure `python3` is on your PATH.
- **Venv issues:** Delete `.venv` and rerun `npm install`.
- **Browser errors:** Run `python -m playwright install` manually using the interpreter in `.venv/bin`.
- **Windows:** Use `python` instead of `python3` if needed; adjust paths in `bin/lilfetch.js`.
## Future Plans

- Publish to npm as `@jacob/lilfetch`, enabling `npx -y @jacob/lilfetch`.
- Add more tools (e.g., CSS selector extraction).
License: MIT