
lilFetch: Webpage to README Scraper MCP Server

lilFetch is an MCP (Model Context Protocol) server that scrapes webpages and converts them to clean Markdown, ideal for READMEs, docs, or automated workflows. It uses crawl4ai under the hood for robust web scraping.

Features

  • Scrape multiple URLs to structured Markdown.

  • Handles dynamic content with browser automation (via Playwright).

  • Outputs timestamped filenames with domain and description.

  • Easy integration with VS Code via MCP.

Quick Start (For JS/Node Developers)

No Python knowledge required! Just clone, install via npm, and configure in VS Code. The recommended global install makes it usable in any workspace without path management.

Option 1: Global Install (Recommended - Effortless, Cross-Workspace Use)

Install once globally, and npx lilfetch works from any directory. The Python backend (the .venv and its dependencies) stays in the cloned repo.

  1. Clone the Repo (do this once; keep the folder for the backend):

    git clone https://github.com/yourusername/webpage-to-readme-scraper.git lilfetch-install
    cd lilfetch-install
  2. Install Node Dependencies (sets up Python backend automatically):

    npm install
    • Runs the postinstall script to create .venv, install crawl4ai and Playwright, and download the Playwright browsers.

    • Requires: Node.js 14+, Python 3.8+ (auto-detected; install via python.org or brew install python on macOS if missing). First run takes 1-2 min.

  3. Install Globally (one-time; enables npx lilfetch in any terminal/workspace):

    npm run global-install
    • Or: npm install -g .

    • macOS/Linux Note: If you hit a permission error, configure a user-owned global prefix (one-time):

      mkdir ~/.npm-global
      npm config set prefix '~/.npm-global'
      export PATH=~/.npm-global/bin:$PATH   # Add this line to ~/.zshrc or ~/.bash_profile

      Then rerun the global install without sudo. On Windows, use an administrator prompt if needed.

  4. Configure in Any VS Code Workspace (add to .vscode/mcp.json or global MCP settings):

    { "servers": { "lilFetch": { "type": "stdio", "command": "npx", "args": ["lilfetch"] } } }
    • No absolute paths or environment variables needed! Reload the window (Cmd+Shift+P > "Developer: Reload Window") to activate.

  5. Test It:

    • Manual: In any terminal, run npx lilfetch (starts the server; send MCP JSON-RPC messages on stdin, or press Ctrl+C to stop). See the example request after this list.

    • In Copilot Chat (any workspace): "Use lilFetch to scrape https://example.com to Markdown." Should output JSON with scraped Markdown.

    • Verify setup: If you see errors, check the console output for Python or browser issues (see Troubleshooting).
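
    For a manual stdin test, note that the server speaks MCP JSON-RPC. The three messages below are a minimal handshake sketch per the MCP specification, not anything lilFetch-specific (the protocolVersion string your server accepts may differ). Paste them one per line into a running npx lilfetch process:

      {"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {"protocolVersion": "2024-11-05", "capabilities": {}, "clientInfo": {"name": "manual-test", "version": "0.0.0"}}}
      {"jsonrpc": "2.0", "method": "notifications/initialized"}
      {"jsonrpc": "2.0", "id": 2, "method": "tools/list"}

    A tools/list response that includes scrape_to_markdown confirms the server is wired up correctly.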

Option 2: Local Tarball (Fallback - Per-Project Isolation)

If you want to avoid global installs (e.g., in a restricted environment), use the packed tarball (.tgz) method:

  1. Follow steps 1-2 from Option 1.

  2. Pack:

    npm run pack
    • Creates lilfetch-1.0.0.tgz.

  3. Configure in Target Workspace (use absolute path to tgz):

    { "servers": { "lilFetch": { "type": "stdio", "command": "npx", "args": ["/absolute/path/to/lilfetch-install/lilfetch-1.0.0.tgz"] } } }
  4. Test as above; re-run npm run pack after making changes.

Tool Usage

The server exposes one tool: scrape_to_markdown

  • Parameters:

    • urls: Array of strings (required) – URLs to scrape.

    • description: String (optional, default "scrape") – Label for output files.

  • Output: JSON array with scraped Markdown, success status, and filename suggestions.

Example call (in MCP context):

{ "name": "scrape_to_markdown", "arguments": { "urls": ["https://example.com"], "description": "example-site" } }

Development

  • Edit mcp_server.py for Python logic.

  • Update bin/lilfetch.js for wrapper changes (see the conceptual sketch after this list).

  • Bump version in package.json, then npm run pack.

  • For global testing: npm install -g . then npx lilfetch.
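
Conceptually, the wrapper's only job is to locate the repo's .venv Python and hand stdio over to mcp_server.py, so the MCP client exchanges JSON-RPC directly with the Python process. The sketch below is an assumption about how such a wrapper is typically written, not a copy of bin/lilfetch.js; check the actual file before editing it.

    #!/usr/bin/env node
    // Hypothetical wrapper sketch (not the real bin/lilfetch.js).
    const { spawn } = require("child_process");
    const path = require("path");

    const repoRoot = path.resolve(__dirname, "..");
    // Venv layout differs by platform: Scripts\ on Windows, bin/ elsewhere.
    const pythonBin = process.platform === "win32"
      ? path.join(repoRoot, ".venv", "Scripts", "python.exe")
      : path.join(repoRoot, ".venv", "bin", "python");

    // Inherit stdio so the MCP client talks directly to the Python server.
    const child = spawn(pythonBin, [path.join(repoRoot, "mcp_server.py")], { stdio: "inherit" });
    child.on("exit", (code) => process.exit(code ?? 1));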

Requirements

  • Node.js >=14

  • Python 3.8+ (with pip)

  • ~200MB disk for browsers (Playwright)

Troubleshooting

  • Python not found: Install Python 3.8+ and ensure python3 is in PATH.

  • Venv issues: Delete .venv and rerun npm install.

  • Browser errors: Run python -m playwright install manually with the repo's .venv Python (see the snippet after this list).

  • Windows users: Use python instead of python3 if needed; adjust paths in bin/lilfetch.js.
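
If the venv or the Playwright browsers need to be rebuilt by hand, the commands below are a reasonable fallback. They assume the .venv layout that npm install creates inside the cloned repo; the exact packages the postinstall script pins may differ, so prefer rerunning npm install when possible.

    cd lilfetch-install
    rm -rf .venv                              # only if the existing venv is broken
    python3 -m venv .venv                     # "python -m venv .venv" on Windows
    .venv/bin/pip install crawl4ai            # .venv\Scripts\pip on Windows
    .venv/bin/python -m playwright install    # downloads the Playwright browsers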

Future Plans

  • Publish to npm as @jacob/lilfetch for npx -y @jacob/lilfetch.

  • Add more tools (e.g., CSS selector extraction).

License: MIT
