Skip to main content
Glama
tolik-unicornrider

Website Scraper MCP Server

Website Scraper

A command-line tool and MCP server for scraping websites and converting HTML to Markdown.

Features

  • Extracts meaningful content from web pages using Mozilla's Readability library (the same engine used in Firefox's Reader View)

  • Converts clean HTML to high-quality Markdown with TurndownService

  • Securely handles HTML by removing potentially harmful script tags

  • Works as both a command-line tool and an MCP server

  • Supports direct conversion of local HTML files to Markdown

Related MCP server: MCP Server Fetch Python

Installation

# Install dependencies
npm install

# Build the project
npm run build

# Optionally, install globally
npm install -g .

Usage

CLI Mode

# Print output to console
scrape https://example.com

# Save output to a file
scrape https://example.com output.md

# Convert a local HTML file to Markdown
scrape --html-file input.html

# Convert a local HTML file and save output to a file
scrape --html-file input.html output.md

# Show help
scrape --help

# Or run via npm script
npm run start:cli -- https://example.com

MCP Server Mode

This tool can be used as a Model Context Protocol (MCP) server:

# Start in MCP server mode
npm start

Code Structure

  • src/index.ts - Core functionality and MCP server implementation

  • src/cli.ts - Command-line interface implementation

  • src/data_processing.ts - HTML to Markdown conversion functionality

API

The tool exports the following functions:

// Scrape a website and convert to Markdown
import { scrapeToMarkdown } from './build/index.js';

// Convert HTML string to Markdown directly
import { htmlToMarkdown } from './build/data_processing.js';

async function example() {
  // Web scraping
  const markdown = await scrapeToMarkdown('https://example.com');
  console.log(markdown);
  
  // Direct HTML conversion
  const html = '<h1>Hello World</h1><p>This is <strong>bold</strong> text.</p>';
  const md = htmlToMarkdown(html);
  console.log(md);
}

License

ISC

Install Server
A
security – no known vulnerabilities
F
license - not found
A
quality - confirmed to work

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/tolik-unicornrider/mcp_scraper'

If you have feedback or need assistance with the MCP directory API, please join our Discord server