by jmh108

MD MCP Webcrawler Project

A Python-based MCP (https://modelcontextprotocol.io/introduction) web crawler for extracting and saving website content.

Features

  • Extract website content and save as markdown files

  • Map website structure and links

  • Batch processing of multiple URLs

  • Configurable output directory
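
As an illustration of the link-mapping feature, the following is a minimal sketch that collects the links on a page using only the Python standard library. The names `LinkCollector` and `map_links` are hypothetical and are not taken from this project's `server.py`; the actual implementation may work differently.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links and drop the #fragment part.
        absolute, _fragment = urldefrag(urljoin(self.base_url, href))
        # Keep only web links, and keep each one once.
        if urlparse(absolute).scheme in ("http", "https") and absolute not in self.links:
            self.links.append(absolute)


def map_links(html: str, base_url: str) -> List[str]:
    """Return the unique http(s) links found in `html`, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


sample = (
    '<a href="/docs">Docs</a> '
    '<a href="about.html#team">About</a> '
    '<a href="mailto:x@example.com">Mail</a>'
)
print(map_links(sample, "https://example.com/"))
```

Fragments are dropped so that `about.html#team` and `about.html` count as the same page, which keeps a site map from listing one document several times.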

Installation

  1. Clone the repository:

git clone https://github.com/yourusername/webcrawler.git
cd webcrawler

  2. Install dependencies:

pip install -r requirements.txt

  3. Optional: Configure environment variables:

export OUTPUT_PATH=./output # Set your preferred output directory

Output

Crawled content is saved in markdown format in the specified output directory.
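
The README does not say how output files are named. The sketch below shows one possible scheme, assuming each URL is turned into a slug and written as a `.md` file under the output directory. `markdown_path_for` and `save_markdown` are hypothetical helpers, not functions from this project.

```python
import re
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def markdown_path_for(url: str, output_dir: str) -> Path:
    """Map a URL to a .md file inside output_dir (assumed naming scheme)."""
    parsed = urlparse(url)
    # Collapse every run of non-alphanumeric characters into a single dash.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", parsed.netloc + parsed.path).strip("-").lower()
    return Path(output_dir) / f"{slug or 'index'}.md"


def save_markdown(url: str, markdown: str, output_dir: str) -> Path:
    """Write `markdown` to the file derived from `url`, creating directories."""
    target = markdown_path_for(url, output_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(markdown, encoding="utf-8")
    return target


# Demonstration in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_markdown("https://example.com/docs/intro", "# Intro\n", tmp)
    saved_text = saved.read_text(encoding="utf-8")
    print(saved.name)
```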

Configuration

The server can be configured through environment variables:

  • OUTPUT_PATH: Default output directory for saved files

  • MAX_CONCURRENT_REQUESTS: Maximum parallel requests (default: 5)

  • REQUEST_TIMEOUT: Request timeout in seconds (default: 30)
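
A minimal sketch of how these variables could be read and applied, assuming the server uses `asyncio`. The defaults of 5 and 30 come from the list above; the `./output` fallback, the `Settings` class, `load_settings`, and `run_batch` are assumptions made for illustration and may not match `server.py`.

```python
import asyncio
import os
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Mapping


@dataclass(frozen=True)
class Settings:
    output_path: str
    max_concurrent_requests: int
    request_timeout: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the three variables, falling back to defaults when unset."""
    return Settings(
        # "./output" is an assumed fallback; the README states no default.
        output_path=env.get("OUTPUT_PATH", "./output"),
        max_concurrent_requests=int(env.get("MAX_CONCURRENT_REQUESTS", "5")),
        request_timeout=float(env.get("REQUEST_TIMEOUT", "30")),
    )


async def run_batch(
    urls: List[str],
    fetch: Callable[[str], Awaitable[str]],
    settings: Settings,
) -> List[str]:
    """Fetch all URLs, never running more than the configured number at once."""
    semaphore = asyncio.Semaphore(settings.max_concurrent_requests)

    async def guarded(url: str) -> str:
        async with semaphore:
            # Each request is cancelled if it exceeds the timeout.
            return await asyncio.wait_for(fetch(url), settings.request_timeout)

    return list(await asyncio.gather(*(guarded(url) for url in urls)))


async def demo():
    settings = load_settings({"MAX_CONCURRENT_REQUESTS": "2"})
    active = peak = 0

    async def fake_fetch(url: str) -> str:
        # Stand-in for a real HTTP request; records peak concurrency.
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)
        active -= 1
        return f"# {url}"

    urls = [f"https://example.com/page{i}" for i in range(6)]
    pages = await run_batch(urls, fake_fetch, settings)
    return settings, pages, peak


demo_settings, demo_pages, demo_peak = asyncio.run(demo())
print(demo_settings.request_timeout, len(demo_pages), demo_peak)
```

A semaphore keeps the limit in one place: every request passes through the same gate, so batch size and concurrency stay independent.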

Claude Set-Up

Install with FastMCP:

fastmcp install server.py

Or use custom settings to run with fastmcp directly:

"Crawl Server": {
  "command": "fastmcp",
  "args": [
    "run",
    "/Users/mm22/Dev_Projekte/servers-main/src/Webcrawler/server.py"
  ],
  "env": {
    "OUTPUT_PATH": "/Users/user/Webcrawl"
  }
}

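In Claude Desktop, a server entry like this one sits under the top-level `mcpServers` key of `claude_desktop_config.json`. The sketch below merges such an entry into an existing configuration without touching other servers. `add_server` is a hypothetical helper, and the paths are placeholders to be replaced with your own.

```python
import json


def add_server(config: dict, name: str, server_path: str, output_path: str) -> dict:
    """Return a copy of `config` with a fastmcp server entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {
        "command": "fastmcp",
        "args": ["run", server_path],
        "env": {"OUTPUT_PATH": output_path},
    }
    merged["mcpServers"] = servers
    return merged


existing = {"mcpServers": {"other": {"command": "other-server"}}}
# Placeholder paths, not real locations.
updated = add_server(existing, "Crawl Server", "/path/to/server.py", "/path/to/output")
print(json.dumps(updated, indent=2))
```
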
Development

Live Development

fastmcp dev server.py --with-editable .

Debug

The MCP Inspector (https://modelcontextprotocol.io/docs/tools/inspector) is useful for debugging.

Examples

Example 1: Extract and Save Content

mcp call extract_content --url "https://example.com" --output_path "example.md"
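
To give a rough idea of what turning a page into markdown involves, here is a deliberately small sketch built on the standard library. It handles only headings and paragraphs and skips scripts and styles. `MarkdownExtractor` and `html_to_markdown` are hypothetical names, and this is not the conversion that `extract_content` actually performs.

```python
from html.parser import HTMLParser
from typing import List, Optional


class MarkdownExtractor(HTMLParser):
    """Turn <h1>-<h3> and <p> elements into markdown blocks."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[str] = []
        self._block_prefix: Optional[str] = None  # None while outside a block
        self._buffer: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.HEADINGS:
            self._open_block(self.HEADINGS[tag])
        elif tag == "p":
            self._open_block("")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.HEADINGS or tag == "p":
            self.finish_block()

    def handle_data(self, data):
        # Text counts only inside a heading or paragraph, outside script/style.
        if self._block_prefix is not None and not self._skip_depth:
            self._buffer.append(data)

    def _open_block(self, prefix: str) -> None:
        self.finish_block()
        self._block_prefix = prefix

    def finish_block(self) -> None:
        if self._block_prefix is not None:
            # Normalize whitespace so line breaks in the HTML do not leak through.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.blocks.append(self._block_prefix + text)
        self._block_prefix = None
        self._buffer = []


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    parser.finish_block()  # flush a block left open by unclosed markup
    return "\n\n".join(parser.blocks) + "\n"


page = (
    "<html><head><style>p{}</style></head><body>"
    "<h1>Title</h1><p>Hello <b>world</b>.</p><script>x()</script>"
    "</body></html>"
)
print(html_to_markdown(page))
```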

Example 2: Create Content Index

mcp call scan_linked_content --url "https://example.com" | \
  mcp call create_index --content_map - --output_path "index.md"
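
The format of the content map passed between the two tools is not documented here. The sketch below assumes, purely for illustration, a JSON object that maps each URL to a page title, and shows how an index file could be rendered from it. `build_index` is a hypothetical function, not the project's `create_index`.

```python
import json
from typing import Mapping


def build_index(content_map: Mapping[str, str], title: str = "Content Index") -> str:
    """Render a markdown list of links, sorted by URL."""
    lines = [f"# {title}", ""]
    for url, page_title in sorted(content_map.items()):
        # Fall back to the URL itself when a page has no title.
        label = page_title.strip() or url
        lines.append(f"- [{label}]({url})")
    return "\n".join(lines) + "\n"


# Assumed shape of the content map: {"<url>": "<page title>"}.
raw = '{"https://example.com/b": "Page B", "https://example.com/a": ""}'
index_text = build_index(json.loads(raw))
print(index_text)
```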

Contributing

  1. Fork the repository

  2. Create a feature branch (git checkout -b feature/AmazingFeature)

  3. Commit your changes (git commit -m 'Add some AmazingFeature')

  4. Push to the branch (git push origin feature/AmazingFeature)

  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Requirements

  • Python 3.7+

  • FastMCP (uv pip install fastmcp)

  • Dependencies listed in requirements.txt
