How do I use Website Scraper MCP Server?

1. Click on "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@Website Scraper MCP Server scrape and index https://example.com into AI Search".

The server will respond to your query, and you can continue using it as needed.

Website Scraper MCP Server

by lalit9168

Python

Hybrid

A production-ready MCP (Model Context Protocol) server that allows any MCP-compatible AI agent to scrape websites, crawl internal pages, clean content, chunk it, and index everything into Azure AI Search — all through a clean, typed tool interface.

Architecture

website_scraper_mcp/
├── app.py                   ← Entry point (stdio / SSE transport)
├── server.py                ← MCP server + tool dispatcher
├── config.py                ← Pydantic Settings (env vars)
├── models.py                ← Input/Output Pydantic models
└── tools/
    ├── scrape.py            ← Tool 1 – static/dynamic detection + scraping
    ├── crawl.py             ← Tool 2 – BFS crawler, robots.txt aware
    ├── clean.py             ← Tool 3 – Trafilatura + BS4 content cleaning
    ├── chunk.py             ← Tool 4 – sliding window chunking
    └── azure_ai_search.py   ← Tools 5 & 7 – index + search
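
The tree describes crawl.py as a breadth-first crawler that respects robots.txt. The README does not show that module, so the following is only a minimal sketch of the idea under stated assumptions: `bfs_crawl`, `get_links`, `SITE`, and `ROBOTS` are hypothetical names, the "site" is an in-memory dictionary so the example runs without network access, and the polite delay between requests is left out.

```python
from collections import deque
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser


def bfs_crawl(start_url, get_links, robots_lines, max_depth=2, max_pages=100):
    """Return URLs in breadth-first order, staying on the start URL's host.

    get_links(url) -> list of hrefs is a hypothetical stand-in for the real
    fetch-and-parse step; robots_lines is the robots.txt body split into lines.
    """
    robots = RobotFileParser()
    robots.parse(robots_lines)
    host = urlparse(start_url).netloc

    visited = []
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        if not robots.can_fetch("*", url):
            continue  # disallowed by robots.txt
        visited.append(url)
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for href in get_links(url):
            link = urljoin(url, href).split("#")[0]  # resolve and drop fragment
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Tiny in-memory "site" (assumption, for illustration only).
SITE = {
    "https://example.com/": ["/about", "/private/admin", "https://other.org/"],
    "https://example.com/about": ["/team", "/"],
    "https://example.com/team": ["/careers"],
}
ROBOTS = ["User-agent: *", "Disallow: /private/"]

pages = bfs_crawl("https://example.com/", lambda u: SITE.get(u, []), ROBOTS, max_depth=2)
print(pages)
```

In this sketch the external link, the disallowed /private/ path, and the page one level past the depth limit are all skipped. A real crawler would also fetch robots.txt itself and wait between requests.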

Tools

#	Tool	Description
1	`scrape_website`	Detect static/dynamic, scrape title/content/links
2	`crawl_website`	BFS crawl with depth limit + robots.txt
3	`clean_content`	Strip noise HTML, return readable text
4	`chunk_content`	Sliding window chunks (~1 000 chars, 200 overlap)
5	`index_to_ai_search`	Upload chunks to Azure AI Search
6	`index_website`	End-to-end pipeline: crawl → clean → chunk → index
7	`search_index`	Full-text search on the Azure AI Search index
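
The sliding-window chunking listed for `chunk_content` can be illustrated with a few lines of Python. This is a sketch only, not the project's chunk.py: the function name `chunk_text` is hypothetical, and it assumes plain character windows with the sizes from the table (1,000 characters, 200 overlap).

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


sample = "x" * 2500
chunks = chunk_text(sample)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

Each window starts 800 characters after the previous one, so the last 200 characters of a chunk reappear at the start of the next. That overlap keeps a sentence that straddles a boundary searchable from either chunk.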

Installation

Prerequisites

Python 3.11+
Azure AI Search service (free tier works for testing)

Steps

# 1. Clone the repo
git clone https://github.com/your-org/website-scraper-mcp.git
cd website-scraper-mcp

# 2. Create and activate a virtual environment
python -m venv .venv
# Windows
.venv\Scripts\activate
# Linux/Mac
source .venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Install Playwright browsers (Chromium)
playwright install chromium

# 5. Copy and fill environment variables
cp .env.example .env
# Edit .env with your Azure credentials

Running Locally

stdio mode (default — for MCP clients / AI agents)

python -m website_scraper_mcp.app
# or
python -m website_scraper_mcp.app --transport stdio

SSE mode (HTTP endpoint for browser-based / HTTP clients)

python -m website_scraper_mcp.app --transport sse --port 8000
# Server available at http://localhost:8000/sse

Running with Docker

# Build and start in SSE mode
docker compose up --build

# Stop
docker compose down

The container exposes port 8000 for SSE transport.

Environment Variables

Variable	Default	Description
`AZURE_SEARCH_ENDPOINT`	(required)	Azure AI Search service URL
`AZURE_SEARCH_KEY`	(required)	Admin API key
`AZURE_SEARCH_INDEX_NAME`	`website-content`	Target index name
`PLAYWRIGHT_TIMEOUT_MS`	`30000`	Playwright page load timeout (ms)
`PLAYWRIGHT_HEADLESS`	`true`	Run Chromium headless
`MAX_CRAWL_DEPTH`	`2`	Maximum crawl depth
`MAX_PAGES_PER_SITE`	`100`	Hard cap on pages per crawl
`CRAWL_DELAY_SECONDS`	`0.5`	Polite delay between requests
`CHUNK_SIZE`	`1000`	Characters per chunk
`CHUNK_OVERLAP`	`200`	Overlap between consecutive chunks
`LOG_LEVEL`	`INFO`	Python logging level
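
To show how the defaults in the table combine with overrides, here is a small standard-library sketch. The project's config.py is described as using Pydantic Settings; this is a deliberate stand-in built on dataclasses so it runs without extra dependencies. The names `Settings`, `load_settings`, and `_OPTIONAL`, and the set of strings accepted as "true", are assumptions.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    azure_search_endpoint: str
    azure_search_key: str
    azure_search_index_name: str = "website-content"
    playwright_timeout_ms: int = 30000
    playwright_headless: bool = True
    max_crawl_depth: int = 2
    max_pages_per_site: int = 100
    crawl_delay_seconds: float = 0.5
    chunk_size: int = 1000
    chunk_overlap: int = 200
    log_level: str = "INFO"


def _as_bool(raw):
    # Assumed truthy spellings; the real parser may accept others.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Environment variable name -> (Settings field, converter)
_OPTIONAL = {
    "AZURE_SEARCH_INDEX_NAME": ("azure_search_index_name", str),
    "PLAYWRIGHT_TIMEOUT_MS": ("playwright_timeout_ms", int),
    "PLAYWRIGHT_HEADLESS": ("playwright_headless", _as_bool),
    "MAX_CRAWL_DEPTH": ("max_crawl_depth", int),
    "MAX_PAGES_PER_SITE": ("max_pages_per_site", int),
    "CRAWL_DELAY_SECONDS": ("crawl_delay_seconds", float),
    "CHUNK_SIZE": ("chunk_size", int),
    "CHUNK_OVERLAP": ("chunk_overlap", int),
    "LOG_LEVEL": ("log_level", str),
}


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [k for k in ("AZURE_SEARCH_ENDPOINT", "AZURE_SEARCH_KEY") if not env.get(k)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    overrides = {
        field: convert(env[name])
        for name, (field, convert) in _OPTIONAL.items()
        if name in env
    }
    return Settings(env["AZURE_SEARCH_ENDPOINT"], env["AZURE_SEARCH_KEY"], **overrides)


settings = load_settings({
    "AZURE_SEARCH_ENDPOINT": "https://your-service.search.windows.net",
    "AZURE_SEARCH_KEY": "your-key",
    "CHUNK_SIZE": "500",
    "PLAYWRIGHT_HEADLESS": "false",
})
print(settings.chunk_size, settings.playwright_headless, settings.max_crawl_depth)
```

Passing a plain dictionary instead of reading `os.environ` directly keeps the example reproducible; calling `load_settings()` with no argument reads the real environment.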

Sample MCP Client

Run the included example after starting the server in SSE mode:

python examples/mcp_client_example.py

Or configure it in your MCP-compatible agent (e.g. Claude Desktop's claude_desktop_config.json):

{
  "mcpServers": {
    "website-scraper": {
      "command": "python",
      "args": ["-m", "website_scraper_mcp.app", "--transport", "stdio"],
      "cwd": "/path/to/website-scraper-mcp",
      "env": {
        "AZURE_SEARCH_ENDPOINT": "https://your-service.search.windows.net",
        "AZURE_SEARCH_KEY": "your-key",
        "AZURE_SEARCH_INDEX_NAME": "website-content"
      }
    }
  }
}

Example API Requests

Via MCP client (Python SDK)

import asyncio
from mcp import ClientSession
from mcp.client.sse import sse_client

async def demo():
    async with sse_client("http://localhost:8000/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Scrape a single page
            result = await session.call_tool("scrape_website", {"url": "https://example.com"})
            print(result)

            # Full pipeline
            result = await session.call_tool("index_website", {
                "url": "https://example.com",
                "max_depth": 2
            })
            print(result)

            # Search
            result = await session.call_tool("search_index", {
                "query": "What services does the company provide?",
                "top": 5
            })
            print(result)

asyncio.run(demo())
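
The example above prints the raw result objects. If the server returns its JSON payload as the first text item of the tool result (an assumption; the README does not state the wire format), a small helper can decode it. `tool_result_to_dict` is a hypothetical name, and the `SimpleNamespace` object below only imitates the shape of the SDK's result so the sketch runs offline.

```python
import json
from types import SimpleNamespace


def tool_result_to_dict(result):
    """Decode the first text content item of a tool result as JSON."""
    for item in result.content:
        if getattr(item, "type", None) == "text":
            return json.loads(item.text)
    raise ValueError("tool result has no text content")


# Stand-in shaped like a tool result (assumption, for illustration only).
fake = SimpleNamespace(
    isError=False,
    content=[SimpleNamespace(type="text", text='{"pages_crawled": 4, "status": "success"}')],
)
summary = tool_result_to_dict(fake)
print(summary["status"])
```

With a live session, the same helper would be applied to the value returned by `session.call_tool(...)`.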

Tool input/output examples

scrape_website

// Input
{"url": "https://example.com"}

// Output
{
  "title": "Example Domain",
  "url": "https://example.com",
  "content": "This domain is for use in illustrative examples...",
  "links": ["https://www.iana.org/domains/example"],
  "is_dynamic": false,
  "metadata": {"description": "..."}
}

index_website

// Input
{"url": "https://example.com", "max_depth": 2}

// Output
{
  "url": "https://example.com",
  "pages_crawled": 4,
  "total_chunks": 38,
  "indexed_documents": 38,
  "failed_documents": 0,
  "status": "success",
  "errors": []
}

search_index

// Input
{"query": "What services does the company provide?", "top": 5}

// Output
{
  "query": "What services does the company provide?",
  "total_results": 3,
  "hits": [
    {
      "id": "abc123",
      "url": "https://example.com/services",
      "title": "Our Services",
      "content": "We provide cloud, AI, and data services...",
      "chunk_number": 0,
      "score": 9.8
    }
  ]
}
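
Each hit carries an `id`, a `url`, and a `chunk_number`. The README does not say how IDs are generated; one common approach, sketched here as an assumption, is to hash the page URL together with the chunk position so that re-indexing the same page produces the same keys. `chunk_document_id` and `build_documents` are hypothetical names.

```python
import hashlib


def chunk_document_id(url, chunk_number):
    """Derive a stable key from the page URL and the chunk's position."""
    raw = f"{url.strip()}|{chunk_number}".encode("utf-8")
    # A hex digest uses only 0-9 and a-f, which are safe in a document key.
    return hashlib.sha256(raw).hexdigest()[:32]


def build_documents(url, title, chunks):
    """Pair each chunk with the fields shown in the search_index hits."""
    return [
        {
            "id": chunk_document_id(url, number),
            "url": url,
            "title": title,
            "content": text,
            "chunk_number": number,
        }
        for number, text in enumerate(chunks)
    ]


docs = build_documents(
    "https://example.com/services",
    "Our Services",
    ["We provide cloud, AI,", "AI, and data services..."],
)
print([d["id"] for d in docs])
```

Deterministic keys mean a second crawl of the same page maps onto the same documents instead of producing unrelated IDs.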

Error Handling

When a tool call fails, the server returns a structured JSON error response that names the failing tool:

{
  "error": "HTTP 404 when fetching https://example.com/missing",
  "tool": "scrape_website"
}

Handled errors include: invalid URLs, HTTP 4xx/5xx, timeouts, Playwright failures, Azure Search quota errors, network issues, and duplicate document IDs.
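
The error shape above can be produced by wrapping each tool handler in one place. The following is a sketch of that pattern, not the code in server.py: `run_tool` and `fake_scrape` are hypothetical, and the handler is synchronous for brevity where the real dispatcher is presumably asynchronous.

```python
import json


def run_tool(tool_name, handler, arguments):
    """Call handler(**arguments) and return a JSON string.

    Exceptions are converted into {"error": ..., "tool": ...} instead of
    propagating to the caller.
    """
    try:
        result = handler(**arguments)
    except Exception as exc:
        return json.dumps({"error": str(exc), "tool": tool_name})
    return json.dumps(result)


def fake_scrape(url):
    # Hypothetical handler that only validates the URL scheme.
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"Invalid URL: {url}")
    return {"url": url, "title": "Example Domain"}


print(run_tool("scrape_website", fake_scrape, {"url": "ftp://example.com"}))
print(run_tool("scrape_website", fake_scrape, {"url": "https://example.com"}))
```

Catching errors at the dispatcher keeps individual tools free of repeated try/except blocks and gives clients one response shape to check.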

License

MIT
