Website Reader MCP
A small production-ready Model Context Protocol (MCP) server built with Python and FastAPI. It exposes Website Reader tools over Streamable HTTP so an AI chat backend can fetch public webpages and receive cleaned, readable text.
What it does
Runs as a FastAPI app locally with uvicorn over HTTPS
Deploys to Vercel as a Python serverless app (HTTPS provided by Vercel)
Exposes MCP at /mcp (Streamable HTTP transport)
Protects the MCP endpoint with a static API key
Provides a small content extraction pipeline: raw fetch, Markdown, article extraction, metadata and summary preparation
Tools
The server exposes five MCP tools so the AI Chat backend can pick the right extraction layer for the task:
| Tool | Best for | Output |
| --- | --- | --- |
| fetch_url | Raw/simple fetch for debugging or fallback when you also need HTTP status, final URL and content type | Cleaned page text plus basic title/description from HTML |
| fetch_markdown | RAG ingestion and LLM context | Clean, LLM-friendly Markdown with headings, paragraphs, links, lists and code blocks; boilerplate removed |
| extract_article | Summaries, blog posts, news, docs and long-form pages | Main article text and Markdown, plus author, published date, description, site name and language |
| extract_metadata | Link previews and routing | Title, description, author, published date, site name, language, image and canonical URL (Open Graph, Twitter card, JSON-LD, meta tags) |
| summarize_article | Preparing an article summary without coupling the server to an LLM | Article text plus a ready-to-use summary_prompt |
Markdown and article extraction are powered by trafilatura, with a lightweight BeautifulSoup fallback when trafilatura cannot find usable content. All tools return structured error messages instead of crashing on invalid URLs, timeouts, unsupported content types, empty pages or extraction failures.
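The fallback and structured-error behavior described above can be sketched roughly as follows. This is an illustration only, not the project's code: `extract_with_fallback`, the `(name, extractor)` pairs and the `"beautifulsoup"` label are hypothetical, and the extractors are plain callables standing in for trafilatura and the BeautifulSoup path.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical extractor signature: HTML in, readable text (or None) out.
Extractor = Callable[[str], Optional[str]]


def extract_with_fallback(html: str, extractors: Sequence[Tuple[str, Extractor]]) -> dict:
    """Try each extractor in order and return a structured result, never raising."""
    for name, extractor in extractors:
        try:
            text = extractor(html)
        except Exception:
            # Treat an extractor crash the same as "no usable content".
            text = None
        if text and text.strip():
            return {"text": text.strip(), "extraction_method": name, "error": None}
    return {
        "text": None,
        # Report the first (preferred) extractor on total failure.
        "extraction_method": extractors[0][0] if extractors else None,
        "error": "Could not extract readable article content from this page.",
    }
```

A caller would pass the preferred extractor first, for example `[("trafilatura", primary), ("beautifulsoup", fallback)]`, and inspect the `error` field instead of catching exceptions.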
summarize_article and LLMs
summarize_article deliberately does not call OpenAI or any other model from inside the MCP server. It returns the extracted text together with a summary_prompt string. The AI Chat backend can send summary_prompt to its existing model pipeline to produce the actual summary. This keeps the MCP server provider-agnostic.
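On the backend side, consuming the tool result might look like the sketch below. It assumes the result has already been decoded into a dict with the `summary_prompt` and `error` fields shown in this README; `summarize_with` and the `llm` callable are hypothetical names for whatever model pipeline the backend already has.

```python
from typing import Callable


def summarize_with(result: dict, llm: Callable[[str], str]) -> str:
    """Produce a summary from a summarize_article result using the caller's own model."""
    if result.get("error"):
        # The tool reports failures in-band, so surface them before calling the model.
        raise ValueError(f"summarize_article failed: {result['error']}")
    prompt = result.get("summary_prompt")
    if not prompt:
        raise ValueError("summarize_article result has no summary_prompt")
    return llm(prompt)
```

Because the model is passed in as a plain callable, the same function works with any provider, which is the point of keeping the server itself model-free.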
Local setup
Requirements: Python 3.11+ and OpenSSL (for local dev certs)
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
Edit .env and set a real value for MCP_API_KEY.
Environment variables
Copy .env.example to .env:
MCP_API_KEY=change-me
APP_ENV=local
REQUEST_TIMEOUT_SECONDS=12
MAX_RESPONSE_CHARS=12000
MAX_HTML_BYTES=2000000
ALLOWED_SCHEMES=https,http
# MCP client host allowlist (see .env.example)
MCP_ALLOWED_HOSTS=
MCP_ALLOWED_ORIGINS=
HOST=0.0.0.0
PORT=8001
DEV_HTTPS=true
SSL_CERTFILE=certs/localhost.pem
SSL_KEYFILE=certs/localhost-key.pem
The real .env file is gitignored and should not be committed.
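For readers wiring these variables into their own code, here is a minimal sketch of how they could be parsed. It is not the contents of app/config.py: the `Settings` dataclass and `load_settings` are hypothetical, and treating the values listed above as defaults is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class Settings:
    mcp_api_key: str
    request_timeout_seconds: float
    max_response_chars: int
    max_html_bytes: int
    allowed_schemes: Tuple[str, ...]


def _split_csv(value: str) -> Tuple[str, ...]:
    """Split a comma-separated value such as 'https,http', dropping blanks."""
    return tuple(item.strip().lower() for item in value.split(",") if item.strip())


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Fallback values mirror the example .env above (assumed defaults).
    return Settings(
        mcp_api_key=env.get("MCP_API_KEY", ""),
        request_timeout_seconds=float(env.get("REQUEST_TIMEOUT_SECONDS", "12")),
        max_response_chars=int(env.get("MAX_RESPONSE_CHARS", "12000")),
        max_html_bytes=int(env.get("MAX_HTML_BYTES", "2000000")),
        allowed_schemes=_split_csv(env.get("ALLOWED_SCHEMES", "https,http")),
    )
```

Passing the environment in as a mapping keeps the parsing easy to test without touching the real process environment.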
Create local HTTPS certs
Local development uses self-signed TLS certs. Generate them once:
chmod +x scripts/dev.sh scripts/generate_dev_certs.sh
./scripts/generate_dev_certs.sh
This creates:
certs/localhost.pem
certs/localhost-key.pem
These files are gitignored and are for local dev only.
You do not need to run this manually if you use ./scripts/dev.sh, which auto-generates missing certs on first start.
Optional: trusted local certs with mkcert
If you prefer browser- and client-trusted local certs instead of self-signed ones:
brew install mkcert
mkcert -install
mkdir -p certs
mkcert -cert-file certs/localhost.pem -key-file certs/localhost-key.pem localhost 127.0.0.1
Then use ./scripts/dev.sh as usual.
Run locally
./scripts/dev.sh
This starts uvicorn with reload on:
https://localhost:8001
Useful overrides:
# HTTP instead of HTTPS
DEV_HTTPS=false ./scripts/dev.sh
# Bind only to localhost
HOST=127.0.0.1 ./scripts/dev.sh
Health check
Self-signed certs require -k with curl:
curl -k https://localhost:8001/health
Example response:
{
  "status": "ok",
  "service": "website-reader-mcp"
}
MCP endpoint
The MCP Streamable HTTP endpoint is:
https://localhost:8001/mcp
Authentication is required. Use either header:
Authorization: Bearer <MCP_API_KEY>
or:
X-API-Key: <MCP_API_KEY>
Quick MCP test with curl
Initialize a session (stateless mode):
curl -k -sS -X POST "https://localhost:8001/mcp/" \
  -H "Authorization: Bearer change-me" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
      "protocolVersion": "2024-11-05",
      "capabilities": {},
      "clientInfo": {"name": "curl-test", "version": "0.1"}
    }
  }'
List tools:
curl -k -sS -X POST "https://localhost:8001/mcp/" \
  -H "Authorization: Bearer change-me" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/list",
    "params": {}
  }'
Call fetch_url:
curl -k -sS -X POST "https://localhost:8001/mcp/" \
  -H "Authorization: Bearer change-me" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
      "name": "fetch_url",
      "arguments": {"url": "https://example.com"}
    }
  }'
Replace change-me with your configured MCP_API_KEY.
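The same tools/call request can be built from Python with only the standard library. The sketch below mirrors the curl examples (same URL path, headers and JSON-RPC body); `build_tool_call` is a hypothetical helper, not part of this project.

```python
import json
import urllib.request


def build_tool_call(
    base_url: str, api_key: str, name: str, arguments: dict, request_id: int = 1
) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC tools/call request for the /mcp/ endpoint."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/mcp/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


request = build_tool_call(
    "https://localhost:8001", "change-me", "fetch_url", {"url": "https://example.com"}, request_id=3
)
```

The request can then be sent with `urllib.request.urlopen`. With the self-signed local certificate you would need to supply an `ssl.SSLContext` that trusts it, and the response body may arrive either as JSON or as an event stream depending on how the server answers.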
Call fetch_markdown
curl -k -sS -X POST "https://localhost:8001/mcp/" \
  -H "Authorization: Bearer change-me" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{
    "jsonrpc": "2.0",
    "id": 4,
    "method": "tools/call",
    "params": {
      "name": "fetch_markdown",
      "arguments": {
        "url": "https://example.com/blog/my-article",
        "max_chars": 12000
      }
    }
  }'
Example structured output:
{
  "url": "https://example.com/blog/my-article",
  "final_url": "https://example.com/blog/my-article",
  "title": "My Article",
  "markdown": "# My Article\n\nClean readable content...\n\n- point one\n- point two",
  "content_length": 842,
  "truncated": false,
  "extraction_method": "trafilatura",
  "error": null
}
Call extract_article
curl -k -sS -X POST "https://localhost:8001/mcp/" \
  -H "Authorization: Bearer change-me" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{
    "jsonrpc": "2.0",
    "id": 5,
    "method": "tools/call",
    "params": {
      "name": "extract_article",
      "arguments": {
        "url": "https://example.com/blog/my-article",
        "max_chars": 12000,
        "include_metadata": true,
        "include_markdown": true
      }
    }
  }'
Example structured output:
{
  "url": "https://example.com/blog/my-article",
  "final_url": "https://example.com/blog/my-article",
  "title": "My Article",
  "author": "Jane Doe",
  "published_date": "2024-05-01T10:00:00Z",
  "description": "Short article description",
  "site_name": "Example",
  "language": "en",
  "text": "Clean readable article text...",
  "markdown": "# My Article\n\nClean readable article text...",
  "content_length": 8452,
  "truncated": false,
  "extraction_method": "trafilatura",
  "error": null
}
If extraction fails, the tool returns a structured error instead of crashing:
{
  "url": "https://example.com/article",
  "final_url": "https://example.com/article",
  "error": "Could not extract readable article content from this page.",
  "text": null,
  "markdown": null,
  "extraction_method": "trafilatura"
}
Call extract_metadata
curl -k -sS -X POST "https://localhost:8001/mcp/" \
  -H "Authorization: Bearer change-me" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{
    "jsonrpc": "2.0",
    "id": 6,
    "method": "tools/call",
    "params": {
      "name": "extract_metadata",
      "arguments": {"url": "https://example.com/blog/my-article"}
    }
  }'
Example structured output:
{
  "url": "https://example.com/blog/my-article",
  "final_url": "https://example.com/blog/my-article",
  "title": "My Article",
  "description": "Short article description",
  "author": "Jane Doe",
  "published_date": "2024-05-01T10:00:00Z",
  "site_name": "Example",
  "language": "en",
  "image": "https://example.com/images/cover.png",
  "canonical_url": "https://example.com/blog/my-article",
  "error": null
}
Call summarize_article
curl -k -sS -X POST "https://localhost:8001/mcp/" \
  -H "Authorization: Bearer change-me" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {
      "name": "summarize_article",
      "arguments": {
        "url": "https://example.com/blog/my-article",
        "max_chars": 12000,
        "max_words": 150
      }
    }
  }'
Example structured output:
{
  "url": "https://example.com/blog/my-article",
  "final_url": "https://example.com/blog/my-article",
  "title": "My Article",
  "author": "Jane Doe",
  "published_date": "2024-05-01T10:00:00Z",
  "description": "Short article description",
  "text": "Clean readable article text...",
  "content_length": 8452,
  "truncated": false,
  "summary_prompt": "Summarize My Article in at most 150 words. Focus on the key points...\n\nArticle content:\nClean readable article text...",
  "extraction_method": "trafilatura",
  "error": null
}
The chat backend passes summary_prompt to its own LLM to generate the final summary.
You can also connect with the MCP Inspector using Streamable HTTP transport, the HTTPS URL above, and the same API key. You may need to accept the self-signed certificate in your client.
Tests
pytest
Vercel deployment
Push this repository to GitHub.
Import the project in Vercel.
Set environment variables in the Vercel dashboard (at minimum MCP_API_KEY).
Deploy.
The included vercel.json routes all requests to app/main.py, which exports the ASGI app object required by @vercel/python. Vercel terminates HTTPS for you in production; the local cert files are not used there.
After deployment, your MCP endpoint will be:
https://<your-project>.vercel.app/mcp
Use the same API key headers as in local development.
Security notes and limitations
The MCP endpoint is protected by a single static API key. Rotate the key if it is exposed.
Local HTTPS uses self-signed certificates. Do not reuse them outside local development.
SSRF protection blocks localhost, common internal hostnames, and private/link-local/multicast IP literals before fetching.
MCP host allowlist (MCP_ALLOWED_HOSTS): optional restriction on which Host header values may access /mcp (in addition to the API key). Leave empty to disable. Set * to allow any host, list exact hosts (example.com), subdomain wildcards (*.example.com), or port wildcards for local dev (localhost:*). Optional MCP_ALLOWED_ORIGINS restricts browser Origin headers when set.
DNS resolution is not yet validated against resolved private IPs (see TODO in app/services/fetcher.py).
Only http and https URLs are allowed.
Responses are capped by MAX_HTML_BYTES while downloading and MAX_RESPONSE_CHARS (or max_chars) for returned text.
No JavaScript rendering: pages that require a browser will not be fully readable.
No crawling, caching, or rate limiting yet.
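The MCP_ALLOWED_HOSTS pattern rules listed above can be illustrated with a short sketch. It is not the code in app/services/host_allowlist.py: `host_matches` and `is_host_allowed` are hypothetical names, and two behaviors are assumptions made here for the example, namely that `localhost:*` also matches a bare `localhost` and that `*.example.com` does not match `example.com` itself.

```python
def host_matches(host: str, pattern: str) -> bool:
    """Check one Host header value against one allowlist pattern."""
    host = host.strip().lower()
    pattern = pattern.strip().lower()
    if pattern == "*":
        return True
    if pattern.endswith(":*"):
        # Port wildcard: the name with any port (assumption: or with no port).
        name = pattern[:-2]
        return host == name or host.startswith(name + ":")
    if pattern.startswith("*."):
        # Subdomain wildcard: require a non-empty label before ".example.com".
        suffix = pattern[1:]
        return host.endswith(suffix) and len(host) > len(suffix)
    return host == pattern


def is_host_allowed(host: str, patterns: list[str]) -> bool:
    """An empty pattern list means the allowlist is disabled."""
    if not patterns:
        return True
    return any(host_matches(host, pattern) for pattern in patterns)
```

Matching on the dotted suffix rather than a plain substring is what keeps a host such as `evilexample.com` from passing a `*.example.com` rule.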
Project structure
app/
  main.py                   FastAPI app, health routes, MCP mount
  config.py                 Environment settings
  auth.py                   API key and MCP host allowlist middleware
  schemas.py                Response models
  tools/
    website_reader.py       MCP tool registration (all five tools)
  services/
    fetcher.py              HTTP fetch + URL validation (SSRF checks)
    host_allowlist.py       MCP_ALLOWED_HOSTS parsing and Host-header matching
    extractor.py            HTML to readable text (BeautifulSoup, used by fetch_url)
    markdown_extractor.py   HTML to Markdown (trafilatura + BeautifulSoup fallback)
    metadata_extractor.py   Metadata (Open Graph, Twitter, JSON-LD, meta tags)
    article_extractor.py    Article extraction and summary prompt preparation
scripts/
  generate_dev_certs.sh     Create local self-signed TLS certs
  dev.sh                    Run uvicorn with HTTPS locally
tests/
  test_fetcher.py
  test_extractor.py
  test_extract_article.py
  test_markdown.py
  test_metadata.py
  test_summarize.py
Next steps
Possible follow-ups:
add caching
add rate limiting
add logging and request IDs
add an MCP client inside the existing AI Chat backend
add tools for search_web and read_url
validate DNS-resolved IPs before fetching (stronger SSRF protection)
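The last follow-up, validating DNS-resolved IPs, could be approached along these lines. This is a sketch under stated assumptions rather than the planned implementation: `is_blocked_ip` and `resolve_and_check` are hypothetical names, and blocking reserved and unspecified addresses goes beyond the private/link-local/multicast categories the README lists.

```python
import ipaddress
import socket


def is_blocked_ip(value: str) -> bool:
    """Return True for addresses that should not be fetched from a public reader."""
    ip = ipaddress.ip_address(value)
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved      # assumption: also block reserved ranges
        or ip.is_unspecified   # assumption: also block 0.0.0.0 and ::
    )


def resolve_and_check(hostname: str, port: int = 443) -> list[str]:
    """Resolve hostname and raise ValueError if any resolved address is blocked."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # info[4][0] is the address string; drop any IPv6 zone suffix such as "%eth0".
    addresses = sorted({info[4][0].split("%")[0] for info in infos})
    blocked = [address for address in addresses if is_blocked_ip(address)]
    if blocked:
        raise ValueError(f"{hostname} resolves to blocked address(es): {', '.join(blocked)}")
    return addresses
```

A check like this only helps if the subsequent fetch connects to one of the addresses that was validated; resolving again at connect time would leave room for DNS rebinding.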
License
Licensed under the MIT License. See LICENSE.