ai-first-scraper-mcp
This server lets you fetch and search the web, returning clean, ad-free Markdown content ready for AI reasoning — no raw HTML parsing required.
fetch_page: Fetch a single URL (HTML or PDF) and return its main content as clean Markdown. Supports an optional max_tokens soft cap to truncate large pages.
fetch_pages_batch: Submit up to 25 URLs at once and fetch them in parallel. Each result includes metadata (title, word count, links), or an error if that URL failed. Much faster than sequential calls.
search_web: Run a free-text web search and automatically retrieve the top-k result pages (1–10, default 5) already converted to Markdown. Each result includes the URL, title, snippet, and full Markdown content, so you can find, evaluate, and cite fresh information in one call.
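The limits in the tool descriptions above (at most 25 URLs per batch, top-k between 1 and 10 with a default of 5, an optional max_tokens cap) can be checked on the client side before a call is sent. This is a minimal sketch: only max_tokens is named in the descriptions, so the argument keys url, urls, query and k, and all helper names, are hypothetical. Check the inputSchema the server reports in tools/list for the real names.

```python
from typing import Any, Optional

# Limits taken from the tool descriptions; argument key names are assumptions.
MAX_BATCH_URLS = 25
MIN_K, MAX_K, DEFAULT_K = 1, 10, 5


def fetch_page_args(url: str, max_tokens: Optional[int] = None) -> dict[str, Any]:
    """Arguments for fetch_page (hypothetical "url" key)."""
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"not an http(s) URL: {url!r}")
    args: dict[str, Any] = {"url": url}
    if max_tokens is not None:
        if max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        args["max_tokens"] = max_tokens  # soft cap: large pages get truncated
    return args


def fetch_pages_batch_args(urls: list[str]) -> dict[str, Any]:
    """Arguments for fetch_pages_batch (hypothetical "urls" key)."""
    if not 1 <= len(urls) <= MAX_BATCH_URLS:
        raise ValueError(f"batch must hold 1-{MAX_BATCH_URLS} URLs, got {len(urls)}")
    return {"urls": list(urls)}


def search_web_args(query: str, k: int = DEFAULT_K) -> dict[str, Any]:
    """Arguments for search_web (hypothetical "query" and "k" keys)."""
    if not query.strip():
        raise ValueError("query must not be empty")
    if not MIN_K <= k <= MAX_K:
        raise ValueError(f"k must be between {MIN_K} and {MAX_K}, got {k}")
    return {"query": query, "k": k}


def chunk_urls(urls: list[str], size: int = MAX_BATCH_URLS) -> list[list[str]]:
    """Split a long URL list into batches that respect the 25-URL limit."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]
```

A list longer than 25 URLs can be split with chunk_urls and sent as several fetch_pages_batch calls instead of being rejected.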
Plug Claude Desktop, Cursor, or Cline straight into an ad-free web scraper + search engine. Three tools, one line of config.
What it does
Adds three tools to any MCP-compatible agent:
| Tool | What it does |
|---|---|
| fetch_page | Fetch one URL → return clean Markdown (HTML or PDF). |
| fetch_pages_batch | Fetch up to 25 URLs in parallel → return Markdown for each. |
| search_web | Run a web search and return the top-k result pages already converted to Markdown. |
No more "the model called curl and then tried to parse 80kB of ad HTML." Your agent receives clean Markdown ready to reason about.
Backed by the ai-first-scraper and ai-first-search APIs.
Install
Fastest — uvx (no install, runs from PyPI on demand)
Add this to claude_desktop_config.json, cline_mcp_settings.json, or ~/.cursor/mcp.json:
{
  "mcpServers": {
    "ai-first-scraper": {
      "command": "uvx",
      "args": ["ai-first-scraper-mcp"]
    }
  }
}
Restart your client (Claude Desktop / Cursor / Cline). The three tools above will appear automatically.
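If the config file already lists other servers, the entry has to be merged in rather than pasted over the whole file. Here is a minimal sketch of that merge. It assumes the file is plain JSON with no comments, and add_mcp_server is a hypothetical helper, not something this package ships.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, name: str = "ai-first-scraper",
                   command: str = "uvx",
                   args: tuple[str, ...] = ("ai-first-scraper-mcp",)) -> dict:
    """Add or replace one entry under "mcpServers", keeping every other key."""
    text = config_path.read_text(encoding="utf-8") if config_path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    entry: dict[str, object] = {"command": command}
    if args:  # the pip-installed variant needs no "args" key at all
        entry["args"] = list(args)
    config.setdefault("mcpServers", {})[name] = entry
    config_path.parent.mkdir(parents=True, exist_ok=True)
    # Write a sibling temp file, then rename it over the original,
    # so an interrupted write never leaves half-written JSON behind.
    tmp = config_path.with_name(config_path.name + ".tmp")
    tmp.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    tmp.replace(config_path)
    return config
```

add_mcp_server(Path.home() / ".cursor" / "mcp.json") writes the uvx entry. For the pip install, call it with command="ai-first-scraper-mcp" and args=(). The client still needs a restart afterwards.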
Alternative — pip install
pip install ai-first-scraper-mcp
Then point your client at the installed command:
{
  "mcpServers": {
    "ai-first-scraper": {
      "command": "ai-first-scraper-mcp"
    }
  }
}
Where the config file lives
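| Client | Config path |
|---|---|
| Claude Desktop (macOS) | ~/Library/Application Support/Claude/claude_desktop_config.json |
| Claude Desktop (Windows) | %APPDATA%\Claude\claude_desktop_config.json |
| Cursor | ~/.cursor/mcp.json |
| Cline (VS Code) | cline_mcp_settings.json (in VS Code's extension storage for Cline) |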
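The Claude Desktop and Cursor locations can also be resolved in a script. This is a sketch with hypothetical client keys ("claude-desktop", "cursor"). Cline is left out because its file lives inside VS Code's extension storage, so it is safer to open it from within Cline than to hard-code a path.

```python
import os
import sys
from pathlib import Path
from typing import Optional


def mcp_config_path(client: str, platform: Optional[str] = None) -> Path:
    """Return the default MCP config file for a client (hypothetical keys)."""
    platform = platform or sys.platform
    home = Path.home()
    if client == "cursor":
        return home / ".cursor" / "mcp.json"
    if client == "claude-desktop":
        if platform == "darwin":
            return (home / "Library" / "Application Support" / "Claude"
                    / "claude_desktop_config.json")
        if platform == "win32":
            appdata = os.environ.get("APPDATA")
            if not appdata:
                raise RuntimeError("APPDATA is not set")
            return Path(appdata) / "Claude" / "claude_desktop_config.json"
        raise ValueError(f"no known Claude Desktop config path on {platform!r}")
    raise ValueError(f"unknown client {client!r}; check that client's docs")
```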
Point at your own backend (optional)
By default this server calls the public ai-first-scraper.onrender.com and
ai-first-search.onrender.com instances. If you want to self-host, set env
vars in your MCP config:
{
  "mcpServers": {
    "ai-first-scraper": {
      "command": "uvx",
      "args": ["ai-first-scraper-mcp"],
      "env": {
        "SCRAPER_URL": "https://your-scraper.example.com",
        "SEARCH_URL": "https://your-search.example.com",
        "AFS_TIMEOUT": "60"
      }
    }
  }
}
Verify it works
Open your MCP client and ask the agent:
"Use the search_web tool to find the top 3 recent articles about MCP and summarize them in 5 bullets each."
You should see the agent call search_web, get back Markdown for each result,
and produce the summary without ever touching raw HTML.
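The same check can be scripted against the raw result of a tools/call request. In the MCP specification, a tool result carries a content array of typed parts (text parts have type "text") and an optional isError flag. How this server splits search_web results across parts is not documented here, so the sketch below just joins every text part. The HTML check is a rough heuristic of my own, and both helper names are hypothetical.

```python
from typing import Any


def tool_result_markdown(result: dict[str, Any]) -> str:
    """Join the text parts of an MCP tools/call result into one string."""
    texts = [p["text"] for p in result.get("content", []) if p.get("type") == "text"]
    if result.get("isError"):
        raise RuntimeError("tool reported an error: " + " ".join(texts))
    if not texts:
        raise ValueError("result contained no text content")
    return "\n\n".join(texts)


def looks_like_raw_html(markdown: str) -> bool:
    """Rough heuristic: does the returned text start like an HTML page?"""
    head = markdown.lstrip()[:200].lower()
    return head.startswith(("<!doctype html", "<html")) or "<body" in head
```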
Companion projects
ai-first-scraper — the per-URL Markdown cleaner this MCP server fans out to.
ai-first-search — search → scrape → markdown pipeline.
mcp-rec — record & replay any MCP server's traffic for tests and bug reports.
llm-cache-proxy — local cache for OpenAI/Anthropic API calls.
promptlocker — lockfile for prompts.
context-diff — see what blew up your Claude Code context window.
agentwatch — overlay for browser AI agents.
Develop locally
git clone https://github.com/yubinkim444/ai-first-scraper-mcp.git
cd ai-first-scraper-mcp
uv sync # or: pip install -e .
ai-first-scraper-mcp # speaks MCP over stdio
To test against a local client, point its MCP config at the same command.
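Because the server speaks MCP over stdio, you can smoke-test it without any GUI client. Below is a hand-rolled sketch, not part of this project. It sends the initialize request, the notifications/initialized notification and tools/list as newline-delimited JSON-RPC, then returns the tool names. It assumes the server accepts protocol version 2024-11-05 (one published MCP revision). It has no read timeout and ignores tools/list pagination.

```python
import json
import subprocess
from typing import Any


def _send(proc: subprocess.Popen, message: dict[str, Any]) -> None:
    # The stdio transport carries one JSON-RPC message per line.
    proc.stdin.write(json.dumps(message) + "\n")
    proc.stdin.flush()


def _receive(proc: subprocess.Popen, request_id: int) -> dict[str, Any]:
    # Skip notifications and server-initiated requests until our response arrives.
    while True:
        line = proc.stdout.readline()
        if not line:
            raise RuntimeError("server closed stdout before responding")
        message = json.loads(line)
        if message.get("id") == request_id and "method" not in message:
            if "error" in message:
                raise RuntimeError(f"server error: {message['error']}")
            return message["result"]


def list_tool_names(command: list[str]) -> list[str]:
    """Start an MCP server over stdio, do the handshake, return its tool names."""
    proc = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL, text=True)
    try:
        _send(proc, {"jsonrpc": "2.0", "id": 1, "method": "initialize",
                     "params": {"protocolVersion": "2024-11-05", "capabilities": {},
                                "clientInfo": {"name": "smoke-test", "version": "0.1"}}})
        _receive(proc, 1)
        _send(proc, {"jsonrpc": "2.0", "method": "notifications/initialized"})
        _send(proc, {"jsonrpc": "2.0", "id": 2, "method": "tools/list"})
        return [tool["name"] for tool in _receive(proc, 2)["tools"]]
    finally:
        proc.stdin.close()  # EOF on stdin lets a stdio server shut down
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
        proc.stdout.close()
```

If the install works, list_tool_names(["ai-first-scraper-mcp"]) should include fetch_page, fetch_pages_batch and search_web. In a uv-managed checkout, ["uv", "run", "ai-first-scraper-mcp"] is a likely alternative command.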
License
MIT © yubinkim444