agentfetch
Provides a scrape tool for CrewAI agents, enabling web scraping and crawling within CrewAI workflows.
Supports self-hosting via Docker Compose with API (port 8080), MCP SSE (port 8081), Redis, and optional crawl workers.
Provides web search functionality via DuckDuckGo as a fallback search engine when SearXNG is not configured.
Provides AgentFetchTools for LangChain agents, enabling web scraping, crawling, search, and extraction as LangChain tools.
Provides local LLM-based structured data extraction via Ollama, configured with OLLAMA_URL and OLLAMA_MODEL, without API costs.
Provides tools for OpenAI function calling, enabling web scraping, crawling, search, and extraction via get_tools.
Provides optional Redis-backed caching and job queue for horizontal scaling of crawl operations.
Provides web search functionality via SearXNG with optional result scraping, configured via SEARXNG_URL.
agentfetch
Open-source web retrieval built for AI agents.
agentfetch is a free, local alternative to Firecrawl, Exa, and Parallel.ai. It fetches any webpage, crawls any site, and searches the web — returning clean markdown that AI agents can consume directly.
Works with LangChain, LlamaIndex, CrewAI, AutoGen, Claude MCP, OpenAI function calling, Gemini, Groq, and plain REST. No vendor lock-in, no API keys required.
Install
pip install git+https://github.com/S1D1ART/agentfetch.git
# Or with extra integrations:
pip install "agentfetch[langchain,llamaindex,crewai] @ git+https://github.com/S1D1ART/agentfetch.git"
No PyPI account, API token, or sign-up is needed. The package installs directly from GitHub.
What makes it different
Smart Mode Router: detects JavaScript-heavy SPAs (Next.js, Nuxt, React) and falls back to a Playwright headless browser automatically. Static pages use direct HTTP.
5-layer extraction pipeline: trafilatura → newspaper3k → readability-lxml → BeautifulSoup → plain text. Best-effort extraction from any HTML.
Never raises exceptions: always returns a structured FetchResult with confidence scores, error fields, and injection detection. Agents can trust the output.
Information saturation crawling: no arbitrary depth limits. CrawlStopper detects vocabulary saturation and content redundancy, and stops when enough data has been gathered.
Prompt injection firewall: 13 patterns are detected and redacted to [REDACTED BY AGENTFETCH].
Cloudflare bypass: optional curl_cffi integration with Chrome 124 TLS fingerprint impersonation.
Robots.txt compliance: optional async parser with caching, crawl-delay, and sitemap discovery.
Proxy rotation: round-robin or random proxy pools with automatic failure tracking.
Local LLM extraction: optional Ollama integration for structured data extraction without API costs.
Redis-backed job queue: horizontal scaling for crawl operations with background workers.
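The fallback extraction and never-raise behavior listed above can be pictured with a minimal sketch. This is not agentfetch's implementation: `Result`, `extract_with_fallback`, `PIPELINE`, and the two toy extractors are hypothetical names, and the confidence values are illustrative. It only shows the general pattern of trying extractors in order and reporting failure in a field instead of raising.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Result:
    """Hypothetical stand-in for a structured fetch result."""
    content: str = ""
    extractor: Optional[str] = None
    confidence: float = 0.0
    error: Optional[str] = None


def strip_tags(html: str) -> str:
    """Last-resort extractor: drop tags and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip()


def article_only(html: str) -> str:
    """Toy 'smart' extractor: requires an <article> element, else raises."""
    match = re.search(r"<article>(.*?)</article>", html, re.S)
    if match is None:
        raise ValueError("no <article> element found")
    return strip_tags(match.group(1))


def extract_with_fallback(
    html: str,
    extractors: Sequence[tuple[str, Callable[[str], str], float]],
) -> Result:
    """Try each (name, function, confidence) in order; never raise."""
    errors = []
    for name, func, confidence in extractors:
        try:
            text = func(html)
        except Exception as exc:  # record the failure and try the next layer
            errors.append(f"{name}: {exc}")
            continue
        if text:
            return Result(content=text, extractor=name, confidence=confidence)
        errors.append(f"{name}: empty output")
    return Result(error="; ".join(errors) or "no extractors configured")


# Ordered from most to least specific (assumed ordering for illustration).
PIPELINE = [("article", article_only, 0.9), ("plain", strip_tags, 0.4)]
```

A caller checks `result.error` and `result.confidence` rather than wrapping the call in `try`/`except`, which is the property the list above describes for FetchResult.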
Tools
Tool | Description |
| Fetch any URL; auto-detects browser need. Supports ScrapeConfig (wait_for selectors, tag filtering, citation markers, proxies). |
| Recursive crawl with information saturation stopping, robots.txt compliance, deduplication. |
| Web search via SearXNG or DuckDuckGo with optional result scraping. |
| Structured data extraction by JSON schema via Ollama, Anthropic Claude, or CSS fallback. |
| Poll crawl job progress (in-memory or Redis). |
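One way to picture the information saturation stopping mentioned for the crawl tool: keep the vocabulary seen so far and stop once several consecutive pages add almost no new words. The sketch below is an assumption about the general idea, not the actual CrawlStopper. `SaturationStopper`, `min_new_ratio`, and `patience` are hypothetical names and thresholds.

```python
import re


class SaturationStopper:
    """Hypothetical sketch: stop crawling once new pages add little new vocabulary."""

    def __init__(self, min_new_ratio: float = 0.1, patience: int = 2) -> None:
        self.min_new_ratio = min_new_ratio  # below this, a page counts as redundant
        self.patience = patience            # consecutive redundant pages before stopping
        self.vocabulary: set[str] = set()
        self.redundant_streak = 0

    def observe(self, text: str) -> float:
        """Record one page; return the fraction of its distinct words that were new."""
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        ratio = len(words - self.vocabulary) / len(words) if words else 0.0
        self.vocabulary |= words
        if ratio < self.min_new_ratio:
            self.redundant_streak += 1
        else:
            self.redundant_streak = 0
        return ratio

    def should_stop(self) -> bool:
        return self.redundant_streak >= self.patience


# Simulated crawl: the third and fourth pages repeat earlier content.
pages = [
    "Redis is an in-memory data store",
    "Redis supports caching and job queues",
    "Redis is an in-memory data store",
    "redis supports caching",
    "A page that is never fetched",
]
stopper = SaturationStopper()
fetched = 0
for page in pages:
    stopper.observe(page)
    fetched += 1
    if stopper.should_stop():
        break
```

With these thresholds the loop stops after the fourth page, because two pages in a row contributed no new words. A real crawler would likely combine this with content-level deduplication.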
Quickstart
LangChain
from agentfetch.integrations.langchain.tools import AgentFetchTools
tools = AgentFetchTools
# Use with any LangChain agent
MCP (Claude Desktop, Cursor, etc.)
pip install git+https://github.com/S1D1ART/agentfetch.git
agentfetch-mcp
# configure in Claude Desktop or any MCP host
REST API
pip install git+https://github.com/S1D1ART/agentfetch.git
agentfetch serve
curl -X POST http://localhost:8080/agent_scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
Python library
import asyncio
from agentfetch import smart_fetch
from agentfetch.core.schema import ScrapeConfig
result = asyncio.run(smart_fetch(
"https://en.wikipedia.org/wiki/Obsession_(2025_film)",
config=ScrapeConfig(
wait_for=".main-content",
exclude_tags=["nav", "footer"],
citation_links=True,
)
))
print(result.content) # clean markdown
print(result.citations) # [1], [2] URLs
All integrations
Framework | Install | Import |
LangChain | | |
LlamaIndex | | |
CrewAI | | |
AutoGen | | |
OpenAI / Gemini / Groq | | |
Claude MCP | | |
Ollama | | |
REST | | |
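For the OpenAI / Gemini / Groq row, it may help to see what a function-calling tool definition and dispatcher look like in general. This README does not show what agentfetch's `get_tools` returns, so the tool name `scrape_url`, its schema, and the `fake_scrape` handler below are hypothetical. Only the outer `{"type": "function", "function": {...}}` shape follows the OpenAI-style tool format.

```python
import json
from typing import Any, Callable

# Tool definition in the JSON shape used by OpenAI-style function calling.
# The name and parameters are illustrative, not agentfetch's actual schema.
SCRAPE_TOOL: dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "scrape_url",
        "description": "Fetch a URL and return its content as markdown.",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "Page to fetch."},
            },
            "required": ["url"],
        },
    },
}


def fake_scrape(url: str) -> str:
    """Placeholder handler; a real one would call the retrieval library."""
    return f"# Content of {url}"


HANDLERS: dict[str, Callable[..., str]] = {"scrape_url": fake_scrape}


def dispatch(name: str, arguments_json: str) -> str:
    """Route a model-issued tool call to a local handler; return a JSON string."""
    handler = HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments_json)
        return json.dumps({"result": handler(**kwargs)})
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or arguments that do not match the handler signature.
        return json.dumps({"error": str(exc)})
```

The model receives `SCRAPE_TOOL` in the request, replies with a tool name and a JSON argument string, and `dispatch` turns that into a result message. Returning errors as JSON keeps the agent loop running when the model sends bad arguments.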
Configuration
Environment variables
Variable | Default | Description |
| — | Redis connection for caching + job queue |
SEARXNG_URL | — | SearXNG instance for search (falls back to DuckDuckGo) |
| — | For Claude-powered |
OLLAMA_URL | — | Ollama endpoint for local LLM extraction |
OLLAMA_MODEL | | Ollama model name |
| | Cache TTL in seconds |
| | HTTP fetch timeout (seconds) |
| | Playwright browser timeout (seconds) |
| | Max retries for failed requests |
| | Delay between requests to same domain |
| | Enable robots.txt compliance |
| — | Comma-separated proxy URLs or JSON array |
| | |
| — | Path to cookies file (Netscape or JSON) |
| | API server port |
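The proxy setting above accepts either comma-separated URLs or a JSON array. A minimal sketch of how such a value could be parsed and rotated follows. The function names `parse_proxy_list` and `round_robin` are hypothetical, and this is not agentfetch's parser. It also leaves out the failure tracking mentioned earlier.

```python
import itertools
import json
from typing import Iterator


def parse_proxy_list(raw: str) -> list[str]:
    """Parse a proxy setting given as a JSON array or a comma-separated string."""
    raw = raw.strip()
    if not raw:
        return []
    if raw.startswith("["):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            parsed = None  # not valid JSON; treat as comma-separated below
        if isinstance(parsed, list):
            return [str(item).strip() for item in parsed if str(item).strip()]
    return [part.strip() for part in raw.split(",") if part.strip()]


def round_robin(proxies: list[str]) -> Iterator[str]:
    """Cycle through proxies in order, forever; yields nothing for an empty list."""
    return itertools.cycle(proxies)
```

For example, `parse_proxy_list("http://a:8080, http://b:8080")` and `parse_proxy_list('["http://a:8080", "http://b:8080"]')` produce the same two-element list, and `next()` on the `round_robin` iterator alternates between the two entries.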
Self-host
docker compose up -d
# Starts API (port 8080), MCP SSE (port 8081), Redis
# Optional crawl worker:
docker compose --profile worker up -d
Architecture
                  ┌─────────────┐
                  │    Smart    │
                  │     URL     │
                  │   Router    │
                  └──────┬──────┘
                         │
      ┌──────────────────┼────────────────────┐
      │                  │                    │
      ▼                  ▼                    ▼
┌────────────┐    ┌──────────────┐    ┌────────────────┐
│   Static   │    │  Cloudflare  │    │   Playwright   │
│    HTTP    │    │    bypass    │    │    Headless    │
│  (httpx)   │    │ (curl_cffi)  │    │    Browser     │
└─────┬──────┘    └──────┬───────┘    └───────┬────────┘
      │                  │                    │
      └──────────────────┼────────────────────┘
                         │
                         ▼
                ┌─────────────────┐
                │   Extraction    │
                │    Pipeline     │
                │  trafilatura →  │
                │  newspaper3k →  │
                │  readability →  │
                │  BS4 → plain    │
                └────────┬────────┘
                         │
                         ▼
                ┌─────────────────┐
                │    Sanitizer    │
                │  (13 injection  │
                │    patterns)    │
                └────────┬────────┘
                         │
                         ▼
                ┌─────────────────┐
                │  Post-process   │
                │  • Citations    │
                │  • Dedup check  │
                │  • Max length   │
                │  • Markdown     │
                └────────┬────────┘
                         │
                         ▼
                ┌─────────────────┐
                │   FetchResult   │
                │    Pydantic     │
                │    response     │
                └─────────────────┘
Tests
pip install -e ".[all]"
pytest tests/ -v
# 74 tests passing
License
MIT — free for any use, including commercial.