WET - Web Extended Toolkit MCP Server
Enables searching academic papers on arXiv.
Provides web search via the Brave search engine.
Bypasses Cloudflare anti-bot protection to scrape content from protected sites.
Provides web search via DuckDuckGo.
Provides web search via Google.
Syncs indexed documentation across machines using Google Drive.
Enables searching academic papers on Google Scholar.
Bypasses Medium's restrictions to extract content.
Allows using OpenAI models for embeddings, reranking, and LLM-powered synthesis.
Enables searching academic papers on PubMed.
Provides metasearch web search via an embedded SearXNG instance.
Enables searching academic papers on Semantic Scholar.
mcp-name: io.github.n24q02m/wet-mcp
Web search, content extraction, and library docs for AI agents -- 5-strategy scraping, runs without API keys.
Phase | Status | Scope |
Phase 1 | Shipped | web-core ScrapingAgent migration, smart chunks output, search polish, media slim |
Phase 2 | Shipped | Context7-level docs search: library index (Tier 1 + Tier 2), version-aware queries with token cap, project lock (Cabinets) |
Phase 3 | Shipped |
|
Current release: v3.x.
media(action="analyze") was removed in the v2.0.0 BREAKING release. Use imagine-mcp's understand action for vision/audio/video analysis. See docs/migration.md for the upgrade recipe.
Project | Tagline | Tag |
Knowledge graph for token-efficient code reviews -- semantic search and call-... | MCP | |
IMAP/SMTP email for AI agents -- read, send, organize folders, and manage att... | MCP | |
Composite MCP server for Godot Engine -- 17 composite tools for AI-assisted g... | MCP | |
Markdown-first Notion for AI agents -- pages, databases, blocks, and comments... | MCP | |
Telegram for AI agents -- messages, chats, media, and contacts across both bo... | MCP | |
Claude Code plugin marketplace for the n24q02m MCP servers -- install web sea... | Marketplace | |
Image and video understanding + generation for AI agents -- across Gemini, Op... | MCP | |
Chrome Extension for bulk operations on Jules tasks via batchexecute API -- a... | Tooling | |
Shared foundation for building MCP servers -- Streamable HTTP transport, OAut... | MCP | |
Persistent AI memory with hybrid search and embedded sync. Open, free, unlimi... | MCP | |
Lightweight Qwen3 text embedding and reranking via ONNX Runtime and GGUF | Library | |
Secrets without the server. | CLI | |
TACET: a self-distilling neuro-symbolic cascade that amortises LLM cost in kn... | Tooling | |
Shared web infrastructure package for search, scraping, HTTP security, and st... | Library | |
Open-source MCP server for AI agents: web search, content extraction, and lib... | MCP |
Features
Web Search -- Embedded SearXNG metasearch (Google, Bing, DuckDuckGo, Brave) with query expansion, TTL cache (1 h general / 5 min time-sensitive), standardized citation format, and 200-token snippet cap. Optional cloud search backends (Tavily, Brave, Exa) as a fallback chain via SEARCH_BACKENDS
Academic Research -- Search Google Scholar, Semantic Scholar, arXiv, PubMed, CrossRef, BASE
Library Docs -- Auto-discover and index documentation with FTS5 hybrid search, HyDE-enhanced retrieval, and version-specific docs
Content Extract -- 5-strategy escalation chain via n24q02m-web-core ScrapingAgent (basic_http -> tls_spoof -> headless Crawl4AI), markitdown bridge for low-tier HTML/MD fallback, smart chunks structured output (clean text + markdown + JSON-LD + code blocks + metadata), batch processing (up to 50 URLs), deep crawling, site mapping
Local File Conversion -- Convert PDF, DOCX, XLSX, CSV, HTML, EPUB, PPTX to Markdown
Media -- List + download images / videos / audio files. analyze was removed in v2.0.0 -- use imagine-mcp.understand for vision/audio inference
Anti-bot -- Stealth strategies bypass Cloudflare, Medium, LinkedIn, Twitter
Zero Config -- Built-in local Qwen3 embedding + reranking, no API keys needed. Optional cloud providers (Jina AI, Gemini, OpenAI, Cohere, xAI, Anthropic) selected per task via the EMBEDDING_MODELS / RERANK_MODELS / LLM_MODELS model chains for higher-quality vectors and LLM features
Sync -- Cross-machine sync of indexed docs via Google Drive (OAuth Device Code, no browser redirect)
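The two-tier TTL cache mentioned under Web Search (1 h for general queries, 5 min for time-sensitive ones) can be pictured with a minimal sketch. This is an illustration only, not wet-mcp's actual cache: the class name `TTLCache`, its methods, and the `time_sensitive` flag are hypothetical.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after a per-entry TTL.

    Hypothetical illustration; not the cache wet-mcp ships.
    """

    GENERAL_TTL = 3600        # 1 h for general queries
    TIME_SENSITIVE_TTL = 300  # 5 min for time-sensitive queries

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be tested
        self._entries = {}    # query -> (expires_at, value)

    def put(self, query, value, time_sensitive=False):
        ttl = self.TIME_SENSITIVE_TTL if time_sensitive else self.GENERAL_TTL
        self._entries[query] = (self._clock() + ttl, value)

    def get(self, query):
        entry = self._entries.get(query)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller re-runs the search.
            del self._entries[query]
            return None
        return value
```

How a query gets classified as time-sensitive is not described here, so the sketch leaves that decision to the caller.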
Quick install
# Method 1 (default): plugin install via Claude Code
/plugin marketplace add n24q02m/claude-plugins
/plugin install wet-mcp@n24q02m-plugins
# Method 2 (CLI): direct uvx invocation
claude mcp add wet -- uvx wet-mcp
# Method 3 (recommended for HTTP / multi-device / OAuth)
docker run -d --name wet-mcp-http -p 8084:8080 \
-v wet-data:/data -e MCP_TRANSPORT=http \
-e PUBLIC_URL=https://wet.example.com \
n24q02m/wet-mcp:latest
Full setup matrices live at the canonical docs site mcp.n24q02m.com/servers/wet-mcp/setup/, and the paste-to-agent snippets live at claude-plugins/plugins/wet-mcp/setup-with-agent.md (per Spec F single source of truth).
Configuration
wet runs zero-config out of the box: web search uses an embedded local SearXNG, and embedding/reranking fall back to the bundled local Qwen3 ONNX models when no cloud keys are set. For higher-quality results, point each task at a cloud model chain. All settings are plain environment variables (no app prefix) -- in the HTTP self-host mode they are entered through the browser setup form instead.
Model chains (CSV provider/model,provider/model; order = fallback). Leave a
chain empty to use the local ONNX models (embedding/rerank) or to disable LLM
features (LLM):
Env var | Task | Empty default |
EMBEDDING_MODELS | Embeddings for docs search | Local Qwen3-Embedding ONNX |
RERANK_MODELS | Result reranking | Local Qwen3-Reranker ONNX |
LLM_MODELS | LLM features | LLM features disabled |
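To make the chain format concrete, here is a minimal sketch of how a `provider/model,provider/model` value could be parsed into an ordered fallback list. The function name `parse_model_chain` is hypothetical and this is not wet-mcp's own parser. An empty result stands for "use the local ONNX model" (embedding/rerank) or "feature disabled" (LLM).

```python
def parse_model_chain(value):
    """Split a CSV chain like 'a/x,b/y' into ordered (provider, model) pairs.

    Order is preserved because it defines the fallback order.
    Hypothetical helper for illustration only.
    """
    chain = []
    for item in (value or "").split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty values and trailing commas
        provider, sep, model = item.partition("/")
        if not sep or not provider or not model:
            raise ValueError(f"expected provider/model, got {item!r}")
        chain.append((provider, model))
    return chain


# First entry is tried first; later entries are fallbacks.
chain = parse_model_chain(
    "jina_ai/jina-embeddings-v5-text-small, gemini/gemini-3-flash-preview"
)
```

Splitting on the first `/` only keeps model names that themselves contain slashes intact.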
Provider keys -- the provider is inferred from each model's prefix; supply the
matching key (litellm <PROVIDER>_API_KEY convention):
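Model prefix | Key env var | Get it at |
jina_ai/ | JINA_AI_API_KEY | jina.ai/api-key |
gemini/ | GEMINI_API_KEY | aistudio.google.com/apikey |
openai/ | OPENAI_API_KEY | platform.openai.com |
cohere/ | COHERE_API_KEY | dashboard.cohere.com |
xai/ | XAI_API_KEY | console.x.ai |
anthropic/ | ANTHROPIC_API_KEY | console.anthropic.com |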
Any other litellm provider works via env passthrough -- see litellm provider docs for its key name.
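As a rough pre-flight check, the `<PROVIDER>_API_KEY` convention can be turned into a small helper that lists which keys a chain still needs. This is a sketch under that stated convention only: `key_env_var` and `missing_keys` are hypothetical names, and some litellm providers use differently named variables, so treat the output as a hint and confirm against the provider docs.

```python
import os


def key_env_var(model):
    """Derive the key variable from the model prefix, e.g. jina_ai/... -> JINA_AI_API_KEY.

    Assumes the plain <PROVIDER>_API_KEY convention; not every provider follows it.
    """
    prefix = model.split("/", 1)[0]
    return f"{prefix.upper()}_API_KEY"


def missing_keys(chain_csv, env=None):
    """Return the key variables a chain needs that are unset or empty in env."""
    env = os.environ if env is None else env
    missing = []
    for item in chain_csv.split(","):
        item = item.strip()
        if not item:
            continue
        name = key_env_var(item)
        if name not in missing and not env.get(name):
            missing.append(name)
    return missing
```

For example, `missing_keys(os.environ.get("EMBEDDING_MODELS", ""))` returns an empty list when the chain is empty, which matches the zero-config local default.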
Search backends -- SEARCH_BACKENDS (CSV, runtime fallback chain) over
searxng (default, local) plus optional cloud providers tavily / brave /
exa. Point at an external SearXNG with SEARXNG_URL. Cloud providers need
TAVILY_API_KEY / BRAVE_API_KEY / EXA_API_KEY.
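A runtime fallback chain of this kind can be sketched as "try each backend in order, return the first usable answer". The code below is illustrative only: `parse_backends`, `search_with_fallback`, and the callable-per-backend mapping are hypothetical, and treating an empty result list as a reason to fall through is an assumption, not documented wet-mcp behavior.

```python
def parse_backends(csv_value):
    """Turn a SEARCH_BACKENDS-style CSV into an ordered list of backend names."""
    return [name.strip().lower() for name in csv_value.split(",") if name.strip()]


def search_with_fallback(query, backend_names, backends):
    """Try each named backend in order; return (name, results) from the first success.

    `backends` maps a name to a callable taking the query (hypothetical interface).
    """
    errors = {}
    for name in backend_names:
        search = backends.get(name)
        if search is None:
            errors[name] = "not configured"
            continue
        try:
            results = search(query)
        except Exception as exc:  # any backend failure moves on to the next one
            errors[name] = str(exc)
            continue
        if results:
            return name, results
        errors[name] = "no results"  # assumption: empty results also fall through
    raise RuntimeError(f"all search backends failed: {errors}")
```

Collecting the per-backend errors keeps the final failure message useful when every backend in the chain is down or unconfigured.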
Docs sync -- SYNC_ENABLED (default true), GOOGLE_DRIVE_CLIENT_ID
(required for sync), SYNC_FOLDER (default wet-mcp), SYNC_INTERVAL (default
300s). Sync uses Google Drive over the OAuth Device Code flow (no browser
redirect).
HTTP self-host -- MCP_TRANSPORT=http, PUBLIC_URL=<your-domain>. The setup
form is gated by MCP_RELAY_PASSWORD; multi-user deployments also require
CREDENTIAL_SECRET (per-user vault key) and MCP_DCR_SERVER_SECRET.
Example stdio config (cloud chains):
{
  "mcpServers": {
    "wet": {
      "command": "uvx",
      "args": ["wet-mcp"],
      "env": {
        "EMBEDDING_MODELS": "jina_ai/jina-embeddings-v5-text-small",
        "RERANK_MODELS": "jina_ai/jina-reranker-v3",
        "LLM_MODELS": "gemini/gemini-3-flash-preview",
        "JINA_AI_API_KEY": "jina_xxx",
        "GEMINI_API_KEY": "AIza_xxx"
      }
    }
  }
}
Status
Stable architecture with two transports: stdio (default, local) and
HTTP (self-host, OAuth-gated). No daemon-bridge layer and no auto-spawn
from stdio. The media.analyze action was removed in the v2.0.0 BREAKING
release -- see docs/migration.md for the upgrade
recipe. Current release line: v3.x.
Documentation
Full docs at mcp.n24q02m.com/servers/wet-mcp/setup/:
Setup -- install methods for Claude Code, Codex, Gemini CLI, Cursor, Windsurf, mcp.json
Modes overview -- stdio / local-relay / remote-relay / remote-oauth
Multi-user setup -- per-JWT-sub credential model
In-repo references (Spec F single source of truth: setup docs live in claude-plugins/plugins/wet-mcp/):
docs/ARCHITECTURE.md -- web-core ScrapingAgent integration, strategy chain, storage layout, LLM provider dispatch
docs/BENCHMARKS.md -- v1.x baseline coverage / latency placeholders + tier-1 fixture metrics
Install with AI agent -- paste this to your AI coding agent:
Install MCP server wet-mcp following the steps at https://raw.githubusercontent.com/n24q02m/claude-plugins/main/plugins/wet-mcp/setup-with-agent.md
Tools
6 MCP tools (3 domain + config + help + config__open_relay). The legacy
setup tool was merged into the config tool's action dispatch.
Tool | Description |
| Web (SearXNG metasearch), news, images, academic research (Scholar / arXiv / PubMed / CrossRef / Semantic Scholar / BASE), library docs (HyDE + FTS5), find similar pages. Includes |
| URL -> smart chunks dict ( |
|
|
|
|
| Per-tool documentation: |
| Re-trigger the zero-config relay setup flow (prints a fresh relay URL for the browser form). Registered via |
Media boundary: For vision / audio understanding (image captioning, OCR, audio transcription, video summarization), use imagine-mcp.
media.analyze was removed in wet v2.0.0 -- use imagine-mcp.understand instead.
Comparison
How wet-mcp stacks up against direct competitors in each pillar:
Capability | wet-mcp | Brave Search | Tavily | Firecrawl | Context7 |
Web search | Yes (SearXNG aggregation) | Yes | Yes | No | No |
Extract URL | Yes (5-strategy chain) | No | Yes (basic) | Yes | No |
Media list / download | Yes | No | No | No | No |
Library docs search | Yes (Tier 1 curated + Tier 2 on-demand, version-aware, Cabinets) | No | No | No | Yes |
Academic research | Yes (6 providers) | No | No | No | No |
Self-hostable | Yes | No | No | No | Yes |
Free tier | Yes (open source) | Limited | Limited | Limited | Yes |
Security
SSRF prevention -- URL validation on crawl targets
Graceful fallbacks -- Cloud → Local embedding, multi-tier crawling
Error sanitization -- No credentials in error messages
File conversion sandboxing -- Optional CONVERT_ALLOWED_DIRS restriction
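To illustrate what URL validation against SSRF typically involves, here is a minimal sketch that rejects non-HTTP schemes and any host resolving to a loopback, private, or link-local address. It is not wet-mcp's validator: `validate_crawl_url`, `is_public_ip`, and the injectable `resolve` parameter are hypothetical names chosen for the example.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_ip(text):
    """True if the address is not loopback, private, link-local, or otherwise special."""
    ip = ipaddress.ip_address(text.split("%", 1)[0])  # drop any IPv6 scope id
    return not (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_multicast or ip.is_reserved or ip.is_unspecified
    )


def validate_crawl_url(url, resolve=socket.getaddrinfo):
    """Return url if it is safe to fetch, else raise ValueError (illustrative check)."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("URL has no host")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    # Check every address the host resolves to, not just the first one.
    for info in resolve(parts.hostname, port, type=socket.SOCK_STREAM):
        address = info[4][0]
        if not is_public_ip(address):
            raise ValueError(f"{parts.hostname} resolves to non-public address {address}")
    return url
```

A check like this only covers the initial URL. Redirect targets need the same validation, and a resolver that answers differently at fetch time (DNS rebinding) can still slip past unless the fetch connects to the address that was validated.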
Build from Source
git clone https://github.com/n24q02m/wet-mcp.git
cd wet-mcp
uv sync
uv run wet-mcp
Deploy to Cloudflare
Run your own single-user wet instance serverless on Cloudflare (Containers + D1 + Vectorize + KV).
Prerequisites: a Cloudflare account on the Workers Paid plan and the wrangler CLI.
git clone https://github.com/n24q02m/wet-mcp && cd wet-mcp
wrangler login
Provision resources and apply the D1 schema:
wrangler d1 create wet-docs
wrangler d1 execute wet-docs --file migrations/0001_init_wet.sql --remote
wrangler vectorize create wet-docs-vectors --dimensions 768 --metric cosine
wrangler kv namespace create wet-kv
Paste the returned IDs into wrangler.jsonc.
Push the container image to your Cloudflare managed registry (CF Containers cannot pull from external registries directly), then set <YOUR_ACCOUNT_ID> in wrangler.jsonc:
docker pull ghcr.io/n24q02m/wet-mcp:beta
docker tag ghcr.io/n24q02m/wet-mcp:beta wet-mcp:beta
wrangler containers push wet-mcp:beta
# prints registry.cloudflare.com/<ACCOUNT_ID>/wet-mcp:beta
Set secrets (use SEARXNG_URL with basic-auth userinfo, e.g. https://user:pass@searxng.example.com, or TAVILY_API_KEY if you set SEARCH_BACKEND=tavily):
wrangler secret put CREDENTIAL_SECRET
wrangler secret put JINA_AI_API_KEY
wrangler secret put GOOGLE_VERTEX_EXPRESS_API_KEY
wrangler secret put XAI_API_KEY
wrangler secret put MCP_RELAY_PASSWORD
wrangler secret put MCP_DCR_SERVER_SECRET
wrangler secret put SEARXNG_URL
Run wrangler deploy and complete setup in the browser relay form at your Worker domain.
Storage maps to Cloudflare via MCP_STORAGE_BACKEND=cf-kv (credentials/tokens, encrypted),
DOCS_DB_BACKEND=cf-d1 (docs + BM25 full-text), and Vectorize (embeddings). Web search uses
a SearXNG instance (SEARCH_BACKEND=searxng, SEARXNG_URL) or Tavily (SEARCH_BACKEND=tavily);
embed/rerank are forced cloud via EMBEDDING_MODELS/RERANK_MODELS.
Trust Model
This plugin implements TC-Local (machine-bound, single trust principal). See mcp-core trust model for full classification.
Mode | Storage | Encryption | Who can read your data? |
stdio (default) |
| AES-GCM, machine-bound key | Only your OS user (file perm 0600) |
HTTP self-host | Same as stdio | Same | Only you (admin = user) |
License
MIT -- See LICENSE.