mcp-research
The mcp-research server provides read-only tools for AI assistants to gather and process information from the web, academic sources, social media, and various file types, optimized for token efficiency and safety.
- `web_search`: Search the web using a 3-tier cascade (Brave API → DuckDuckGo → HTML scraper), with options to summarize results and auto-fetch the top result's full content.
- `fetch_url`: Fetch any URL and convert it to clean markdown, with SSRF protection, 24-hour caching, CAPTCHA detection, and configurable content length limits.
- `research`: Compound research pipeline that rewrites queries, searches, fetches pages in parallel, and summarizes each source, at three depth levels: `quick` (2 pages), `standard` (5 pages), or `deep` (10 pages). The `standard` and `deep` levels also synthesize a final cited answer.
- `youtube_essence`: Extract transcripts, summaries, key points, chapters, and quotes from YouTube videos.
- `deep_ingest`: Extract and optionally summarize text from files, including PDFs, Office documents (DOCX, XLSX, PPTX), audio, video, and images.
- `academic_lookup`: Resolve academic papers by DOI, ArXiv ID, or PubMed ID, fetching metadata and full text (via institutional credentials where needed).
- `twitter_extract`: Extract tweets and full threads from X/Twitter using multiple strategies.
- `vault_status`: Display loaded credential profiles and dependency status without exposing secrets.
- Brave Search: the Brave Search API is the primary, high-quality search tier for the `web_search` tool.
- DuckDuckGo: the DuckDuckGo library and HTML scraper serve as automatic fallback tiers when the Brave API is unavailable or rate-limited.
- Ollama: powers AI content processing (query rewriting, page summarization, and research synthesis) using configurable Ollama models.
1. Click on "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
3. In the chat, type `@` followed by the MCP server name and your instructions, e.g., "`@mcp-research` research recent advances in renewable energy storage"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a [step-by-step guide with screenshots](https://glama.ai/blog/2025-07-08-how-to-install-and-use-mcp-servers).

mcp-research
MCP server for web research, academic papers, Twitter/X, YouTube, and file ingestion. Eight tools for AI assistants — all via the MCP stdio protocol. Includes credential vault for institutional access, CAPTCHA detection, and token-efficient output.
Tools
| Tool | Description |
|---|---|
| `web_search` | 3-tier search cascade: Brave API → DuckDuckGo → HTML scraper |
| `fetch_url` | Fetch any URL → clean markdown, with SSRF protection and 24h cache |
| `research` | Compound pipeline: query rewrite → search → parallel fetch → summarize → synthesize |
| `youtube_essence` | YouTube video → transcript, summary, key points, chapters, quotes |
| `deep_ingest` | Extract text from files: PDF, DOCX, XLSX, PPTX, audio, video, images |
| `academic_lookup` | Resolve DOI / ArXiv / PubMed → metadata + full text via institutional access |
| `twitter_extract` | Extract tweets and threads from X.com/Twitter |
| `vault_status` | Show loaded credential profiles and dependency status (never exposes secrets) |
All tools are read-only — they fetch and transform content, never modify anything.
Install
```bash
pip install mcp-research
```
Or run directly with uvx (zero-install):
```bash
uvx mcp-research
```
Optional extras:
```bash
pip install 'mcp-research[twitter]'    # yt-dlp for Twitter extraction
pip install 'mcp-research[youtube]'    # yt-dlp + faster-whisper for YouTube
pip install 'mcp-research[academic]'   # PyPDF2 for academic PDFs
pip install 'mcp-research[ingest]'     # PDF, DOCX, XLSX, PPTX, audio support
pip install 'mcp-research[all]'        # everything
```
Check your setup:
```bash
mcp-research doctor
```
Usage with Claude Code
Add to your Claude Code MCP config (~/.claude/settings.json or project .mcp.json):
```json
{
  "mcpServers": {
    "research": {
      "command": "uvx",
      "args": ["mcp-research"],
      "env": {
        "BRAVE_API_KEY": "BSA...",
        "OLLAMA_URL": "http://localhost:11434"
      }
    }
  }
}
```
Usage with Claude Desktop
Add to claude_desktop_config.json:
```json
{
  "mcpServers": {
    "research": {
      "command": "uvx",
      "args": ["mcp-research"],
      "env": {
        "BRAVE_API_KEY": "BSA..."
      }
    }
  }
}
```
Configuration
All configuration is via environment variables — no config files needed (except the optional vault).
| Variable | Default | Description |
|---|---|---|
| `BRAVE_API_KEY` | (empty) | Brave Search API key. Falls back to DuckDuckGo if unset. |
| `OLLAMA_URL` | | Ollama endpoint for summarization/synthesis. Set empty to disable. |
| | | Model to use for summarization and synthesis. |
| | | URL fetch cache directory. |
| | | Cache TTL in hours. |
| | | Search log directory (NDJSON). |
| | | Default max search results. |
| | | Credential vault file path. |
| | | Auto-reload vault when file changes. |
| | | Session idle timeout in seconds. |
Tool Details
web_search
```python
web_search(query, max_results=5, summarize=False, auto_fetch_top=False)
```
Searches the web using a 3-tier cascade for maximum reliability:
1. Brave Search API: fast, high quality (requires `BRAVE_API_KEY`)
2. DuckDuckGo library: no API key needed, retries on rate limit
3. DuckDuckGo HTML scraper: last-resort fallback

Options:
- `summarize`: Use Ollama to summarize results (requires a running Ollama instance)
- `auto_fetch_top`: Also fetch and return the full content of the top result
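A minimal sketch of the fall-through logic, assuming each tier is a plain callable that either returns a list of results or raises when it cannot serve the request (missing key, rate limit). `cascade_search`, `TierUnavailable`, and the tier functions are hypothetical names for illustration, not mcp-research's API, and the DuckDuckGo library's retry-on-rate-limit behavior is not modeled.

```python
from typing import Callable, Sequence

# Hypothetical tier signature: (query, max_results) -> list of result dicts.
SearchTier = Callable[[str, int], list[dict]]


class TierUnavailable(Exception):
    """Raised by a tier that cannot serve the request (no API key, rate limited, ...)."""


def cascade_search(
    query: str,
    tiers: Sequence[tuple[str, SearchTier]],
    max_results: int = 5,
) -> tuple[str, list[dict]]:
    """Try tiers in order; return the name of the first tier that produced results."""
    errors: list[str] = []
    for name, tier in tiers:
        try:
            results = tier(query, max_results)
        except TierUnavailable as exc:
            errors.append(f"{name}: {exc}")
            continue
        if results:
            return name, results[:max_results]
        errors.append(f"{name}: no results")  # empty result set also falls through
    raise RuntimeError("all search tiers failed: " + "; ".join(errors))


def brave(q: str, n: int) -> list[dict]:
    raise TierUnavailable("BRAVE_API_KEY not set")


def duckduckgo(q: str, n: int) -> list[dict]:
    return [{"title": f"{q} #{i}", "url": f"https://example.com/{i}"} for i in range(10)]


tier, hits = cascade_search("mcp servers", [("brave", brave), ("duckduckgo", duckduckgo)])
print(tier, len(hits))  # duckduckgo 5
```

Treating an empty result list like a failure means a tier that answers but finds nothing still lets the next tier try.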
fetch_url
```python
fetch_url(url, summarize=False, max_chars=15000)
```
Fetches a URL and converts it to clean markdown:
- SSRF protection: Blocks localhost, private IPs, and non-HTTP schemes
- Smart retry: Exponential backoff on 429/5xx responses, per-hop redirect validation
- 24h cache: SHA-256 keyed, configurable TTL
- Content support: HTML → markdown, JSON → code block, binary → rejected
- Smart truncation: Breaks at heading/paragraph boundaries, not mid-text
- CAPTCHA detection: Flags Cloudflare, hCaptcha, reCAPTCHA, and Akamai walls
- Token-efficient: Default 15K chars (~4K tokens), adjustable via `max_chars`
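One way to implement boundary-aware truncation, sketched under the assumption that markdown blocks (paragraphs, headings, list groups) are separated by blank lines. `smart_truncate` and the truncation notice format are hypothetical; only the 15,000-character default comes from the signature above.

```python
def smart_truncate(markdown: str, max_chars: int = 15_000) -> str:
    """Trim markdown to about max_chars, cutting only between blank-line-separated blocks."""
    if len(markdown) <= max_chars:
        return markdown
    blocks = markdown.split("\n\n")
    kept: list[str] = []
    used = 0
    for block in blocks:
        cost = len(block) + (2 if kept else 0)  # +2 for the "\n\n" joiner
        if used + cost > max_chars:
            break
        kept.append(block)
        used += cost
    # Never end on a heading whose body was cut away.
    while kept and kept[-1].lstrip().startswith("#"):
        kept.pop()
    # A single oversized first block: fall back to a hard cut.
    text = "\n\n".join(kept) if kept else markdown[:max_chars]
    # The notice is appended after the kept text, so output can slightly exceed max_chars.
    return text + f"\n\n[... truncated {len(markdown) - len(text)} chars]"
```

Dropping a trailing heading matters for token efficiency: a heading with no body costs tokens and tells the model nothing.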
research
```python
research(query, depth="standard", context="")
```
Compound research pipeline:
1. Query rewrite: Ollama optimizes your question into search keywords
2. Web search: finds relevant pages (with zero-result retry expansion)
3. Parallel fetch: fetches the top N pages concurrently
4. Summarize: Ollama summarizes each page
5. Synthesize: Ollama produces a final cited answer

Depth levels:

| Depth | Pages | Synthesis |
|---|---|---|
| `quick` | 2 | No |
| `standard` | 5 | Yes |
| `deep` | 10 | Yes |
All steps gracefully degrade without Ollama — you still get search results and page content.
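The fetch, summarize, and synthesize stages can be sketched with `asyncio.gather`. This assumes the search step has already produced a ranked URL list and that `fetch` and `llm` are async callables supplied by the caller (`llm` standing in for Ollama). `research_pipeline`, the prompts, and the 2,000-character raw excerpt are illustrative assumptions; the page counts and synthesis flags follow the depth table.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# (pages, synthesize?) per depth level, from the depth table.
DEPTHS = {"quick": (2, False), "standard": (5, True), "deep": (10, True)}

LLM = Callable[[str], Awaitable[str]]  # hypothetical: prompt in, completion out


async def research_pipeline(
    urls: list[str],
    depth: str,
    fetch: Callable[[str], Awaitable[str]],
    llm: Optional[LLM] = None,
) -> dict:
    """Fetch, summarize, and (optionally) synthesize; degrade gracefully without an LLM."""
    pages, wants_synthesis = DEPTHS[depth]
    targets = urls[:pages]
    # Fetch all target pages concurrently; one failed page does not sink the rest.
    bodies = await asyncio.gather(*(fetch(u) for u in targets), return_exceptions=True)
    ok = [(u, b) for u, b in zip(targets, bodies) if not isinstance(b, BaseException)]

    async def digest(body: str) -> str:
        # With no LLM configured, fall back to a raw excerpt of the page.
        return await llm("Summarize:\n" + body) if llm else body[:2000]

    summaries = await asyncio.gather(*(digest(b) for _, b in ok))
    sources = [{"url": u, "text": s} for (u, _), s in zip(ok, summaries)]
    synthesis = None
    if wants_synthesis and llm and sources:
        numbered = "\n\n".join(f"[{i}] {s['text']}" for i, s in enumerate(sources, 1))
        synthesis = await llm("Answer with citations:\n" + numbered)
    return {"depth": depth, "sources": sources, "synthesis": synthesis}
```

Numbering the sources before synthesis gives the model stable `[n]` markers to cite. With `llm=None` the result still carries page content, matching the degradation described above.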
youtube_essence
```python
youtube_essence(url, mode="standard")
```
Extracts structured content from YouTube videos:
- Transcript: Auto-subtitles or Whisper transcription (local, private)
- Summary: AI summary via Ollama
- Key points: Bullet-point takeaways
- Chapters: Timestamped segments
- Quotes: Notable quotations (deep mode)

Modes: `quick` (TL;DR), `standard` (+ chapters), `deep` (+ quotes)
Requires yt-dlp. Optional: faster-whisper for audio-only videos, ffmpeg for media extraction.
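Once chapter boundaries are known, rendering them compactly takes only a small formatter. The sketch below assumes chapter dicts with `start_time` (seconds) and `title` keys, the shape yt-dlp uses for its `chapters` metadata; `format_timestamp` and `format_chapters` are hypothetical helpers, not part of mcp-research.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as M:SS, or H:MM:SS for anything over an hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def format_chapters(chapters: list[dict]) -> str:
    """One line per chapter, ordered by start time, e.g. '12:05  Results'."""
    lines = []
    for ch in sorted(chapters, key=lambda c: c["start_time"]):
        lines.append(f"{format_timestamp(ch['start_time'])}  {ch.get('title', '').strip()}")
    return "\n".join(lines)


print(format_chapters([{"start_time": 725.4, "title": "Results"},
                       {"start_time": 0, "title": "Intro"}]))
```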
deep_ingest
```python
deep_ingest(path, include_types="", max_files=200, summarize=False)
```
Extracts text from files in a directory or from a single file:
- Text files: `.txt`, `.md`, `.json`, `.csv`, source code, etc.
- PDF: Via PyPDF2 (optional dependency)
- Office: `.docx`, `.xlsx`, `.pptx` (optional dependencies)
- Audio/Video: Whisper transcription (optional)
- Images: OCR via Ollama vision model (optional)
- Type filter: `text`, `pdf`, `audio`, `video`, `image`, `office`
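A sketch of how `include_types` and `max_files` could narrow a directory walk before any extraction runs. The extension-to-type table and the comma-separated `include_types` format are assumptions for illustration (the real tool may recognize more extensions or parse the filter differently); `select_files` is a hypothetical helper.

```python
from pathlib import Path

# Hypothetical extension map covering the type names listed above.
TYPE_BY_EXT = {
    ".txt": "text", ".md": "text", ".json": "text", ".csv": "text", ".py": "text",
    ".pdf": "pdf",
    ".docx": "office", ".xlsx": "office", ".pptx": "office",
    ".mp3": "audio", ".wav": "audio", ".m4a": "audio",
    ".mp4": "video", ".mkv": "video", ".webm": "video",
    ".png": "image", ".jpg": "image", ".jpeg": "image",
}


def select_files(path: str, include_types: str = "", max_files: int = 200) -> list[tuple[Path, str]]:
    """Classify files under path; keep those in include_types (empty means all types)."""
    wanted = {t.strip() for t in include_types.split(",") if t.strip()}
    root = Path(path)
    candidates = [root] if root.is_file() else sorted(p for p in root.rglob("*") if p.is_file())
    selected: list[tuple[Path, str]] = []
    for p in candidates:
        kind = TYPE_BY_EXT.get(p.suffix.lower())
        if kind is None or (wanted and kind not in wanted):
            continue  # unknown extension or filtered-out type
        selected.append((p, kind))
        if len(selected) >= max_files:
            break
    return selected
```

Filtering by type before extraction keeps expensive steps (Whisper, OCR) from running on files the caller did not ask for.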
academic_lookup
```python
academic_lookup(identifier, fetch_fulltext=True)
```
Resolves academic papers from multiple identifier types:
- DOI: `10.xxxx/...` → Crossref metadata + publisher redirect
- ArXiv: `2301.12345` → abstract + PDF
- PubMed: PMID → E-utilities metadata → DOI chain
- URL: Publisher page detection

Full text access via credential vault:
- EZproxy rewriting (prefix and suffix modes)
- Bearer token, API key, basic auth, cookie jar
- Automatic publisher detection (IEEE, Springer, Elsevier, ACM, Wiley, Nature, JSTOR, etc.)
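Routing an identifier to the right resolver starts with recognizing its shape. This sketch uses simplified patterns: DOIs start with `10.` and a 4 to 9 digit registrant code, new-style ArXiv IDs look like `2301.12345` with an optional version suffix, and PMIDs are plain integers. `classify_identifier` is hypothetical, and old-style ArXiv IDs (archive/number form) are not handled.

```python
import re


def classify_identifier(identifier: str) -> tuple[str, str]:
    """Return (kind, normalized id) where kind is 'url', 'doi', 'arxiv', or 'pubmed'."""
    s = identifier.strip()
    if s.lower().startswith(("http://", "https://")):
        return "url", s  # left to publisher-page detection
    # Accept optional "doi:", "arxiv:", or "pmid:" prefixes in any case.
    bare = re.sub(r"^(doi|arxiv|pmid):\s*", "", s, flags=re.IGNORECASE)
    if re.fullmatch(r"10\.\d{4,9}/\S+", bare):
        return "doi", bare
    if re.fullmatch(r"\d{4}\.\d{4,5}(v\d+)?", bare):
        return "arxiv", bare
    if re.fullmatch(r"\d{1,8}", bare):
        return "pubmed", bare
    raise ValueError(f"unrecognized identifier: {identifier!r}")
```

Checking the DOI pattern before the ArXiv one is safe because a DOI always begins with `10.` while a new-style ArXiv ID has four digits before the dot.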
twitter_extract
```python
twitter_extract(url, include_thread=False)
```
Extracts tweets and threads from X.com/Twitter using a strategy cascade:
1. yt-dlp (primary): works with a cookie jar for authenticated access
2. Twitter API v2: used if a bearer token is configured in the vault
3. HTML fetch: cookie-based last resort
Returns: text, author, timestamp, metrics (likes, retweets, replies), media URLs.
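A sketch of the record a tweet extractor might return, mirroring the fields listed above, plus URL parsing to recover the status ID that an API v2 tweet lookup needs. `Tweet`, `parse_tweet_url`, and the field names are hypothetical, not mcp-research's actual return schema.

```python
import re
from dataclasses import dataclass, field

# x.com / twitter.com status URLs, optionally with a www. or mobile. prefix.
_STATUS_URL = re.compile(
    r"^https?://(?:www\.|mobile\.)?(?:x|twitter)\.com/([A-Za-z0-9_]+)/status/(\d+)"
)


@dataclass
class Tweet:
    """Hypothetical result record mirroring the fields the tool returns."""
    id: str
    author: str
    text: str = ""
    timestamp: str = ""  # e.g. ISO 8601
    likes: int = 0
    retweets: int = 0
    replies: int = 0
    media_urls: list[str] = field(default_factory=list)


def parse_tweet_url(url: str) -> Tweet:
    """Start a record from the handle and status ID embedded in the URL."""
    m = _STATUS_URL.match(url.strip())
    if m is None:
        raise ValueError(f"not a tweet URL: {url!r}")
    return Tweet(id=m.group(2), author=m.group(1))


print(parse_tweet_url("https://x.com/jack/status/20"))
```

Each strategy in the cascade would then fill in the remaining fields of the same record, so callers see one shape regardless of which strategy succeeded.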
vault_status
```python
vault_status()
```
Shows loaded credential profiles, match patterns, and auth types, but never exposes secrets. Also checks availability of optional dependencies.
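The "never exposes secrets" guarantee can be enforced at the type level by overriding `__repr__`, so a secret cannot leak through logs or any status output that formats the object. `AuthConfig` and `status_line` are hypothetical names; this is a sketch of the idea, not the server's vault classes.

```python
from dataclasses import dataclass


@dataclass(repr=False)
class AuthConfig:
    """Hypothetical vault auth entry whose secret never appears in repr() or str()."""
    type: str          # e.g. "bearer", "basic", "api_key", "cookie_jar", "headers"
    header: str = ""
    value: str = ""    # resolved secret, e.g. from ${SPRINGER_API_KEY}

    def __repr__(self) -> str:
        shown = "'***'" if self.value else "''"
        return f"AuthConfig(type={self.type!r}, header={self.header!r}, value={shown})"


def status_line(name: str, match: str, auth: AuthConfig) -> str:
    """One line of a vault_status-style listing: profile, pattern, and auth type only."""
    return f"{name}: match={match} auth={auth.type}"


auth = AuthConfig(type="api_key", header="X-ApiKey", value="not-a-real-key")
print(repr(auth))  # AuthConfig(type='api_key', header='X-ApiKey', value='***')
print(status_line("springer", "*.springer.com/**", auth))
```

Since `str()` falls back to `__repr__` when no `__str__` is defined, f-strings and logging of the object are covered by the same override.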
Credential Vault
Create ~/.mcp-research/vault.yaml to configure authentication for protected sources:
```yaml
version: 1
profiles:
  # University EZproxy for IEEE
  ieee-university:
    match: "*.ieee.org/**"
    ezproxy:
      base_url: "https://ezproxy.myuniversity.edu/login?url="
      mode: prefix
  # Springer via API key
  springer:
    match: "*.springer.com/**"
    auth:
      type: api_key
      header: "X-ApiKey"
      value: "${SPRINGER_API_KEY}"
  # X.com via browser cookies
  twitter:
    match: "*.x.com/**"
    auth:
      type: cookie_jar
      path: "${HOME}/.mcp-research/cookies/twitter.txt"
```
- `${VAR}` references are resolved from environment variables, so secrets are never stored in plain text
- First matching profile wins (order matters)
- Auth types: `bearer`, `basic`, `api_key`, `cookie_jar`, `headers`
- EZproxy modes: `prefix` (prepend base URL) or `suffix` (domain rewriting)
- Hot-reload: vault file changes are picked up automatically
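A sketch of first-match-wins profile lookup, `${VAR}` interpolation, and EZproxy prefix mode. It assumes a `match` pattern is a host glob and a path glob joined by the first `/`, and that `*.x.com` also covers the bare `x.com` (the `twitter` profile above implies this, since tweets live on `x.com`). Unset variables resolve to an empty string here; the real vault may reject them instead. All function names are hypothetical, and suffix mode is omitted because its rewriting rule is not specified.

```python
import fnmatch
import os
import re
from typing import Optional
from urllib.parse import urlsplit

_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(value: str) -> str:
    """Resolve ${VAR} references from the environment (unset -> empty string)."""
    return _VAR.sub(lambda m: os.environ.get(m.group(1), ""), value)


def url_matches(pattern: str, url: str) -> bool:
    """Match 'host-glob/path-glob' patterns such as '*.ieee.org/**'."""
    host_pat, _, path_pat = pattern.partition("/")
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Assumption: '*.x.com' also covers the bare domain 'x.com'.
    host_ok = fnmatch.fnmatchcase(host, host_pat) or (
        host_pat.startswith("*.") and host == host_pat[2:]
    )
    # Host and path are matched separately, so a path like '/x.ieee.org/' cannot fake a host.
    return host_ok and fnmatch.fnmatchcase(parts.path.lstrip("/"), path_pat or "*")


def first_profile(url: str, profiles: dict) -> Optional[str]:
    """Return the first profile (in file order) whose match pattern fits the URL."""
    for name, profile in profiles.items():
        if url_matches(profile["match"], url):
            return name
    return None


def ezproxy_prefix(url: str, base_url: str) -> str:
    """Prefix mode: prepend the proxy login URL to the target URL."""
    return base_url + url
```

Matching the host separately from the path is the important design choice: a single glob over `host/path` would let `https://evil.com/x.ieee.org/` match `*.ieee.org/**` and receive IEEE credentials.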
Token Efficiency
All tools produce compact output by default to avoid wasting AI context window tokens:
| Tool | Default output | Override |
|---|---|---|
| `fetch_url` | ~15K chars (~4K tokens) | `max_chars` |
| `research` | ~500 tokens per source | Prefers summaries over raw content |
| `academic_lookup` | ~10K chars full text | Truncates with notice |
| `deep_ingest` | 15 files, 300 char excerpts | |
| `youtube_essence` | 3K char transcript excerpt | Full transcript in result object |
Safety & Robustness
- SSRF protection: Blocks localhost, private IPs, link-local addresses, and non-HTTP schemes on every redirect hop
- CAPTCHA detection: Identifies Cloudflare, hCaptcha, reCAPTCHA, Akamai, and DDoS-Guard walls
- Input validation: Size limits, URL validation, safe redirect following
- No eval/exec: No dynamic code execution
- Vault security: Secrets resolved from env vars; `repr()` redacts all auth values
- Cache isolation: Owner-only directory permissions (`0o700`)
- Graceful degradation: Missing optional deps don't crash; features degrade with clear messages
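A minimal sketch of the SSRF address check using only the standard library. `is_safe_url` is a hypothetical helper; to cover every hop, a caller would disable automatic redirects and re-run the check on each `Location` target. Resolving the name here and again inside the HTTP client leaves a DNS-rebinding window, so a hardened version would connect to the exact address it validated.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_safe_url(url: str) -> bool:
    """Return False for non-HTTP(S) URLs and for hosts that resolve to internal addresses."""
    parts = urlsplit(url)
    host = parts.hostname
    if parts.scheme not in ("http", "https") or not host or host == "localhost":
        return False
    try:
        # Check every address the name resolves to, not just the first one.
        infos = socket.getaddrinfo(host, None)
    except (socket.gaierror, UnicodeError):
        return False
    for *_, sockaddr in infos:
        ip = ipaddress.ip_address(sockaddr[0].split("%")[0])  # drop any IPv6 scope id
        if (ip.is_loopback or ip.is_private or ip.is_link_local
                or ip.is_reserved or ip.is_multicast or ip.is_unspecified):
            return False
    return True
```

`is_link_local` covers `169.254.169.254`, the cloud metadata address that SSRF attacks most often target.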
CLI
```bash
mcp-research serve                           # Run MCP stdio server (default)
mcp-research search "query"                  # Search the web
mcp-research fetch https://example.com       # Fetch URL to markdown
mcp-research youtube https://youtu.be/...    # Extract YouTube video
mcp-research ingest ./docs/                  # Extract text from files
mcp-research academic "10.1109/..."          # Resolve academic paper
mcp-research tweet https://x.com/.../123     # Extract tweet
mcp-research vault                           # Show vault profiles
mcp-research doctor                          # Check dependencies
```
Development
```bash
git clone https://github.com/MABAAM/Maibaamcrawler.git
cd Maibaamcrawler
pip install -e ".[all]"
pytest tests/ -v
python -m mcp_research
```
Changelog
v0.3.0
- Credential vault: YAML config at `~/.mcp-research/vault.yaml` with env var interpolation, glob URL matching, EZproxy rewriting, hot-reload
- Session pooling: Per-domain sessions with vault auth injection, cookie jar support, idle eviction
- CAPTCHA detection: Identifies Cloudflare, hCaptcha, reCAPTCHA, Akamai, DDoS-Guard, and generic bot walls
- Academic lookup: DOI/ArXiv/PubMed resolution, Crossref metadata, institutional full text access via vault
- Twitter/X extraction: yt-dlp, API v2, and cookie-based access with thread support
- Token efficiency: Default output caps (~4K tokens for fetch, ~500 per research source) to preserve AI context
- Doctor command: `mcp-research doctor` checks all dependencies and configuration
- Windows encoding fix: UTF-8 stdout/stderr wrapper prevents cp1252 crashes
v0.2.0
- YouTube essence: Transcript extraction, AI summary, key points, chapters, quotes
- Deep ingest: PDF, DOCX, XLSX, PPTX, audio, video, image text extraction
- Ollama integration: Query rewriting, summarization, synthesis, vision OCR
- Search logging: NDJSON event log for all operations
- Brave Search: Primary search tier with API key support
v0.1.0
- Initial release: 3 tools (`web_search`, `fetch_url`, `research`), SSRF protection, caching
License
MIT
MCP directory API
We provide all the information about MCP servers via our MCP API.
```bash
curl -X GET 'https://glama.ai/api/mcp/v1/servers/MABAAM/Maibaamcrawler'
```