research-mcp
Allows searching the web using a SearXNG instance, providing search results with titles, URLs, and snippets.
A stateless MCP facade that hides a pyramid of search/read providers behind a single streamable-http MCP endpoint and exposes just 3 clean tools with good Russian help texts. An LLM gets a simple "search → read" toolset; behind it, several providers are tried, merged, and failed over automatically.
The app does no authentication — it is published through Traefik + basicAuth
on the host. It holds no application state: the only thing persisted is a log
file under data/ (kept on a volume).
Tools
| Tool | What it does |
| --- | --- |
|  | Search across all enabled providers, merge + dedup → ranked list (title, URL, snippet). Search only. |
|  | One page or PDF → clean Markdown. Auto-detects type, walks the read pipeline (light → heavy) until one succeeds. |
|  | Up to 20 urls concurrently → list of |
Architecture: types + instances
Providers are plugins. We separate:
- type: an implementation class (e.g. the searxng search provider), one per module in src/providers/, registered with @register("type").
- instance: a configured copy of a type with its secrets/URL resolved from named environment variables (multiple instances of one type are allowed, e.g. tavily-1 / tavily-2 with different keys).
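The type/instance split can be illustrated with a minimal sketch. It is not the project's actual code: the PROVIDER_TYPES dict, the TavilyReader class, and the build helper are hypothetical names chosen for illustration, and only the @register("type") decorator and the Instance(name, type, api_key_env=...) shape come from the text.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical registry: maps a type name to its implementation class.
PROVIDER_TYPES: dict = {}


def register(type_name: str) -> Callable[[type], type]:
    """Class decorator that records a provider type under type_name."""
    def decorator(cls: type) -> type:
        PROVIDER_TYPES[type_name] = cls
        return cls
    return decorator


@dataclass(frozen=True)
class Instance:
    """A configured copy of a type; holds the ENV variable NAME, not the secret."""
    name: str
    type: str
    api_key_env: str


@register("tavily")
class TavilyReader:  # hypothetical class name
    def __init__(self, api_key: str) -> None:
        self.api_key = api_key


def build(instance: Instance, env: dict):
    """Create the registered class for an instance, resolving its key by name."""
    cls = PROVIDER_TYPES[instance.type]
    return cls(api_key=env[instance.api_key_env])


# Two instances of one type, each with its own key (placeholder values).
env = {"TAVILY_1_API_KEY": "key-a", "TAVILY_2_API_KEY": "key-b"}
first = build(Instance("tavily-1", "tavily", api_key_env="TAVILY_1_API_KEY"), env)
second = build(Instance("tavily-2", "tavily", api_key_env="TAVILY_2_API_KEY"), env)
```

Keeping only the variable name in the instance definition means the pipeline configuration can live in code without ever containing a secret.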
Which instances exist and the order each pipeline tries them is configured in
code (src/pipeline_config.py); keys/URLs come from ENV by variable name.
- Search pipeline (searxng → serper → exa): enabled instances run concurrently; results are merged and deduplicated by normalized URL (earlier pipeline position wins), then trimmed to num_results.
- Read pipeline (trafilatura → jina → crawl4ai → tavily-1 → tavily-2 → firecrawl): a single probe GET classifies the url. PDFs (Content-Type / .pdf / %PDF magic) are extracted with pypdf; for HTML, that same body is handed to trafilatura so the hot path never GETs twice, then the remaining instances are tried in order and the first to return content >= FALLBACK_MIN_CHARS wins.
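The search-side merge might look roughly like the sketch below. The normalization rules (lowercase scheme and host, drop the fragment and a trailing slash) are an assumption, since the text does not say how URLs are normalized, and the function names and result dict shape are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Assumed normalization: lowercase scheme/host, drop fragment and trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def merge_results(per_provider: list, num_results: int) -> list:
    """Merge result lists given in pipeline order; the first occurrence of a URL wins."""
    seen = set()
    merged = []
    for results in per_provider:  # earlier pipeline position comes first
        for item in results:
            key = normalize_url(item["url"])
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged[:num_results]


searxng = [{"title": "A", "url": "https://Example.com/a/", "snippet": "from searxng"}]
serper = [
    {"title": "A (dup)", "url": "https://example.com/a", "snippet": "from serper"},
    {"title": "B", "url": "https://example.com/b", "snippet": "from serper"},
]
merged = merge_results([searxng, serper], num_results=5)
```

Here the serper copy of the first page is dropped because searxng sits earlier in the pipeline, so merged holds "A" followed by "B".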
Cross-cutting: one transient retry (5xx / transport errors) with a short backoff;
402 (out of credits) / 429 (rate limited) are treated as a provider failure →
next instance (this is what makes tavily-1 → tavily-2 fail over).
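The retry and failover rules can be sketched as follows. This is a simplified synchronous illustration, not the server's code: the exception classes, function names, backoff length, and the FALLBACK_MIN_CHARS value are all hypothetical stand-ins.

```python
import time


class TransientError(Exception):
    """Stands in for a 5xx response or a transport error."""


class ProviderFailure(Exception):
    """Stands in for 402 (out of credits) or 429 (rate limited)."""


FALLBACK_MIN_CHARS = 200  # hypothetical value; the real default is not stated
BACKOFF_SECONDS = 0.01    # hypothetical short backoff


def call_with_retry(read, url):
    """Call read(url); retry exactly once after a short backoff on a transient error."""
    try:
        return read(url)
    except TransientError:
        time.sleep(BACKOFF_SECONDS)
        return read(url)


def read_with_fallback(pipeline, url):
    """Try (name, read) pairs in order; return the first sufficiently long result."""
    for name, read in pipeline:
        try:
            content = call_with_retry(read, url)
        except (TransientError, ProviderFailure):
            continue  # this instance failed, so move on to the next one
        if content and len(content) >= FALLBACK_MIN_CHARS:
            return name, content
    return None, None


calls = []


def tavily_1(url):
    calls.append("tavily-1")
    raise ProviderFailure("402 out of credits")


def tavily_2(url):
    calls.append("tavily-2")
    return "x" * 500


pipeline = [("tavily-1", tavily_1), ("tavily-2", tavily_2)]
winner, text = read_with_fallback(pipeline, "https://example.com")
```

In this run the first instance reports exhausted credits, is not retried, and the second instance supplies the content.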
An instance is enabled only if its required env var(s) are set; otherwise it
is skipped with a log line. trafilatura needs no config (always on); jina
works keyless (its key is optional). At startup the server requires at least one
search and one read instance, else it exits with a clear message.
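A minimal sketch of the enablement rule and the startup check, under stated assumptions: InstanceSpec, its kind field, and both function names are hypothetical, and treating crawl4ai as requiring both of its variables is a guess based on the variable list in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceSpec:  # hypothetical structure for illustration
    name: str
    kind: str                  # "search" or "read"
    required_env: tuple = ()   # env var NAMES that must be set


def enabled_instances(instances, env):
    """Keep instances whose required env vars are all non-empty; log the rest."""
    enabled = []
    for inst in instances:
        missing = [var for var in inst.required_env if not env.get(var)]
        if missing:
            print(f"skip {inst.name}: missing {', '.join(missing)}")
        else:
            enabled.append(inst)
    return enabled


def check_startup(enabled):
    """Exit with a clear message unless both a search and a read instance exist."""
    kinds = {inst.kind for inst in enabled}
    for kind in ("search", "read"):
        if kind not in kinds:
            raise SystemExit(f"no enabled {kind} instance; set at least one provider env var")


INSTANCES = [
    InstanceSpec("searxng", "search", ("SEARXNG_URL",)),
    InstanceSpec("trafilatura", "read"),   # needs no config, always on
    InstanceSpec("jina", "read"),          # key is optional, so nothing is required
    InstanceSpec("crawl4ai", "read", ("CRAWL4AI_URL", "CRAWL4AI_TOKEN")),
]
env = {"SEARXNG_URL": "http://searxng.local", "CRAWL4AI_URL": "http://crawl4ai.local"}
enabled = enabled_instances(INSTANCES, env)
check_startup(enabled)
```

With this environment crawl4ai is skipped because its token is missing, and startup still succeeds because one search and two read instances remain.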
Adding a provider
1. Write src/providers/<type>.py with a class decorated @register("<type>") implementing SearchProvider.search(...) or ReadProvider.read(...).
2. Import the module in src/providers/__init__.py (so the decorator runs).
3. Add an Instance("name", "<type>", api_key_env="YOUR_ENV_NAME") line in src/pipeline_config.py and reference its name in SEARCH_PIPELINE / READ_PIPELINE. Use the ENV var NAME, never a value.
4. Document the env var in .env.example.
Quick start
    make install           # create .venv + install dev/test deps
    cp .env.example .env   # fill in the keys you have (shortcut: make env)
    make test              # run tests
    make run               # run the server (streamable-http on MCP_HOST:MCP_PORT, endpoint /mcp)
Configuration
All config comes from ENV / .env (see .env.example). Provider secrets/URLs
are read by name in the instance loader, not declared as Settings fields. The
non-secret knobs (all defaulted): MCP_HOST, MCP_PORT, LOG_LEVEL,
LOG_FILE, LOG_ROTATION, LOG_RETENTION, REQUEST_TIMEOUT,
FALLBACK_MIN_CHARS, READ_PAGES_CONCURRENCY, RETRIES. The read_pages
per-call url cap is a fixed 20 (hard constant, matching the tool description) —
not configurable.
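How a fixed per-call cap and a configurable concurrency knob can coexist is sketched below. The value 5 for READ_PAGES_CONCURRENCY, the read_one placeholder, and the choice to reject oversized calls (rather than truncate them) are assumptions for illustration; only the cap of 20 comes from the text.

```python
import asyncio

MAX_URLS_PER_CALL = 20       # fixed cap, as described above
READ_PAGES_CONCURRENCY = 5   # hypothetical value for the configurable knob


async def read_one(url: str) -> dict:
    """Placeholder for a single-page read; a real one would run the read pipeline."""
    await asyncio.sleep(0)
    return {"url": url, "ok": True}


async def read_pages(urls: list) -> list:
    """Read several urls, never running more than READ_PAGES_CONCURRENCY at once."""
    if len(urls) > MAX_URLS_PER_CALL:
        raise ValueError(f"at most {MAX_URLS_PER_CALL} urls per call")
    semaphore = asyncio.Semaphore(READ_PAGES_CONCURRENCY)

    async def bounded(url: str) -> dict:
        async with semaphore:
            return await read_one(url)

    # gather preserves input order, so results line up with urls
    return await asyncio.gather(*(bounded(u) for u in urls))


results = asyncio.run(read_pages([f"https://example.com/{i}" for i in range(8)]))
```

The semaphore bounds in-flight reads while the cap bounds the size of a single call, so the two limits stay independent.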
Provider env vars: SEARXNG_URL, SERPER_API_KEY, EXA_API_KEY, JINA_API_KEY
(optional), CRAWL4AI_URL + CRAWL4AI_TOKEN, TAVILY_1_API_KEY,
TAVILY_2_API_KEY, FIRECRAWL_API_KEY.
Logging
Besides stderr (captured by Docker's rotation-capped json-file driver), the
server writes a persistent log file to data/research-mcp.log (default;
LOG_ROTATION=20 MB, LOG_RETENTION=14 days). It lives on the data/ volume,
so it survives container restarts and image updates. The file carries one
per-request line per tool call — search (query, which provider instances
actually ran, result count, latency) and read (url, the winning provider/tier
or pdf, ok, latency), plus a read_pages count=N ok=K summary — making it
useful for analyzing how requests distribute across provider tiers. No request
bodies or secrets are logged, only urls/queries, provider names, counts, timings.
Deployment
CI builds the image and pushes it to ghcr.io (test → build, tags latest +
sha). On prod we pull the prebuilt image via docker-compose.yml (behind
Traefik + basicAuth, watchtower auto-updates latest; the data/ volume keeps
the log file across updates) — we never build on prod.
Layout
| Path | Purpose |
| --- | --- |
|  | Provider interfaces + |
|  | One module per provider type. |
|  | PDF detection + pypdf text extraction (used by the pipeline). |
|  | In-code instances + pipeline order. |
|  | Instance loader + search/read logic. |
|  | Non-secret knobs (pydantic-settings). |
|  | Thin entry point: build server, run streamable-http. |
|  | pytest suite (network mocked with respx). |