mcp-ddg-research
Provides tools to search DuckDuckGo and fetch webpages, with HTML fallback, caching, and text extraction.
Lightweight MCP server for DuckDuckGo search with HTML fallback, safe webpage fetching, caching, and clean text extraction.
mcp-ddg-research is a self-hosted Python MCP server that exposes deterministic research primitives to MCP clients. It can run DuckDuckGo searches, fall back to DuckDuckGo's lightweight HTML endpoint when the ddgs provider fails, fetch webpages with SSRF protections, cache search/fetch responses, deduplicate URLs, and extract readable text from HTML pages.
The MCP client or agent is responsible for reasoning over the returned data. This server only returns structured search results and fetched page text.
What This Project Does
Searches DuckDuckGo through ddgs.DDGS().text(...).
Falls back to https://html.duckduckgo.com/html/ when ddgs fails, times out, rate limits, raises, or returns no results.
Parses DuckDuckGo HTML fallback results with BeautifulSoup.
Resolves DuckDuckGo redirect URLs such as /l/?uddg=....
Deduplicates normalized result URLs.
Fetches webpages with strict URL and DNS safety checks.
Follows redirects manually and validates every redirect target.
Extracts clean text from HTML by removing script, style, navigation, footer, and similar boilerplate.
Caches search and fetch responses in a file-based JSON cache.
Provides a simple deep search tool that searches once and fetches top result pages concurrently.
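The redirect resolution and URL deduplication steps listed above could look roughly like the following. This is a minimal sketch using only the standard library; the function names and the exact normalization rules (lowercased scheme and host, no fragment, no trailing slash) are assumptions for illustration, not the project's actual code.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def resolve_ddg_redirect(href: str) -> str:
    """Return the target of a DuckDuckGo /l/?uddg=... link, else href unchanged."""
    parts = urlsplit(href)
    if parts.path.rstrip("/") == "/l":
        target = parse_qs(parts.query).get("uddg")
        if target:
            return target[0]  # parse_qs already percent-decodes the value
    return href


def normalize_url(url: str) -> str:
    """Hypothetical normalization: lowercase scheme/host, drop fragment and trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def dedupe_results(results: list) -> list:
    """Keep the first result for each normalized URL, preserving order."""
    seen = set()
    unique = []
    for item in results:
        key = normalize_url(item["url"])
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique
```

Keeping the first occurrence preserves the provider's ranking order, which matches the stable-order behavior described elsewhere in this README.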
What This Project Does Not Do
No LLM integration.
No summarization.
No report generation.
No browser automation.
No proxy rotation.
No captcha bypassing.
No ranking with model endpoints.
No OpenAI, Anthropic, Ollama, LM Studio, or other model endpoint support.
Why HTML Fallback Exists
The ddgs package is the preferred provider because it offers a simple Python API and handles DuckDuckGo search details for normal use. Search providers can still fail because of network timeouts, temporary provider errors, rate limits, empty responses, dependency import problems, or upstream behavior changes.
When that happens, this server falls back to DuckDuckGo's lightweight HTML endpoint. The fallback uses conservative request defaults, browser-like headers, and BeautifulSoup selectors for .result, .result__a, and .result__snippet.
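The fallback decision described above can be sketched as a small wrapper. The two provider callables are hypothetical stand-ins for the ddgs call and the HTML endpoint request, and the "html_fallback" provider label is an assumption (the README only shows "ddgs" in its response examples).

```python
from typing import Callable, List

SearchFn = Callable[[str], List[dict]]


def search_with_fallback(query: str, primary: SearchFn, fallback: SearchFn) -> dict:
    """Try the primary provider; use the fallback on any exception or empty result."""
    try:
        results = primary(query)
        if results:
            return {"provider": "ddgs", "results": results, "error": None}
        reason = "primary returned no results"
    except Exception as exc:  # timeouts, rate limits, import problems, ...
        reason = f"primary failed: {exc}"
    try:
        # "html_fallback" is a hypothetical label for the HTML endpoint provider.
        return {"provider": "html_fallback", "results": fallback(query), "error": None}
    except Exception as exc:
        message = f"{reason}; fallback failed: {exc}"
        return {"provider": "html_fallback", "results": [], "error": message}
```

Returning a structured error instead of raising keeps the tool response shape stable for the MCP client even when both providers fail.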
Available MCP Tools
ddg_search
Search DuckDuckGo and return structured results.
Arguments:
{
"query": "python mcp server fastmcp",
"max_results": 10,
"safe_search": "off",
"time_filter": "month",
"blocked_domains": [],
"allowed_domains": [],
"preferred_domains": []
}
Argument rules:
query: string, required.
max_results: integer, default 10, minimum 1, maximum 30.
safe_search: one of off, moderate, strict; default off.
time_filter: optional, one of day, week, month, year.
blocked_domains: optional list of domains to remove from results, default [].
allowed_domains: optional list of domains to keep, default [].
preferred_domains: optional list of domains to move earlier while preserving stable order, default [].
Response example:
{
"query": "python mcp server fastmcp",
"provider": "ddgs",
"results": [
{
"title": "MCP Python SDK",
"url": "https://github.com/modelcontextprotocol/python-sdk",
"snippet": "Python SDK for Model Context Protocol servers and clients."
}
],
"cached": false,
"error": null
}
web_fetch
Fetch a single webpage and return clean text.
Arguments:
{
"url": "https://example.com/article",
"max_chars": 12000
}
Argument rules:
url: HTTP or HTTPS URL.
max_chars: integer, default 12000, minimum 1000, maximum 50000.
Response example:
{
"url": "https://example.com/article",
"final_url": "https://example.com/article",
"title": "Example Article",
"content": "Readable extracted page text...",
"content_type": "text/html; charset=utf-8",
"cached": false,
"success": true,
"error": null
}
ddg_deep_search
Search once, fetch top result pages concurrently, and return sources plus page content.
Arguments:
{
"query": "model context protocol python sdk",
"max_results": 10,
"max_pages": 5,
"max_chars_per_page": 12000,
"safe_search": "off",
"time_filter": "year",
"blocked_domains": [],
"allowed_domains": [],
"preferred_domains": [],
"max_concurrency": null
}
Argument rules:
query: string, required.
max_results: integer, default 10, minimum 1, maximum 30.
max_pages: integer, default 5, minimum 1, maximum 10.
max_chars_per_page: integer, default 12000, minimum 1000, maximum 50000.
safe_search: one of off, moderate, strict; default off.
time_filter: optional, one of day, week, month, year.
blocked_domains: optional list of domains to remove from search results before fetching, default [].
allowed_domains: optional list of domains to keep before fetching, default [].
preferred_domains: optional list of domains to move earlier before fetching, default [].
max_concurrency: optional per-call page fetch concurrency, minimum 1, maximum 12. If omitted, MAX_CONCURRENCY is used.
Response example:
{
"query": "model context protocol python sdk",
"search_provider": "ddgs",
"sources": [
{
"title": "MCP Python SDK",
"url": "https://github.com/modelcontextprotocol/python-sdk",
"snippet": "Python SDK for Model Context Protocol servers and clients."
}
],
"pages": [
{
"title": "MCP Python SDK",
"url": "https://github.com/modelcontextprotocol/python-sdk",
"final_url": "https://github.com/modelcontextprotocol/python-sdk",
"content": "Extracted page text..."
}
],
"failed_pages": [],
"cached": false
}
Domain Controls
Domain controls are opt-in. If you do not pass blocked_domains,
allowed_domains, or preferred_domains, search results preserve DuckDuckGo's
default ranking order after URL deduplication. The server does not apply a
built-in source bias, source boost, or domain blocklist.
Domain inputs are normalized by lowercasing, removing URL schemes, removing
paths and query strings, and stripping a leading www.. Matching supports exact
domains and subdomains. For example, docs.example.com matches example.com,
but example.com.evil.com does not.
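The normalization and matching rules above can be illustrated with a short sketch. The function names are hypothetical; only the behavior (lowercasing, dropping scheme, path, and query, stripping a leading www., exact or subdomain match) comes from the description.

```python
from urllib.parse import urlsplit


def normalize_domain(value: str) -> str:
    """Reduce a domain or URL-like input to a bare lowercase hostname."""
    value = value.strip().lower()
    if "://" not in value:
        value = "//" + value  # let urlsplit treat the input as a network location
    host = urlsplit(value).hostname or ""
    return host[4:] if host.startswith("www.") else host


def domain_matches(host: str, domain: str) -> bool:
    """True when host equals domain or is a subdomain of it."""
    host, domain = normalize_domain(host), normalize_domain(domain)
    return bool(domain) and (host == domain or host.endswith("." + domain))
```

Requiring a leading dot in the suffix check is what rejects look-alike hosts such as example.com.evil.com.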
Filtering order:
Apply allowed_domains if provided.
Apply blocked_domains if provided.
Apply preferred_domains if provided.
preferred_domains performs a stable partition: preferred matches move earlier,
relative order is preserved inside the preferred and non-preferred groups, and
no numeric score is invented.
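A minimal sketch of the filtering order and the stable partition follows. It assumes the domain lists are already normalized as described above; the helper names are illustrative and not taken from the project.

```python
from typing import List, Sequence
from urllib.parse import urlsplit


def _matches(url: str, domains: Sequence[str]) -> bool:
    """True when the URL's host equals, or is a subdomain of, any listed domain."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in domains)


def apply_domain_controls(
    results: List[dict],
    allowed: Sequence[str] = (),
    blocked: Sequence[str] = (),
    preferred: Sequence[str] = (),
) -> List[dict]:
    """Apply allow, then block, then a stable preferred-first partition."""
    if allowed:
        results = [r for r in results if _matches(r["url"], allowed)]
    if blocked:
        results = [r for r in results if not _matches(r["url"], blocked)]
    if preferred:
        first = [r for r in results if _matches(r["url"], preferred)]
        rest = [r for r in results if not _matches(r["url"], preferred)]
        results = first + rest  # relative order inside each group is unchanged
    return results
```

With no lists passed, the input order is returned untouched, which mirrors the opt-in behavior of the domain controls.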
Block domains:
{
"query": "self hosted photo backup",
"blocked_domains": ["example.com", "old-docs.example.org"]
}
Allow only specific domains:
{
"query": "python mcp server",
"allowed_domains": ["github.com", "modelcontextprotocol.io"]
}
Prefer domains without excluding others:
{
"query": "duckduckgo html search endpoint",
"preferred_domains": ["duckduckgo.com", "github.com"]
}
Limit deep-search fetch concurrency for one call:
{
"query": "model context protocol python sdk",
"max_pages": 5,
"max_concurrency": 2
}
Docker Stdio Usage
Build the local image:
docker build -t mcp-ddg-research:local .
Run the server over stdio. This mode is auth-free because the MCP client owns stdin/stdout and there is no listening network socket:
docker run --rm -i -v "$PWD/data:/data" mcp-ddg-research:local
Docker Stdio MCP Client Configuration
{
"mcpServers": {
"ddg-research": {
"command": "docker",
"args": [
"run",
"--rm",
"-i",
"-v",
"/opt/mcp-ddg-research/data:/data",
"mcp-ddg-research:local"
]
}
}
}
docker-compose Usage
The included compose file starts the server in streamable HTTP mode on /mcp.
It maps host port 49317 to container port 8000 and requires
Authorization: Bearer change-me-now by default.
Build and start the service:
docker compose up --build ddg-research
The compose file persists cache data at:
~/docker/docker-data/mcp-ddg-research/cache
The checked-in compose token is the placeholder change-me-now. It is
acceptable for local smoke tests only. Replace MCP_AUTH_TOKEN in
docker-compose.yml before using LAN, VPN, reverse-proxy, or Cloudflare Tunnel
deployments.
The compose file defaults MCP_ALLOWED_HOSTS=* and MCP_ALLOWED_ORIGINS=* so
the same container can run behind a LAN IP, hostname, domain, reverse proxy, or
HTTPS endpoint. In MCP SDK 1.27.2, wildcard Host/Origin validation is not
supported by the DNS rebinding middleware, so wildcard mode disables the SDK
Host/Origin allowlist and relies on the bearer token. To enable strict
Host/Origin checks, set exact comma-separated values such as:
MCP_ALLOWED_HOSTS="example.com,example.com:443,localhost:49317"
MCP_ALLOWED_ORIGINS="https://example.com,http://localhost:*"
LAN HTTP Example
Set a real token in docker-compose.yml and start the server:
docker compose up -d --build
Use your server's LAN IP in the client URL:
http://YOUR_SERVER_IP:49317/mcp
OpenCode remote MCP configuration for a LAN deployment:
{
"mcp": {
"ddg-research": {
"type": "remote",
"enabled": true,
"url": "http://YOUR_SERVER_IP:49317/mcp",
"oauth": false,
"headers": {
"Authorization": "Bearer change-me-now"
}
}
}
}
HTTPS Reverse Proxy Example
Run the container on the server and terminate TLS in a reverse proxy. The proxy
should forward /mcp to http://127.0.0.1:49317/mcp and preserve standard
upgrade/streaming behavior.
Minimal Nginx-style location:
location /mcp {
    proxy_pass http://127.0.0.1:49317/mcp;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_buffering off;
}
OpenCode configuration for the HTTPS endpoint:
{
"mcp": {
"ddg-research": {
"type": "remote",
"enabled": true,
"url": "https://your-domain.example/mcp",
"oauth": false,
"headers": {
"Authorization": "Bearer change-me-now"
}
}
}
}
Cloudflare Tunnel Example
Cloudflare Tunnel lets cloudflared make outbound-only connections from your
server to Cloudflare, so you can publish the MCP HTTP endpoint without opening
an inbound router/firewall port.
In the Cloudflare dashboard, create a tunnel and add a public hostname such as:
https://mcp.example.com
If cloudflared runs on the host, set the tunnel service URL to:
http://127.0.0.1:49317
If cloudflared runs as another service in the same compose project/network,
set the tunnel service URL to the container service name and internal port:
http://ddg-research:8000
Minimal compose service example for token-managed tunnels:
cloudflared:
  image: cloudflare/cloudflared:latest
  restart: unless-stopped
  command: tunnel --no-autoupdate run --token ${CLOUDFLARE_TUNNEL_TOKEN}
  depends_on:
    - ddg-research
Keep CLOUDFLARE_TUNNEL_TOKEN outside version control. In OpenCode, use the
public HTTPS URL and keep the MCP bearer token header:
{
"mcp": {
"ddg-research": {
"type": "remote",
"enabled": true,
"url": "https://mcp.example.com/mcp",
"oauth": false,
"headers": {
"Authorization": "Bearer change-me-now"
}
}
}
}
For production, replace change-me-now with a long random token. Cloudflare
Tunnel protects the network path, but the MCP server should still require its
own bearer token.
Do not expose HTTP mode to an untrusted network without HTTPS and a strong
MCP_AUTH_TOKEN. If MCP_AUTH_TOKEN is unset in HTTP mode, the server logs a
warning and accepts unauthenticated HTTP requests.
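The bearer-token behavior described above (including the unauthenticated mode when MCP_AUTH_TOKEN is unset) can be sketched as a single check. The function name is hypothetical, and the constant-time comparison via hmac.compare_digest is an assumption about good practice, not a statement about how this server implements it.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], expected_token: Optional[str]) -> bool:
    """Check an Authorization header against the configured bearer token."""
    if not expected_token:
        return True  # documented behavior: an unset token means requests are not authenticated
    if not authorization_header:
        return False
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking token prefixes through timing.
    return hmac.compare_digest(presented.strip().encode(), expected_token.encode())
```

A request that fails this check would correspond to the 401 Unauthorized response mentioned in the smoke tests.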
For MCP stdio clients, direct docker run -i is usually simpler than compose because the client owns stdin/stdout.
HTTP Smoke Tests
Raw curl is useful for checking HTTP authentication and Host handling, but it
does not perform a complete MCP streamable HTTP session. A request with the
correct bearer token may therefore return 406 Not Acceptable because curl did
not send the MCP client's expected Accept: text/event-stream negotiation
headers. That still proves the request passed bearer-token auth and Host
validation.
With the compose server running and the default compose token:
curl -i http://127.0.0.1:49317/mcp
Expected: 401 Unauthorized.
curl -i \
  -H "Host: YOUR_SERVER_IP:49317" \
  -H "Authorization: Bearer change-me-now" \
  http://127.0.0.1:49317/mcp
Expected: usually 406 Not Acceptable from raw curl, but not 401 Unauthorized
and not 421 Misdirected Request.
With a real MCP client, such as OpenCode configured with the same URL and
Authorization header, ListTools and CallTool should work for ddg_search,
web_fetch, and ddg_deep_search.
Environment Variables
| Variable | Default | Description |
| | | Directory for JSON cache files. |
| | | Search cache TTL in seconds. |
| | | Web fetch cache TTL in seconds. |
| | | DuckDuckGo provider and fallback timeout in seconds. |
| | | Web fetch timeout in seconds. |
| MAX_CONCURRENCY | 5 | Default deep search page fetch concurrency limit when max_concurrency is omitted. |
| | | MCP transport. |
| | | Host used for optional streamable HTTP mode. |
| | | Port used for optional streamable HTTP mode. |
| MCP_AUTH_TOKEN | unset | Bearer token for HTTP mode. The included compose file sets this to change-me-now. |
| MCP_ALLOWED_HOSTS | | Comma-separated Host allowlist for HTTP mode. |
| MCP_ALLOWED_ORIGINS | | Comma-separated Origin allowlist for HTTP mode. |
Cache Behavior
Search results are cached under the search cache namespace. Fetch responses are cached under the fetch cache namespace. Cache keys are SHA256 hashes of stable JSON payloads, so equivalent tool arguments map to the same file path.
Cache files are written atomically by writing a temporary file in the target cache directory and then renaming it into place. Corrupt, malformed, or expired cache files are ignored safely.
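The cache key and atomic write described above can be sketched with the standard library. The on-disk entry layout (an expires_at timestamp next to the value) and the function names are assumptions for illustration; only the SHA256-of-stable-JSON key and the write-then-rename pattern come from the text.

```python
import hashlib
import json
import os
import tempfile
import time
from pathlib import Path


def cache_key(namespace: str, payload: dict) -> str:
    """Hash a canonical JSON form so equivalent arguments share one key."""
    stable = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{namespace}:{stable}".encode("utf-8")).hexdigest()


def write_cache(cache_dir: Path, key: str, value: dict, ttl: int) -> Path:
    """Write to a temp file in the target directory, then rename into place."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / f"{key}.json"
    fd, tmp_name = tempfile.mkstemp(dir=cache_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump({"expires_at": time.time() + ttl, "value": value}, handle)
        os.replace(tmp_name, target)  # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target


def read_cache(cache_dir: Path, key: str):
    """Return the cached value, or None for missing, corrupt, or expired entries."""
    try:
        entry = json.loads((cache_dir / f"{key}.json").read_text(encoding="utf-8"))
        if entry["expires_at"] > time.time():
            return entry["value"]
    except (OSError, ValueError, KeyError, TypeError):
        pass  # unreadable or malformed entries are treated as cache misses
    return None
```

Creating the temporary file inside the cache directory keeps the rename on a single filesystem, which is what makes os.replace atomic.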
The default compose configuration persists cache files in /data/cache, with
~/docker/docker-data/mcp-ddg-research mounted into the container.
Rate Limit Notes
Defaults are intentionally conservative:
ddg_search defaults to 10 results and caps at 30.
ddg_deep_search defaults to 5 fetched pages and caps at 10.
Deep search concurrency defaults to 5.
Search and fetch results are cached to reduce repeated DuckDuckGo and website hits.
This project does not rotate proxies, bypass captchas, or attempt to evade rate limits. If DuckDuckGo blocks or rate limits requests, the tool returns structured errors instead of retrying aggressively.
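One common way to enforce a concurrency cap like the deep-search limit above is an asyncio semaphore. The sketch below assumes a hypothetical fetch_one coroutine and result shape; it shows the bounding technique, not the project's actual fetch code.

```python
import asyncio


async def fetch_pages(urls, fetch_one, max_concurrency=5):
    """Fetch all URLs with at most max_concurrency requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with semaphore:
            try:
                return {"url": url, "content": await fetch_one(url), "error": None}
            except Exception as exc:
                # A failed page becomes a structured error rather than an exception.
                return {"url": url, "content": None, "error": str(exc)}

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))
```

Catching errors per page lets one slow or blocked site fail without discarding the pages that did load.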
SSRF and Security Protections
web_fetch only allows http and https URLs. It blocks known local or internal hostnames, including:
localhost
metadata
metadata.google.internal
hostnames ending in .local, .localhost, .internal, .lan, .intranet
It also rejects IP addresses in private, loopback, link-local, reserved, multicast, or unspecified ranges, including:
0.0.0.0/8
10.0.0.0/8
127.0.0.0/8
169.254.0.0/16
172.16.0.0/12
192.168.0.0/16
::1/128
fc00::/7
fe80::/10
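The scheme, hostname, and IP-range checks above could be approximated with Python's ipaddress module. This is a sketch under the assumption that the module's built-in classifications cover the listed ranges; the function names are hypothetical and the project's own checks may differ.

```python
import ipaddress
from urllib.parse import urlsplit

BLOCKED_HOSTS = {"localhost", "metadata", "metadata.google.internal"}
BLOCKED_SUFFIXES = (".local", ".localhost", ".internal", ".lan", ".intranet")


def is_unsafe_ip(address: str) -> bool:
    """True for private, loopback, link-local, reserved, multicast, or unspecified IPs."""
    ip = ipaddress.ip_address(address)
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_reserved or ip.is_multicast or ip.is_unspecified)


def is_url_allowed(url: str) -> bool:
    """Static checks only: scheme, blocked hostnames, and literal IP addresses."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower().rstrip(".")
    if parts.scheme not in ("http", "https") or not host:
        return False
    if host in BLOCKED_HOSTS or host.endswith(BLOCKED_SUFFIXES):
        return False
    try:
        return not is_unsafe_ip(host)
    except ValueError:
        return True  # not an IP literal; resolved addresses still need checking
```

For hostnames that pass these static checks, each address returned by DNS resolution would still need to go through is_unsafe_ip before any request is sent.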
DNS is resolved before fetching. If any resolved address is unsafe, the request is rejected. Redirects are followed manually, and every redirect target is validated before the next request.
Unsupported schemes such as file://, ftp://, ssh://, gopher://, and data: are never fetched.
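Manual redirect handling with per-hop validation can be sketched as a loop. Here fetch_once and is_safe are hypothetical injected callables (a single non-following HTTP request and a URL safety check), and the redirect limit of 5 is an assumed value, not one stated by the project.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# (status_code, Location header or None, body text)
Response = Tuple[int, Optional[str], str]


def fetch_following_redirects(
    url: str,
    fetch_once: Callable[[str], Response],
    is_safe: Callable[[str], bool],
    max_redirects: int = 5,
) -> Tuple[str, str]:
    """Follow redirects by hand, validating every URL before it is requested."""
    for _ in range(max_redirects + 1):
        if not is_safe(url):
            raise ValueError(f"blocked unsafe URL: {url}")
        status, location, body = fetch_once(url)
        if status in (301, 302, 303, 307, 308) and location:
            url = urljoin(url, location)  # resolve relative Location values
            continue
        return url, body  # final URL and its content
    raise ValueError("too many redirects")
```

Because validation runs at the top of every iteration, a public page that redirects to an internal address is rejected before the internal request is made.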
Development Setup
Python 3.12 is required.
Create and activate a virtual environment:
python3.12 -m venv .venv
source .venv/bin/activate
Install the package with development tools:
python -m pip install --upgrade pip
python -m pip install -e ".[dev]"
Run the MCP server locally:
python -m mcp_ddg_research.server
Test Commands
Run tests:
python -m pytest
Run lint:
python -m ruff check .
Build a wheel/sdist using the configured build backend:
python -m pip install build
python -m build
Release Automation
Releases are automated by .github/workflows/release.yml when commits or
release tags are pushed. The workflow is Python-native:
Install the project with development dependencies.
Run Ruff, pytest, compile checks, Python package build, and a Docker build.
On main branch pushes, use Python Semantic Release to create the next GitHub release from conventional commits.
On v* tag pushes, treat the pushed tag as the release tag.
If a release or release tag is present, build and push multi-architecture Docker images for linux/amd64 and linux/arm64.
The workflow publishes these image tags:
DOCKERHUB_USERNAME/mcp-ddg-research:latest
DOCKERHUB_USERNAME/mcp-ddg-research:vX.Y.Z
ghcr.io/isyuricunha/mcp-ddg-research:latest
ghcr.io/isyuricunha/mcp-ddg-research:vX.Y.Z
Required repository secrets:
| Secret | Purpose |
| DOCKERHUB_USERNAME | Docker Hub namespace for the published image. |
| | Docker Hub access token used by |
| GITHUB_TOKEN | Provided automatically by GitHub Actions for GitHub releases and GHCR publishing. |
Use conventional commits to drive release versions:
fix: ... and perf: ... create patch releases.
feat: ... creates minor releases while the project is in 0.x.
Breaking changes are capped to a minor release while the project is in 0.x; after 1.0.0, they create major releases.
docs:, ci:, chore:, test:, style:, and refactor: do not create a release by default.
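The commit-to-version mapping above can be illustrated with a small sketch. It is not how Python Semantic Release is implemented; the regular expression, the function names, and the assumption that feat: stays a minor bump after 1.0.0 are illustrative only.

```python
import re
from typing import Optional

PATCH_TYPES = {"fix", "perf"}
HEADER = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?(?P<bang>!)?: ")


def bump_for(message: str, current_major: int) -> Optional[str]:
    """Map one conventional commit message to 'major', 'minor', 'patch', or None."""
    match = HEADER.match(message)
    if not match:
        return None
    breaking = bool(match.group("bang")) or "BREAKING CHANGE:" in message
    if breaking:
        return "minor" if current_major == 0 else "major"  # capped while in 0.x
    if match.group("type") == "feat":
        return "minor"
    if match.group("type") in PATCH_TYPES:
        return "patch"
    return None  # docs, ci, chore, test, style, refactor: no release


def next_version(version: str, bump: Optional[str]) -> str:
    """Apply a bump level to an X.Y.Z version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return version
```

For example, a feat!: commit on a 0.4.2 project yields 0.5.0 under these rules, while the same commit on 1.4.2 yields 2.0.0.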
The release workflow updates pyproject.toml and
src/mcp_ddg_research/__init__.py during semantic-release commits. It does not
maintain a changelog file. It is intentionally skipped for documentation-only
pushes and compose-file-only pushes.
Manual milestone releases are also supported. Create and push a vX.Y.Z tag
that points at the intended release commit, and the tag workflow publishes the
same Docker Hub and GHCR tags.
Limitations
DuckDuckGo HTML fallback does not support every option exposed by DuckDuckGo's full web interface.
time_filter is applied to the ddgs provider. The HTML fallback only sends the query and safe-search parameter.
PDF parsing is not implemented in v1.
JavaScript-rendered pages are not rendered because there is no browser automation.
Some websites block automated HTTP clients or return incomplete content.
DNS safety checks reduce SSRF risk but cannot make arbitrary third-party fetching risk-free.
Optional Future Roadmap
These are optional future improvements, not current behavior:
Add configurable per-domain fetch throttling.
Add cache pruning utilities.
Add optional robots.txt awareness.
Add additional text extraction heuristics for common article layouts.
Add more integration tests around redirect chains and text content types.