Schema | free-search-mcp

free-search-mcp


Server Configuration

Describes the environment variables the server reads. All are optional and fall back to the defaults shown.

Name	Required	Description	Default
`SEARCH_MCP_CACHE_DIR`	No	Directory for cache	~/.cache/search-mcp
`SEARCH_MCP_FETCH_STRATEGY`	No	Fetch strategy: auto, http, or browser	auto
`SEARCH_MCP_DEFAULT_ENGINES`	No	JSON list of default search engines	["duckduckgo","mojeek","startpage"]
`SEARCH_MCP_BROWSER_HEADLESS`	No	Run browser in headless mode	true
`SEARCH_MCP_BROWSER_POOL_SIZE`	No	Number of concurrent browser pages	2
`SEARCH_MCP_CACHE_TTL_SECONDS`	No	Cache TTL in seconds (7 days)	604800
`SEARCH_MCP_MAX_CONTENT_CHARS`	No	Maximum content characters per result	50000
`SEARCH_MCP_RATE_LIMIT_PER_MINUTE`	No	Rate limit per engine per minute	30
`SEARCH_MCP_MAX_RESULTS_PER_ENGINE`	No	Maximum results per engine	10
`SEARCH_MCP_FETCH_RATE_LIMIT_PER_MINUTE`	No	Shared fetch rate limit per minute	20
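
As an illustration of how the variables in the table above fit together, here is a minimal Python sketch that reads them with the listed defaults. The `SearchMcpSettings` class and `load_settings` function are hypothetical names, not part of the server, and the boolean parsing rule (accepting `1`, `true`, `yes`) is an assumption; the server's own parsing may differ.

```python
import json
import os
from dataclasses import dataclass
from pathlib import Path
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class SearchMcpSettings:
    """Hypothetical container for the documented SEARCH_MCP_* variables."""
    cache_dir: Path
    fetch_strategy: str
    default_engines: List[str]
    browser_headless: bool
    browser_pool_size: int
    cache_ttl_seconds: int
    max_content_chars: int
    rate_limit_per_minute: int
    max_results_per_engine: int
    fetch_rate_limit_per_minute: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> SearchMcpSettings:
    """Read settings from `env` (os.environ if omitted), using the table's defaults."""
    env = os.environ if env is None else env

    def get(name: str, default: str) -> str:
        return env.get("SEARCH_MCP_" + name, default)

    strategy = get("FETCH_STRATEGY", "auto")
    if strategy not in ("auto", "http", "browser"):
        raise ValueError("unknown fetch strategy: " + repr(strategy))

    # Assumption: truthy spellings for the headless flag are 1/true/yes.
    headless = get("BROWSER_HEADLESS", "true").strip().lower() in ("1", "true", "yes")

    return SearchMcpSettings(
        cache_dir=Path(get("CACHE_DIR", "~/.cache/search-mcp")).expanduser(),
        fetch_strategy=strategy,
        default_engines=json.loads(
            get("DEFAULT_ENGINES", '["duckduckgo","mojeek","startpage"]')
        ),
        browser_headless=headless,
        browser_pool_size=int(get("BROWSER_POOL_SIZE", "2")),
        cache_ttl_seconds=int(get("CACHE_TTL_SECONDS", "604800")),  # 7 days
        max_content_chars=int(get("MAX_CONTENT_CHARS", "50000")),
        rate_limit_per_minute=int(get("RATE_LIMIT_PER_MINUTE", "30")),
        max_results_per_engine=int(get("MAX_RESULTS_PER_ENGINE", "10")),
        fetch_rate_limit_per_minute=int(get("FETCH_RATE_LIMIT_PER_MINUTE", "20")),
    )
```

Passing an explicit mapping, for example `load_settings({"SEARCH_MCP_FETCH_STRATEGY": "http"})`, keeps the sketch testable without touching the real process environment.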

Capabilities

Features and capabilities supported by this server

Capability	Details
`tools`	{ "listChanged": false }
`prompts`	{ "listChanged": false }
`resources`	{ "subscribe": false, "listChanged": false }
`experimental`	{}

Tools

Functions exposed to the LLM to take actions

Name	Description
search	Run a multi-engine web search and return a ranked, deduplicated link list.
Best for:
- Discovery queries ("what is X", "find me X", "who is X").
- Getting a list of URLs you can hand to `fetch` / `fetch_batch` next.
- Topics likely to be after your knowledge cutoff (use `freshness="week"`).
- Filtering to specific domains (`include_domains=["python.org"]`) or content types (`category="paper"|"pdf"|"github"|"news"|"forum"|"blog"`).
Not recommended for:
- You already know the URL -> use `fetch` instead.
- You want both links AND their full text in one call -> use `research`.
- You want to query pages already in the local cache -> use `cache_search`.
- Reading PDFs/DOCX from a known URL -> use `read_doc`.
Returns:
- markdown (default): numbered list of `n. title`, `<url>`, snippet; ~40% fewer tokens than json.
- json: dict with `results` (list of {title,url,snippet,engines,score}), `engines`, `cached`, optional `errors` map, optional `hint` string.
Common mistakes:
- Passing a URL as `query`; that's `fetch`'s job.
- Cranking `max_results` to 50 hoping for better recall; engines cap around 10-20 each, anything beyond is duplicate noise.
- Adding `engines=["startpage","brave","bing","baidu"]` by default: those need browser rendering or captcha-friendly conditions; stick with the defaults unless they returned 0. If the defaults DO return 0, the keyless HTTP extras `engines=["google"]` or `engines=["anysearch"]` (no key, no browser) are the best recovery before reaching for the browser-gated ones.
- Using `category="news"` for breaking news without also setting `freshness="day"`; the index lag is days, not minutes.
Args:
- query: Natural-language query (the same string a human would type).
- engines: Subset of `engines()`. None = duckduckgo+mojeek+googlenews+bing. (startpage is opt-in and browser-rendered.)
- max_results: Merged result count after dedup. 5-20 is the useful range.
- use_cache: Reuse the last result for this exact (query, engines, max_results, AND all active filters: freshness, include/exclude domains, category, include/exclude text) within the cache TTL. Changing any filter is a different cache entry. False forces a re-fetch.
- max_age_hours: Treat cached results older than this as a read miss; a fresh result is ALWAYS written back to the cache regardless of this value, so caching is never disabled. Use 0 to force-refresh while keeping cache writes; None = use server default TTL (7 days).
- freshness: "day"|"week"|"month"|"year" (restrict to recent results). Best-effort: applied as an engine time-window param AND a client-side date check, but most HTML-engine results carry no parseable date, so undated results are kept rather than dropped (unknown != old). Treat it as a strong hint, not a hard filter; googlenews dates are exact.
- include_domains: List of domains to restrict to (e.g. ["python.org"]).
- exclude_domains: List of domains to exclude.
- category: "news"|"pdf"|"github"|"paper"|"forum"|"blog" (content-type shortcut). "paper" => arxiv/acm/springer/ieee/etc; "forum" => reddit/HN/stackexchange; "github" => code forges (github/gitlab/codeberg/bitbucket/sourceforge/...). "news" keeps only ~33 major outlets (client-side whitelist), so most DDG/Mojeek hits are dropped; pair it with the default engines (googlenews is auto-added) and note googlenews URLs resolve to the publisher on fetch/research.
- include_text: Substring required in title or snippet (case-insensitive).
- exclude_text: Substring forbidden in title or snippet.
- format: "markdown" (default) or "json".
fetch	Fetch one URL and return reader-mode Markdown of the main content.
Best for:
- You already have a URL (from `search`, the user, or your own knowledge) and need the actual page text.
- Verifying a single claim by reading the source.
- Pages that need reader-mode cleanup (nav/footer/scripts stripped).
Not recommended for:
- Multiple URLs at once -> use `fetch_batch` (concurrent, one round-trip).
- "Search then read top N" -> use `research` (one call, not two).
- PDF/DOCX URLs -> use `read_doc` (proper binary parsing).
- You don't have a URL yet -> use `search` first.
Returns:
- markdown (default): a small header (URL, render method, token count) plus the cleaned page body.
- json: {url, title, content, method, truncated, tokens_estimated, author, published_date, sitename}.
Common mistakes:
- Passing a search query instead of a URL.
- Using `render="http"` on a JS-only SPA: it returns near-empty content; use "auto" (default) or "browser".
- Forgetting that results are cached 7 days; use `force_refresh=True` or `max_age_hours=0` for a fresh pull.
Args:
- url: Absolute http(s) URL.
- render: "auto" (try HTTP, fall back to stealth Chromium), "http" (fast, fails on JS), "browser" (slow, robust).
- force_refresh: Bypass the page cache entirely.
- max_age_hours: Treat cached pages older than this as a miss. 0 = same as force_refresh. None = server default TTL (7 days).
- format: "markdown" or "json".
fetch_batch	Fetch a list of URLs in parallel. Per-URL failures do not raise.
Best for:
- 2+ URLs you want to read in one round-trip.
- Reading the top N results of a previous `search` call.
Not recommended for:
- A single URL -> `fetch` (no list-wrapping overhead).
- "Search and then read" -> `research` collapses both into one tool call.
- PDFs/DOCX -> `read_doc` per file.
Returns:
- markdown (default): each page rendered as a Markdown section, separated by horizontal rules; failed URLs become inline error notes.
- json: list[dict], one entry per URL, with `error` set on failures.
Common mistakes:
- Passing a single URL inside a 1-element list; use `fetch` directly.
- Assuming an exception means the whole batch failed; check each item's `error` field instead.
Args:
- urls: List of absolute http(s) URLs.
- render: Same as `fetch`.
- format: "markdown" or "json".
read_doc	Read an http(s) document (or a sandboxed local file) into Markdown.
Best for:
- Remote PDFs and DOCX from an http(s) URL (parsed locally, no remote API).
- Local PDF/DOCX/text/Markdown files, ONLY when local reads are enabled (see Security below).
- Paginating through a long document via `start` / `length`.
Not recommended for:
- Arbitrary HTML web pages -> `fetch` does reader-mode cleanup that this tool does not.
- Pages discovered through search -> `fetch` or `research`.
Security (local files are sandboxed and OFF by default):
- Local-file reads are DISABLED unless the server operator sets the SEARCH_MCP_DOCUMENT_ROOT env var to a directory. With it unset, a local path raises a "local file reads are disabled" error; pass an http(s) URL instead, or ask the operator to enable the sandbox.
- When enabled, `source` must resolve INSIDE that root; relative paths resolve against the root (not the process CWD) and any `..` traversal that escapes the root is rejected. `file://` URLs are always rejected.
- Remote http(s) sources are unaffected by this setting.
Returns:
- markdown (default): rendered document text with a small header.
- json: {content, title, format, total_chars, start, returned_chars, truncated}. Use `total_chars` and `returned_chars` to drive pagination.
Common mistakes:
- Calling this on a normal article URL: you'll get raw HTML noise; use `fetch` instead.
- Forgetting to advance `start` when paginating: next call should pass `start = previous_start + returned_chars`.
- Passing a negative `length` (raises an error) or a `start` past the end (clamped to EOF: you'll get `returned_chars == 0`, `start == total_chars`, and `truncated == False`; that's the signal you've paged off the end).
Args:
- source: http(s) URL, or a local path UNDER SEARCH_MCP_DOCUMENT_ROOT when local reads are enabled (disabled by default; see Security).
- start: Character offset to begin reading from. Default 0. Clamped into [0, total_chars]; a negative value is treated as 0.
- length: Max characters to return; None = read to end (still capped by the per-call max content size). Must be >= 0; a negative length is rejected with a ValueError.
- format: "markdown" or "json".
research	One-shot research: search the web, fetch the top results, return both.
Best for:
- Open-ended questions that need finding sources AND reading them ("what's new with X", "summarize the controversy around Y").
- Replacing a `search` + N x `fetch` chain with one call.
- Producing a citable brief with [n]-style source references.
Not recommended for:
- You only need links -> `search` (cheaper, no fetching).
- You only need to read one URL you already have -> `fetch`.
- You want to query previously-fetched cached pages -> `cache_search`.
Returns:
- markdown (default): a "Research brief" with a Sources index then the full Markdown body of each fetched document, separated by horizontal rules; includes a token estimate.
- json: {question, engines, sources:[{rank,title,url,snippet,...}], documents:[...], tokens_estimated, errors}.
Common mistakes:
- Using `depth=8` for a quick lookup: that's 8 page fetches; 2-3 is almost always enough.
- Calling `research` for a known URL; that's `fetch` territory.
- Forgetting that `fetch=False` returns sources only (much cheaper if the LLM only needs to pick which one to read).
Args:
- question: What you want to know, in natural language.
- depth: How many top results to fetch (1-8). 3 is a good default.
- engines: Override the engine set (see `engines()` for names).
- fetch: If False, return source list without reading them.
- use_cache: Reuse cached search/page data within TTL.
- max_age_hours: Treat cached search results AND cached page bodies older than this as a read miss; fresh data is always written back. 0 = force-refresh both the engine search and every fetched page body; None = server default TTL (7 days). A non-zero value is honored for both halves (it used to be ignored for anything but 0).
- format: "markdown" or "json".
cache_search	Full-text search over pages already fetched into the local SQLite FTS5 index.
Best for:
- Recalling something the user/agent fetched earlier in the conversation ("what did that Wikipedia page say about X").
- Avoiding re-fetching content already in the local cache.
- Quick keyword grep across the corpus you've built up.
Not recommended for:
- Discovering new pages on the open web -> use `search` or `research`.
- When the cache is empty (fresh install) -> `search`/`research` first to populate it.
Returns:
- markdown (default): a per-hit list of title, URL, and a `[bracket]`-highlighted snippet around the matched terms.
- json: list of {url, title, snippet}.
Common mistakes:
- Treating this like web search: it ONLY hits pages already in the local cache. If the user hasn't fetched anything, you'll get zero hits.
- Using natural-language phrases without quoting them; FTS5 splits on whitespace as AND. For an exact phrase use `"like this"`.
Args:
- query: FTS5 query. Bare terms = AND. Supports OR / NOT, prefix (`term*`), and phrase (`"exact phrase"`).
- limit: Max hits to return.
- format: "markdown" or "json".
engines	List engine names accepted by the `engines=` parameter of `search` / `research`.
Best for:
- Discovering what's installable before passing a non-default engine.
- Building user-facing UIs that let humans pick engines.
Not recommended for:
- Calling on every search (the list is static; cache it).
Returns:
- The live, complete list of engine name strings. The buckets below are illustrative; always trust the returned list over this doc.
Common mistakes:
- Passing one of these names as a query to `search`; they go in the `engines=` argument, not `query`.
- Passing a key-only engine (brave_api/serper/tavily/google_cse) with no key configured; it returns an actionable error, not results.
Defaults: duckduckgo + mojeek + googlenews + bing (reliable, all-HTTP, low-latency; googlenews is an RSS index with structured publish dates and its URLs resolve to the real publisher on fetch/research; bing's www4 edge answers in ~0.3s).
Keyless opt-in: google + serpsearch (Google SERP scrapers, HTTP-first), anysearch (JSON aggregator), startpage (browser-rendered, slower), brave (PoW captcha after a few calls), baidu (CN index), bilibili (CN video), zhihu (CN Q&A, often login-gated), searx (public-instance meta-search; set SEARCH_MCP_SEARX_INSTANCES if it returns nothing).
Key-required (configure via admin UI / SEARCH_MCP_*_API_KEY): brave_api, serper, tavily, google_cse.
compare	Fetch 2-5 URLs concurrently and return per-URL excerpts so the LLM can compare them against a single question in one round trip.
Best for:
- Side-by-side product/feature/article comparisons.
- "Compare X to Y" or "How does A differ from B" queries.
- Triangulating a fact across multiple sources.
Not recommended for:
- >5 URLs -> use `fetch_batch`.
- 1 URL -> use `fetch`.
- Don't have URLs yet -> use `search` or `research` first.
Returns:
- markdown (default): a comparison brief with per-URL sections, each containing title, sitename, published date, and a smart-truncated excerpt.
- json: {question, urls, excerpts:[{url, title, excerpt, ...}], tokens_estimated}.
Common mistakes:
- Asking `compare` to actually answer the question: it returns material, the LLM does the comparison.
- Passing >5 URLs and expecting them all to fit in context; use `fetch_batch` for bulk reads.
Args:
- question: The comparison question the LLM will answer using the returned excerpts.
- urls: 2-5 absolute http(s) URLs.
- format: "markdown" (default) or "json".
extract_structured	Pull JSON-LD, OpenGraph, Twitter cards, and microdata from a web page.
Best for:
- Product pages (price, currency, availability, brand, rating).
- Article pages (author, publish date, image, headline).
- Recipe / event / video pages where rich metadata IS the answer.
- Cases where `fetch` returns prose but you need fields.
Not recommended for:
- Just reading a page -> use `fetch`.
- PDFs / DOCX -> use `read_doc`.
- Pages that don't publish schema.org metadata (most blogs): you'll get empty lists; fall back to `fetch`.
Returns:
- json: {url, json_ld:[], microdata:[], opengraph:[], rdfa:[]}. Twitter card meta tags are surfaced inside the `opengraph` list.
- markdown (default): a flattened key/value view with each block printed as a JSON code block under its syntax heading.
Common mistakes:
- Calling on every URL "just in case"; most sites have no structured data, and `fetch` is what you actually want.
Args:
- url: Absolute http(s) URL.
- format: "markdown" (default) or "json".
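
The `read_doc` pagination rule described in the Tools table (advance `start` by `returned_chars`, and stop once `returned_chars` is 0 or `truncated` is false) can be sketched in Python as follows. This is a minimal illustration, not the server's code: `iter_pages` and `fake_read_doc` are hypothetical names, the fake is an in-memory stand-in for a real MCP tool call, and it assumes `truncated` is true exactly when more text remains after the returned slice.

```python
from typing import Any, Callable, Dict, Iterator, Optional

ReadDoc = Callable[..., Dict[str, Any]]


def fake_read_doc(text: str, max_chars: int = 50000) -> ReadDoc:
    """Build an in-memory stand-in that mimics the documented json fields."""

    def read_doc(source: str, start: int = 0, length: Optional[int] = None,
                 format: str = "json") -> Dict[str, Any]:
        if length is not None and length < 0:
            raise ValueError("length must be >= 0")
        total = len(text)
        start = max(0, min(start, total))  # clamp into [0, total_chars]
        limit = max_chars if length is None else min(length, max_chars)
        chunk = text[start:start + limit]
        return {
            "content": chunk,
            "total_chars": total,
            "start": start,
            "returned_chars": len(chunk),
            # Assumption: truncated means text remains beyond this slice.
            "truncated": start + len(chunk) < total,
        }

    return read_doc


def iter_pages(read_doc: ReadDoc, source: str, page_size: int) -> Iterator[str]:
    """Yield successive chunks of a document until it is exhausted."""
    start = 0
    while True:
        page = read_doc(source, start=start, length=page_size, format="json")
        if page["returned_chars"] == 0:
            return  # paged off the end (or page_size was 0)
        yield page["content"]
        if not page["truncated"]:
            return  # this was the final chunk
        # Next offset = previous start + characters actually returned.
        start = page["start"] + page["returned_chars"]
```

Using the `start` echoed back in the response, rather than the value that was sent, keeps the loop correct even when the server clamps an out-of-range offset.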

Prompts

Interactive templates invoked by user choice

Name	Description
`research_prompt`	Instruct the model to do a thorough, cited research pass on a question.
`factcheck_prompt`	Instruct the model to fact-check a specific claim with citations.
`compare_sources`	Instruct the model to use `compare` against several URLs and answer the question with per-URL citations.
`news_brief`	Instruct the model to produce a fresh news brief using `search` + `fetch_batch`, with citations.

Resources

Contextual data attached and managed by the client

Name	Description
No resources

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/sweetcornna/free-search-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server