
Agent Search

by cedarsaam

๐Ÿ”Ž Agent Search

A self-hosted, MCP-native web-search backend for AI agents: meta-search, clean extraction, RAG with citations, GitHub project selection, and a Tavily-compatible API. All free, all local.

License: MIT · Python · MCP · Self-hosted · No API key required



Why?

Built-in WebSearch / WebFetch tools give you links and snippets. Your agent still has to search → fetch → read → reconcile by hand, and the results are easily polluted by SEO blogs and inflated star counts.

Agent Search turns "search primitives" into "search outcomes": it aggregates many engines, ranks results with official-source priority, extracts clean text, and answers with chunk-level citations, all exposed as one MCP server that any agent (Claude Code, Codex, Cursor, …) can call as its default search tool. It also does things the built-ins can't: typed GitHub search, first-party project comparison for tech selection, site mapping, and a Tavily-compatible endpoint.
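The citation step can be pictured as numbering each fetched source and asking the model to cite by number. The Python sketch below is a minimal illustration under stated assumptions: a fixed-size character chunker, a naive word-overlap score for picking each source's best chunk, and a hypothetical prompt wording. Agent Search's actual chunking and prompt may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Source:
    url: str
    text: str  # extracted page body (or the search snippet if extraction failed)


def chunk(text: str, size: int = 400, overlap: int = 50) -> List[str]:
    """Split text into overlapping fixed-size character windows (assumed chunker)."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
        start += size - overlap
    return chunks


def best_chunk(question: str, text: str) -> str:
    """Pick the chunk sharing the most words with the question (naive lexical overlap)."""
    terms = {w.lower().strip("?.,") for w in question.split()}
    candidates = chunk(text)
    if not candidates:
        return ""
    return max(candidates, key=lambda c: sum(w.lower().strip("?.,") in terms for w in c.split()))


def build_cited_prompt(question: str, sources: List[Source]) -> str:
    """Number each source [1], [2], ... and ask the model to cite claims by number."""
    blocks = [
        f"[{n}] {src.url}\n{best_chunk(question, src.text)}"
        for n, src in enumerate(sources, start=1)
    ]
    return (
        "Answer using only the numbered sources below and cite each claim like [1].\n\n"
        + "\n\n".join(blocks)
        + f"\n\nQuestion: {question}"
    )
```

Because each source carries its own number and excerpt, the same list can be shown to the user as per-source evidence next to the answer.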


✨ Features

  • Meta-search over 9 engines via SearXNG (Google/Bing/DDG/Brave/Wikipedia/GitHub/StackOverflow/Reddit/News) with URL dedup.

  • Smart local reranking: boosts official docs / API / pricing / changelog pages, down-weights SEO content farms, and expands a query into several variants when it has documentation or pricing intent.

  • Robust extraction: a trafilatura → Jina Reader → requests fallback chain, ratio-based noise cleaning (keeps tables/code/prices/dates), and optional Crawl4AI for JS-heavy pages.

  • RAG with citations: search → parallel multi-source fetch → LLM summary with [1][2] references and per-source excerpts (chunk-level evidence); when a page body can't be extracted cleanly, the search snippet is used instead.

  • GitHub, done right: typed repos/code/issues/PRs search via the gh CLI, returning license / last commit / archived status / forks for real evaluation, not just stars.

  • 🆕 Tech-selection compare: github_compare pulls first-party facts (gh api) plus OpenSSF Scorecard health (via the free deps.dev API) and flags archived / stale / no-release / copyleft projects. Evidence, not verdicts.

  • Site mapping: sitemap.xml first, page-link fallback, same-domain dedup.

  • Tavily-compatible API: a drop-in /tavily/search endpoint with stable include_raw_content.

  • Caching: SQLite TTL cache; cached results remain available offline.
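A TTL cache on SQLite needs little more than a table keyed by query with an expiry timestamp. The sketch below shows one way to do it with only the standard library; the schema, the one-hour default TTL, and the `allow_stale` switch (one plausible reading of "available offline") are assumptions, not the project's code.

```python
import json
import sqlite3
import time
from typing import Any, Optional


class TTLCache:
    """Minimal SQLite-backed cache: each entry expires ttl_seconds after it is written."""

    def __init__(self, path: str = ":memory:", ttl_seconds: float = 3600.0) -> None:
        self.ttl = ttl_seconds
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT, expires REAL)"
        )

    def set(self, key: str, value: Any, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self.db.execute(
            "INSERT OR REPLACE INTO cache (key, value, expires) VALUES (?, ?, ?)",
            (key, json.dumps(value), now + self.ttl),
        )
        self.db.commit()

    def get(self, key: str, now: Optional[float] = None, allow_stale: bool = False) -> Any:
        """Return the cached value, or None if it is missing or expired.

        allow_stale=True also returns expired entries, e.g. when the network is down.
        """
        now = time.time() if now is None else now
        row = self.db.execute("SELECT value, expires FROM cache WHERE key = ?", (key,)).fetchone()
        if row is None or (row[1] < now and not allow_stale):
            return None
        return json.loads(row[0])
```

Passing `now` explicitly makes expiry easy to test without sleeping.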

🎬 Demo

Tech-selection comparison (first-party facts + OpenSSF Scorecard health, never just stars):

repo                 stars   license       last commit   scorecard   flags
fastapi/fastapi      99669   MIT           2026-06-25     7.8        -
django/django        87997   BSD-3-Clause  2026-06-25     6.8        [no release]
encode/starlette     12432   BSD-3-Clause  2026-06-19     7.5        -
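The flags column is derived from first-party facts rather than star counts. A minimal sketch of that derivation follows. It assumes a hypothetical 365-day staleness threshold and treats SPDX IDs starting with GPL-, AGPL- or LGPL- as copyleft; the project's real rules may differ. The deps.dev helper only builds the v3 project URL (where Scorecard data for a GitHub repo can be looked up) and makes no network call.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional
from urllib.parse import quote

# Assumptions for illustration only; the project's actual rules may differ.
STALE_AFTER_DAYS = 365
COPYLEFT_PREFIXES = ("GPL-", "AGPL-", "LGPL-")


@dataclass
class RepoFacts:
    repo: str                    # "owner/name"
    license_spdx: Optional[str]  # e.g. "MIT", "BSD-3-Clause"
    last_commit: date
    archived: bool
    has_release: bool


def risk_flags(facts: RepoFacts, today: date) -> List[str]:
    """Turn first-party facts into evidence flags rather than a single verdict."""
    flags = []
    if facts.archived:
        flags.append("archived")
    if (today - facts.last_commit).days > STALE_AFTER_DAYS:
        flags.append("stale")
    if not facts.has_release:
        flags.append("no release")
    if facts.license_spdx and facts.license_spdx.startswith(COPYLEFT_PREFIXES):
        flags.append("copyleft")
    return flags


def deps_dev_project_url(repo: str) -> str:
    """deps.dev v3 project endpoint; the project key is URL-encoded, slashes included."""
    return "https://api.deps.dev/v3/projects/" + quote(f"github.com/{repo}", safe="")
```

With the django/django facts from the table (BSD-3-Clause, recent commit, not archived, no release) and `today` set a day after the last commit, `risk_flags` returns `["no release"]`, matching the demo row.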

Search that prefers official docs (content farms down-ranked automatically):

$ agent-search "python asyncio tutorial"
[1] A Conceptual Overview of asyncio - Python 3 docs   https://docs.python.org/3/howto/...
[3] asyncio - Asynchronous I/O - Python 3 docs         https://docs.python.org/3/library/asyncio.html
...
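The ordering above (official docs first) comes from local reranking. One simple way to implement such a rerank is an additive score over URL signals on top of the engines' original order. The host prefixes, path hints, content-farm list, and weights below are hypothetical placeholders, not Agent Search's actual rules.

```python
from typing import Dict, List
from urllib.parse import urlparse

# Hypothetical signals: illustrative lists, not the project's actual rules or weights.
OFFICIAL_HOST_PREFIXES = ("docs.", "developer.", "api.")
OFFICIAL_PATH_HINTS = ("/docs", "/api", "/pricing", "/changelog", "/reference")
CONTENT_FARM_HOSTS = {"seo-farm.example"}


def score(result: Dict, position: int) -> float:
    parts = urlparse(result["url"])
    host = (parts.hostname or "").removeprefix("www.")
    s = 1.0 / (1 + position)  # baseline: keep the engines' merged order
    if host.startswith(OFFICIAL_HOST_PREFIXES):
        s += 1.0
    if any(hint in parts.path for hint in OFFICIAL_PATH_HINTS):
        s += 0.5
    if host in CONTENT_FARM_HOSTS:
        s -= 1.0
    return s


def rerank(results: List[Dict]) -> List[Dict]:
    scored = [(score(r, i), i, r) for i, r in enumerate(results)]
    scored.sort(key=lambda t: (-t[0], t[1]))  # ties keep the original order
    return [r for _, _, r in scored]
```

Keeping the original position in the score means a rerank only reorders results when a signal actually fires.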

🆚 How it compares

No single OSS project covers this niche: most are end-user apps, single-capability tools, or higher-level orchestrators.

Project          Multi-engine    Extract (JS)   RAG + cites   GitHub typed   Site map   Native MCP   Tavily-compat
Firecrawl        ⚠️ single-src    ✅✅            ✅             ⚠️              ✅          ✅            ❌
Crawl4AI         ❌               ✅✅            ⚠️             ❌              ✅          ✅            ❌
Perplexica       ✅               ⚠️              ✅             ❌              ❌          ❌            ❌
GPT Researcher   ⚠️               ✅              ✅ report      ❌              ❌          ❌            ❌
SearXNG          ✅✅             ❌              ❌             ❌              ❌          ❌            ❌
mcp-searxng      ✅               ⚠️              ❌             ❌              ❌          ✅            ❌
Agent Search     ✅ 9             ⚠️/✅ opt       ✅ chunk       ✅✅            ✅          ✅ 6 tools    ✅ (only one)

๐Ÿ—๏ธ Architecture

flowchart TD
    A["Agent / MCP client"] -->|"web_search ยท web_ask ยท web_extract<br/>web_map ยท github_search ยท github_compare"| B["Agent Search<br/>FastAPI ยท MCP ยท CLI"]
    B --> C["SearXNG ยท 9 engines<br/>meta-search + rerank"]
    B --> D["trafilatura / Jina / requests<br/>(+ Crawl4AI) ยท clean extraction"]
    B --> E["LLM (OpenAI-compatible)<br/>RAG with citations"]
    B --> F["gh CLI<br/>typed GitHub search"]
    B --> G["deps.dev + OpenSSF Scorecard<br/>project selection"]
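The extraction node is a fallback chain: try the best extractor first, move down the list on failure, and end with the search snippet. A minimal sketch of that control flow, assuming each extractor is a plain callable (for example, a hypothetical wrapper around trafilatura's `fetch_url` and `extract`) and an arbitrary 200-character minimum for "good enough" text:

```python
from typing import Callable, List, Optional, Tuple

# An extractor takes a URL and returns clean text, or None / raises on failure.
Extractor = Callable[[str], Optional[str]]


def extract_with_fallback(
    url: str,
    extractors: List[Tuple[str, Extractor]],
    snippet: str,
    min_chars: int = 200,
) -> Tuple[str, str]:
    """Return (source_label, text) from the first extractor yielding enough text.

    If every extractor fails or returns too little, fall back to the search snippet.
    """
    for label, extract in extractors:
        try:
            text = extract(url)
        except Exception:
            continue  # a crashing extractor simply hands off to the next one
        if text and len(text.strip()) >= min_chars:
            return label, text.strip()
    return "snippet", snippet
```

In a design like this, a JS-rendering extractor such as Crawl4AI can be prepended to the list for deep requests without touching the loop.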

🚀 Quickstart

1. Start SearXNG (and optional FlareSolverr):

cp .env.example .env          # then edit: SEARXNG_SECRET_KEY, (optional) LLM key
docker compose up -d searxng  # add `flaresolverr` only if you need anti-bot handling

2. Install the Python side:

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt              # core
pip install -r requirements-optional.txt     # optional: better extraction (trafilatura)

Or install the CLIs globally with pipx / uv (from a clone):

pipx install .        # → `agent-search`, `agent-search-mcp`, `agent-search-server`

3. Use it in any of three ways:

# CLI
python search.py "python asyncio tutorial"
python search.py "Anthropic Claude API pricing" --answer

# HTTP API (binds 127.0.0.1 by default)
python server.py          # → http://127.0.0.1:8077/docs

# MCP (Claude Code / Cursor / Codex …)
cp .mcp.json.example .mcp.json   # set the absolute path to this repo
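For the HTTP route, a client can POST to the Tavily-compatible endpoint. This standard-library sketch assumes `/tavily/search` accepts a Tavily-style JSON body (`query`, `max_results`, `include_raw_content`) and returns Tavily-shaped JSON; check the live schema at http://127.0.0.1:8077/docs before relying on it.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://127.0.0.1:8077"  # default bind address and port from the Quickstart


def tavily_search_request(
    query: str, max_results: int = 5, include_raw_content: bool = True
) -> urllib.request.Request:
    """Build a Tavily-style JSON POST for the local /tavily/search endpoint (body fields assumed)."""
    body = {"query": query, "max_results": max_results, "include_raw_content": include_raw_content}
    return urllib.request.Request(
        f"{BASE_URL}/tavily/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tavily_search(query: str, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the request to a running `python server.py` and decode the JSON reply."""
    with urllib.request.urlopen(tavily_search_request(query), timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `tavily_search("python asyncio tutorial").get("results", [])` would return the result list, assuming the response mirrors Tavily's `results` array.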

🧰 MCP tools

Tool             What it does
web_search       Meta-search, ranked results
web_ask          RAG answer with [n] citations + per-source excerpts
web_extract      Fetch a page → clean Markdown
web_map          Discover a site's links (sitemap-first)
github_search    Typed repos/code/issues/PRs search
github_compare   First-party tech-selection comparison (facts + OpenSSF Scorecard)
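MCP clients such as Claude Code or Cursor call these tools for you, but when debugging it can help to see the wire format. Under MCP, a tool invocation is a JSON-RPC 2.0 `tools/call` request carrying the tool `name` and an `arguments` object. The sketch below builds one. The argument names (`query`, `question`) are hypothetical, so read each tool's real input schema from `tools/list`; a real session must also complete the `initialize` handshake first, which is omitted here.

```python
import itertools
import json
from typing import Any, Dict

# Tool names from the MCP tools list; argument names used with them are hypothetical.
KNOWN_TOOLS = {"web_search", "web_ask", "web_extract", "web_map", "github_search", "github_compare"}
_request_ids = itertools.count(1)


def tools_call_message(name: str, arguments: Dict[str, Any]) -> str:
    """Serialize one MCP `tools/call` JSON-RPC 2.0 request as a single line."""
    if name not in KNOWN_TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })
```

Over the stdio transport each such message is written as one line of JSON, which is why the sketch emits a single-line string.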

๐Ÿ’ก Coverage depends on your SearXNG instance & region. The bundled config ships some China-friendly engines (e.g. Doubao), so an instance hosted in or tuned for mainland China tends to rank Chinese sources higher and some international/English sources lower (and vice-versa elsewhere). For the widest reach, have your agent run its native WebSearch/WebFetch in parallel and merge โ€” Agent Search for aggregation/RAG/GitHub, native search for extra reach. You can also add/remove engines in searxng/settings.yml.

โš ๏ธ Notes & limitations

  • web_ask (RAG) needs an OpenAI-compatible LLM key; everything else (search/extract/map/github) needs no API key.

  • Extraction does not render JS by default โ€” install the optional crawl4ai and use deep=True for JS-heavy pages.

  • Built for local / trusted use: the HTTP server binds 127.0.0.1 by default and extraction has an SSRF guard (blocks localhost / private / cloud-metadata IPs). Add auth + a reverse proxy before exposing it.

  • This is a personal project, maintained best-effort. Issues/PRs welcome but no SLA.
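An SSRF guard like the one described in the notes typically resolves the target host and refuses the fetch if any resolved address is non-public. The sketch below uses Python's `ipaddress` and `socket` modules; it illustrates the idea and is not the project's implementation. A production guard should also connect to the vetted IP (and re-check after redirects) to avoid DNS-rebinding gaps.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Common cloud metadata address (already link-local, listed explicitly for clarity).
METADATA_IPS = {ipaddress.ip_address("169.254.169.254")}


def is_blocked_ip(ip: str) -> bool:
    """True for loopback, private, link-local, reserved, unspecified, multicast or metadata IPs."""
    addr = ipaddress.ip_address(ip)
    return (
        addr.is_loopback or addr.is_private or addr.is_link_local
        or addr.is_reserved or addr.is_unspecified or addr.is_multicast
        or addr in METADATA_IPS
    )


def check_url(url: str) -> None:
    """Raise ValueError unless the URL is http(s) and every resolved address is public."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported URL: {url}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    for *_, sockaddr in socket.getaddrinfo(parts.hostname, port):
        if is_blocked_ip(sockaddr[0]):
            raise ValueError(f"blocked address {sockaddr[0]} for {url}")
```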

๐Ÿ™ Acknowledgements

Stands on the shoulders of: SearXNG ยท trafilatura ยท Jina Reader ยท Crawl4AI ยท FlareSolverr ยท OpenSSF Scorecard + deps.dev ยท GitHub CLI ยท FastAPI ยท the Model Context Protocol. RAG summaries via any OpenAI-compatible endpoint (e.g. DeepSeek).

๐Ÿ“„ License

MIT โ€” do whatever, no warranty. Agent Search orchestrates SearXNG as a separate service (it does not bundle or modify SearXNG's source), so its AGPL does not extend to this project.
