clean-search-mcp

by lzmd66

clean-search-mcp 🧹

A lightweight MCP (Model Context Protocol) service that provides clean, spam-free search results for AI agents. Filters out content farms, SEO garbage, and low-quality sites before they reach your LLM.

Features

  • Three search engines: Yandex + Bing + DuckDuckGo auto-fallback

  • 176K+ domain blocklist: auto-updated from 25+ community sources, covers malware/scam/ads/content farms

  • Three-layer filtering: domain blacklist → content rules → quality scoring

  • Content extraction: full page text via trafilatura + selectolax

  • Result scoring: 0-1 quality score (official docs 0.8 > tutorials 0.6 > garbage 0)

  • LRU cache: search cache 6h, fetch cache 24h, auto-cleanup

  • User blacklist: add domains on the fly, report bad results

  • Deep mode: optional Playwright fallback for JS-heavy pages

  • No heavy dependencies: pure HTTP, no browser required
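
The three-layer filtering listed above can be pictured roughly as follows. This is a minimal sketch, not the project's actual code: the blocklist entries, the spam phrases, the 0.3 default score, and every function name are illustrative assumptions.

```python
from urllib.parse import urlparse

# Hypothetical data; the real service loads its blocklist from community sources.
BLOCKED_DOMAINS = {"spam-farm.example", "seo-junk.example"}
SPAM_PHRASES = ("click here to win", "top 10 best")


def domain_of(url: str) -> str:
    """Return the lowercase host of a URL without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_blocked(url: str) -> bool:
    """Layer 1: reject the host or any parent domain found in the blocklist."""
    parts = domain_of(url).split(".")
    return any(".".join(parts[i:]) in BLOCKED_DOMAINS for i in range(len(parts)))


def breaks_content_rules(result: dict) -> bool:
    """Layer 2: reject results whose title or snippet contains a spam phrase."""
    text = f"{result.get('title', '')} {result.get('snippet', '')}".lower()
    return any(phrase in text for phrase in SPAM_PHRASES)


def quality_score(result: dict) -> float:
    """Layer 3: toy scoring that mirrors the 0.8 / 0.6 tiers in the feature list."""
    host = domain_of(result["url"])
    if host.startswith("docs.") or "/docs/" in result["url"]:
        return 0.8
    if "tutorial" in result.get("title", "").lower():
        return 0.6
    return 0.3  # assumed default for everything else


def clean(results: list[dict]) -> list[dict]:
    """Apply the three layers in order and sort survivors by score, best first."""
    kept = []
    for r in results:
        if is_blocked(r["url"]) or breaks_content_rules(r):
            continue
        kept.append({**r, "score": quality_score(r)})
    return sorted(kept, key=lambda r: r["score"], reverse=True)
```

Running the cheap domain check first means the content rules and the scoring only ever see results from hosts that are not already blocked.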


Quick Start

pip install -r requirements.txt
python main.py

MCP Client Config

{
  "mcpServers": {
    "clean-search": {
      "command": "python",
      "args": ["/path/to/clean_search_mcp/main.py"]
    }
  }
}
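
If you manage your client configuration from a script, the entry above can be merged into an existing configuration. A small sketch, assuming the client reads a JSON object with a top-level "mcpServers" key as shown; the helper name add_clean_search and its arguments are hypothetical and not part of this project.

```python
import json


def add_clean_search(config: dict, script_path: str, python_cmd: str = "python") -> dict:
    """Return a copy of an MCP client config with a "clean-search" server entry."""
    merged = dict(config)
    # Copy the server map so the caller's original config is left untouched.
    servers = dict(merged.get("mcpServers", {}))
    servers["clean-search"] = {"command": python_cmd, "args": [script_path]}
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
    updated = add_clean_search(existing, "/path/to/clean_search_mcp/main.py")
    print(json.dumps(updated, indent=2))
```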

Test Locally

python test_local.py "your search query" -n 5
python test_local.py "your query" -n 5 --no-content   # skip page content
python test_local.py "your query" --deep               # use Playwright fallback

API

clean_search(query, max_results=5, with_content=True, deep_mode=False)

| Param | Default | Description |
| --- | --- | --- |
| query | required | Search query |
| max_results | 5 | Results to return (max 10) |
| with_content | True | Include extracted page text |
| deep_mode | False | Use Playwright fallback for JS pages |

Returns [{title, url, snippet, content, score}] sorted by quality.
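
To show how a caller might consume this return value, here is a hedged sketch that keeps only results at or above a score threshold and formats them as context for an LLM prompt. The field names come from the return shape above; the 0.5 threshold, the 500-character cap, and the function name are assumptions made for illustration.

```python
from typing import TypedDict


class SearchResult(TypedDict):
    title: str
    url: str
    snippet: str
    content: str
    score: float


def to_prompt_context(results: list[SearchResult], min_score: float = 0.5,
                      max_chars: int = 500) -> str:
    """Format results at or above min_score as numbered source blocks."""
    kept = [r for r in results if r["score"] >= min_score]
    blocks = []
    for i, r in enumerate(kept, start=1):
        # Fall back to the snippet when page text is missing or empty.
        body = (r.get("content") or r["snippet"])[:max_chars]
        blocks.append(f"[{i}] {r['title']}\n{r['url']}\n{body}")
    return "\n\n".join(blocks)
```

The function preserves the order it receives, so results that arrive sorted by quality stay sorted in the prompt.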

add_user_blacklist(domain)

Add a domain to your personal blocklist.

report_bad_result(url)

Report a low-quality URL; its domain is blocked automatically.
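
The relationship between these two tools can be sketched as below: reporting a URL reduces it to its domain and adds that domain to the same personal blocklist. This is an assumption about the behavior the descriptions imply, not the project's code; the class name, the in-memory set, and the normalization rules are hypothetical.

```python
from urllib.parse import urlparse


class UserBlacklist:
    """In-memory personal blocklist (the real service may persist it differently)."""

    def __init__(self) -> None:
        self._domains: set[str] = set()

    @staticmethod
    def _normalize(domain: str) -> str:
        domain = domain.strip().lower().rstrip(".")
        return domain[4:] if domain.startswith("www.") else domain

    def add_domain(self, domain: str) -> str:
        """Counterpart of add_user_blacklist(domain)."""
        normalized = self._normalize(domain)
        if not normalized:
            raise ValueError("empty domain")
        self._domains.add(normalized)
        return normalized

    def report_url(self, url: str) -> str:
        """Counterpart of report_bad_result(url): block the URL's domain."""
        host = urlparse(url).hostname
        if host is None:
            raise ValueError(f"no host in URL: {url!r}")
        return self.add_domain(host)

    def is_blocked(self, url: str) -> bool:
        """True if the URL's host is a blocked domain or a subdomain of one."""
        host = self._normalize(urlparse(url).hostname or "")
        return host in self._domains or any(host.endswith("." + d) for d in self._domains)
```

Matching on a leading dot keeps `notexample.com` from being caught by an entry for `example.com`, while `sub.example.com` is still blocked.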

Configuration

Edit config.py to tune:

  • Search providers: enable/disable Yandex, Bing, DuckDuckGo

  • Blacklist sources: add or remove community blocklist URLs

  • Scoring weights: adjust domain authority, content quality bonuses

  • Caching: TTL, max files, cleanup interval

  • Proxy: set PROXY for HTTP/Playwright
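
To make the caching options concrete, here is a minimal sketch of an LRU cache with a per-entry TTL, using the 6 h search and 24 h fetch lifetimes from the feature list. It is kept in memory for brevity, whereas the "max files" setting suggests the real cache is file-based; the class name, its parameters, and the injectable clock are all assumptions.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class TTLCache:
    """LRU cache whose entries also expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, max_entries: int = 256,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.max_entries = max_entries
        self._clock = clock
        self._items: OrderedDict[str, tuple[float, Any]] = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self.ttl:
            del self._items[key]          # expired: drop and report a miss
            return None
        self._items.move_to_end(key)      # mark as most recently used
        return value

    def put(self, key: str, value: Any) -> None:
        self._items[key] = (self._clock(), value)
        self._items.move_to_end(key)
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)   # evict least recently used

    def cleanup(self) -> int:
        """Remove all expired entries and return how many were dropped."""
        now = self._clock()
        expired = [k for k, (t, _) in self._items.items() if now - t > self.ttl]
        for k in expired:
            del self._items[k]
        return len(expired)


search_cache = TTLCache(ttl_seconds=6 * 3600)    # search results: 6 hours
fetch_cache = TTLCache(ttl_seconds=24 * 3600)    # fetched pages: 24 hours
```

Passing the clock in as a parameter makes expiry easy to exercise in tests without sleeping.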

Dependencies

mcp, httpx, selectolax, trafilatura, duckduckgo-search

All are lightweight pip packages. Playwright is optional and only needed for deep mode.

License

MIT
