Skip to main content
Glama

MCP Web Search Server

by undici77
README.md7.95 kB
# 📂 MCP Web Search Server **A privacy‑focused web, social media, and archive search server exposing tools via the Model Control Protocol (MCP) for controlled access to external search capabilities.** --- ## Table of Contents - [Features](#features) - [Installation & Quick Start](#installation--quick-start) - [Command‑Line Options](#command-line-options) - [Integration with LM Studio](#integration-with-lm-studio) - [MCP API Overview](#mcp-api-overview) - [`initialize`](#initialize) - [`tools/list`](#toolslist) - [`tools/call`](#toolscall) - [Available Tools](#available-tools) - [`web_search`](#web_search) - [`social_search`](#social_search) - [`archives_search`](#archives_search) - [`list_engines`](#list_engines) - [`list_archives_services`](#list_archives_services) - [`clear_cache`](#clear_cache) - [Security Features](#security-features) --- ## 🎯 Features - **Parallel search** across multiple privacy‑focused web engines. - **Social media lookup** for public content on major platforms. - **Archive retrieval** from Wayback Machine, archive.today, Google Cache and others. - **Dynamic listing** of supported engines and archive services. - **Result caching** with LRU eviction to speed up repeated queries. --- ## 📦 Installation & Quick Start ```bash # Clone the repository (if applicable) git clone https://github.com/undici77/MCPWebSearch.git cd MCPWebSearch # Run the startup script (adjust name if different) ./run.sh -d /path/to/working/directory ``` 1️⃣ **Create & activate** a Python virtual environment (`.venv`). 2️⃣ **Install** all required dependencies from `requirements.txt`. 3️⃣ **Launch** the MCP Search Server (`main.py`) which listens on stdin/stdout for JSON‑RPC messages. > 📌 Ensure the startup script is executable: `chmod +x run.sh` --- ## ⚙️ Command‑Line Options | Option | Description | |--------|-------------| | `-d`, `--directory` | Path to the working directory (default: current process dir). | *The server itself does not require additional CLI flags; all configuration is performed via JSON‑RPC.* --- ## 🤝 Integration with LM Studio Add an entry to your `mcp.json` so LM Studio can start the server automatically: ```json { "mcpServers": { "web-search": { "command": "/absolute/path/to/run.sh", "args": [ "-d", "/absolute/path/to/working/directory" ], "env": { "WORKING_DIR": "." } } } } ``` > 📌 Make the script executable (`chmod +x /absolute/path/to/run.sh`) and run `./run.sh` once to install the virtual environment before launching LM Studio. --- ## 📡 MCP API Overview All communication follows **JSON‑RPC 2.0** over stdin/stdout. ### `initialize` ```json { "jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {} } ``` *Response*: protocol version (`2024-11-05`), server capabilities (tool enumeration) and basic server info (`name`, `version`). ### `tools/list` ```json { "jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": {} } ``` *Response*: an array of tool definitions (name, description, input schema). ### `tools/call` ```json { "jsonrpc": "2.0", "id": 3, "method": "tools/call", "params": { "name": "<tool_name>", "arguments": { … } } } ``` *Note*: The tool identifier key is **`name`**, not `tool`. --- ## 🛠️ Available Tools ### web_search **Search the web using multiple privacy‑focused engines in parallel.** | Name | Type | Required | Description | |------|------|----------|-------------| | `query` | string | ✅ | Search query (max 500 characters). | | `engine` | string | ❌ (default `all`) | Engine to use (`duckduckgo`, `brave`, `startpage`, `ecosia`, `mojeek`, `yandex` or `all`). | | `max_results` | integer | ❌ (default 20) | Max results per engine (1‑50). | **Example** ```json { "jsonrpc": "2.0", "id": 10, "method": "tools/call", "params": { "name": "web_search", "arguments": { "query": "privacy focused search engines", "engine": "duckduckgo", "max_results": 15 } } } ``` *The server returns a formatted text block containing titles, URLs and snippets from each selected engine.* --- ### social_search **Search public content on major social‑media platforms.** | Name | Type | Required | Description | |------|------|----------|-------------| | `query` | string | ✅ | Search query (max 500 characters). | | `platform` | string | ❌ (default `all`) | Platform to search (`twitter`, `reddit`, `youtube`, `github`, `stackoverflow`, `medium`, `pinterest`, `tiktok`, `instagram`, `facebook`, `linkedin` or `all`). | **Example** ```json { "jsonrpc": "2.0", "id": 11, "method": "tools/call", "params": { "name": "social_search", "arguments": { "query": "AI ethics research", "platform": "reddit" } } } ``` *The response contains direct URLs that can be opened in a browser.* --- ### archives_search **Find archived versions of a URL across multiple web‑archive services.** | Name | Type | Required | Description | |------|------|----------|-------------| | `url` | string | ✅ | Complete URL (must include `http://` or `https://`). | | `service` | string | ❌ (default `all`) | Archive service (`wayback`, `archive_today`, `google_cache`, `bing_cache`, `yandex_cache`, `cachedview`, `ghostarchive` or `all`). | | `check_availability` | boolean | ❌ (default false) | When true, the server queries the Wayback Machine API for snapshot statistics. | **Example** ```json { "jsonrpc": "2.0", "id": 12, "method": "tools/call", "params": { "name": "archives_search", "arguments": { "url": "https://example.com", "service": "wayback", "check_availability": true } } } ``` *The response lists archive URLs and, if requested, snapshot counts and timestamps.* --- ### list_engines **List all available privacy‑focused search engines.** | Name | Type | Required | Description | |------|------|----------|-------------| *(No parameters)* | — | — | — | **Example** ```json { "jsonrpc": "2.0", "id": 13, "method": "tools/call", "params": { "name": "list_engines", "arguments": {} } } ``` *The server returns a markdown‑formatted overview of each engine and usage notes.* --- ### list_archives_services **List all supported web‑archive services.** | Name | Type | Required | Description | |------|------|----------|-------------| *(No parameters)* | — | — | — | **Example** ```json { "jsonrpc": "2.0", "id": 14, "method": "tools/call", "params": { "name": "list_archives_services", "arguments": {} } } ``` *The response includes a description of each service, its ID and key features.* --- ### clear_cache **Clear the internal search‑result cache.** | Name | Type | Required | Description | |------|------|----------|-------------| *(No parameters)* | — | — | — | **Example** ```json { "jsonrpc": "2.0", "id": 15, "method": "tools/call", "params": { "name": "clear_cache", "arguments": {} } } ``` *The server replies with a confirmation message.* --- ## 🔐 Security Features - **Query sanitisation** – strips control characters, removes HTML tags and enforces `MAX_QUERY_LENGTH` (500). - **Strict URL validation** – accepts only `http://` or `https://` schemes with a valid domain. - **Blocked patterns** – regexes prevent `<script>` injection, `javascript:` URIs and event‑handler attributes. - **Input schema enforcement** – each tool validates required fields via the JSON‑RPC `inputSchema`. - **Rate limiting** – an asyncio semaphore caps concurrent external requests (`MAX_CONCURRENT_SEARCHES`). --- *© 2025 Undici77 – All rights reserved.*

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/undici77/MCPWebSearch'

If you have feedback or need assistance with the MCP directory API, please join our Discord server