web_search
Search the web using SearXNG, crawl pages with Creeper, and get LLM-summarized results to access current online information while avoiding token limits.
Instructions
Searches the web with SearXNG, crawls the result pages with Creeper, and returns LLM-summarized content. Intended for scenarios that require up-to-date information from the web.
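A typical invocation from an MCP client might pass an arguments object like the one below. The values are illustrative only, not taken from the source; the parameters themselves are defined in the input schema that follows.

```typescript
// Hypothetical arguments for a web_search tool call (values are examples).
const args = {
  query: 'TypeScript 5 decorators tutorial', // required, must not be blank
  max_results: 5,                            // optional, 1-20 (defaults to 10)
  language: 'en',                            // optional (defaults to 'zh')
  time_range: 'month',                       // optional: day | week | month | year
  exclude_domains: ['example.com'],          // optional blacklist
  save_to_file: false,                       // optional (defaults to false)
};
```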
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | Search query (must not be empty or whitespace-only) | |
| max_results | No | Maximum number of results (1-20) | 10 |
| language | No | Search language | zh |
| time_range | No | Time range (`day`, `week`, `month`, `year`) | |
| include_domains | No | Only search these domains (whitelist) | |
| exclude_domains | No | Exclude these domains (blacklist) | |
| save_to_file | No | Whether to save crawled content to a local file (requires the SAVE_CONTENT_ENABLED setting) | false |
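The rules in the table above can be sketched as a small normalization helper. This is a minimal, dependency-free illustration; the actual tool validates input with a Zod schema (see the Implementation Reference), and `normalizeWebSearchInput` is a hypothetical name.

```typescript
interface RawWebSearchInput {
  query: string;
  max_results?: number;
  language?: string;
}

// Hypothetical helper mirroring the table's rules: trim the query, reject
// blank queries, clamp max_results into 1-20 (default 10), default language
// to 'zh'. Illustrative only -- not the schema the tool actually uses.
function normalizeWebSearchInput(raw: RawWebSearchInput) {
  const query = raw.query.trim();
  if (query.length === 0) {
    throw new Error('query must not be empty or whitespace-only');
  }
  const max_results = Math.min(20, Math.max(1, raw.max_results ?? 10));
  const language = raw.language ?? 'zh';
  return { query, max_results, language };
}
```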
Implementation Reference
- `src/tools/web-search.ts:34-194` (handler) — Main handler function that executes the web_search tool. Uses SearXNG for search, Creeper for web crawling, and an LLM for summarization. Implements multi-stage filtering (rule-based plus LLM), content crawling, and summarization that is either single-pass or Map-Reduce, depending on total content size.

```typescript
export async function executeWebSearch(
  input: WebSearchInput,
  searxngService: SearXNGService,
  creeperService: CreeperService,
  summarizerService: SummarizerService,
  config: {
    filter: any;
    filterLlm: any;
    mapReduce: any;
    maxContentLength: number;
  }
): Promise<string> {
  const startTime = Date.now();

  logger.info('Executing web_search', {
    query: input.query,
    max_results: input.max_results,
    language: input.language,
    time_range: input.time_range
  });

  try {
    // 1. SearXNG search (fetch extra results for later filtering)
    const searchResults = await searxngService.search(input.query, {
      language: input.language,
      timeRange: input.time_range,
      maxResults: input.max_results * 2, // fetch twice as many, for filtering
    });

    if (searchResults.length === 0) {
      return `未找到与 "${input.query}" 相关的搜索结果。`;
    }

    logger.info('SearXNG search completed', {
      initialResultCount: searchResults.length,
      requestedResults: input.max_results
    });

    // 2. Rule-based filtering
    const filter = new ResultFilter({
      maxResults: input.max_results,
      domainWhitelist: input.include_domains,
      domainBlacklist: input.exclude_domains,
      ...config.filter
    });
    const filteredResults = filter.filter(searchResults, input.query);

    logger.info('Rule filtering completed', {
      filteredResultCount: filteredResults.length,
      whitelistCount: input.include_domains?.length || 0,
      blacklistCount: input.exclude_domains?.length || 0
    });

    // 2.5. LLM-based smart filtering (if enabled)
    let topic: TopicCategory = 'other';
    let finalResults = filteredResults;

    if (config.filterLlm.enabled && filteredResults.length > 0) {
      const filterLlmService = new FilterLlmService(config.filterLlm);
      const { filtered: llmFiltered, topic: detectedTopic } = await filter.filterWithLlm(
        filteredResults,
        input.query,
        (query, results) => filterLlmService.filter(query, results)
      );
      topic = detectedTopic;
      finalResults = llmFiltered;

      logger.info('LLM filtering completed', {
        topic,
        keptCount: finalResults.length,
        filteredCount: filteredResults.length - finalResults.length
      });
    }

    if (finalResults.length === 0) {
      return `搜索结果已被过滤器全部排除。请尝试调整域名过滤设置。`;
    }

    // 3. Crawl page content with Creeper
    const urls = finalResults.map(r => r.url);
    const crawledPages = await creeperService.crawl(urls, input.save_to_file, input.query);

    // Tally successes and failures
    const successPages = crawledPages.filter(p => p.success);
    const failedPages = crawledPages.filter(p => !p.success);

    logger.info('Crawling completed', {
      totalUrls: urls.length,
      successCount: successPages.length,
      failureCount: failedPages.length
    });

    if (successPages.length === 0) {
      return `所有网页爬取失败。请检查 Creeper 配置或网页可访问性。`;
    }

    // 4. Compute the total content size
    const totalLength = successPages.reduce((sum, p) => sum + p.content.length, 0);

    logger.info('Content size analysis', {
      totalLength,
      avgLength: Math.round(totalLength / successPages.length),
      threshold: config.mapReduce.threshold
    });

    // 5. Choose a summarization strategy (topic-driven)
    let summary: string;
    logger.info('Using topic-driven summarization', { topic });

    if (totalLength < config.mapReduce.threshold) {
      // Single-pass summarization
      logger.info('Using single summarization');
      const combinedContent = formatForSingleSummary(successPages);
      const summaryPrompt = getSearchSummaryPromptByTopic(topic);
      const summaryResponse = await summarizerService.summarize({
        content: combinedContent,
        prompt: summaryPrompt
      });
      summary = summaryResponse.summary;
    } else {
      // Map-Reduce summarization
      logger.info('Using Map-Reduce summarization');
      summary = await mapReduceSummarize(successPages, summarizerService, {
        chunkSize: config.mapReduce.chunkSize,
        maxConcurrency: config.mapReduce.maxConcurrency
      });
    }

    // 6. Format the output
    const output = formatOutput(summary, successPages, failedPages);

    const duration = Date.now() - startTime;
    logger.info('Web search completed', {
      query: input.query,
      duration: `${duration}ms`,
      topic,
      initialResults: searchResults.length,
      filteredResults: filteredResults.length,
      finalResults: finalResults.length,
      pagesCrawled: successPages.length
    });

    return output;
  } catch (error) {
    const errorMessage = error instanceof Error ? error.message : 'Unknown error';
    logger.error('Web search failed', { query: input.query, error: errorMessage });
    return `搜索执行失败: ${errorMessage}`;
  }
}
```

- `src/tools/web-search.ts:16-28` (schema) — Input validation schema using Zod. Defines and validates the parameters: query (required string), max_results (1-20, default 10), language, time_range, include/exclude domains, and the save_to_file option.
```typescript
export const webSearchInputSchema = z.object({
  query: z.string()
    .min(1, '搜索查询不能为空')
    .transform(val => val.trim())
    .refine(val => val.length > 0, '搜索查询不能只包含空格字符')
    .describe('搜索查询'),
  max_results: z.number().min(1).max(20).default(10).describe('最大结果数量'),
  language: z.string().optional().default('zh').describe('搜索语言'),
  time_range: z.enum(['day', 'week', 'month', 'year']).optional().describe('时间范围'),
  include_domains: z.array(z.string()).optional().describe('只搜索这些域名'),
  exclude_domains: z.array(z.string()).optional().describe('排除这些域名'),
  save_to_file: z.boolean().default(false).describe('是否将爬取的内容保存到本地文件'),
});
```

- `src/server.ts:86-132` (registration) — Tool registration in the MCP server's ListToolsRequestSchema handler. Defines the web_search tool with its name, description, and complete JSON Schema input specification, including all parameters and validation constraints.
```typescript
{
  name: 'web_search',
  description: '使用 SearXNG 搜索网络内容,通过 Creeper 爬取网页,并返回经过 LLM 总结的结果。适用于需要获取最新网络信息的场景。',
  inputSchema: {
    type: 'object',
    properties: {
      query: {
        type: 'string',
        description: '搜索查询(不能为空或仅包含空格字符)',
      },
      max_results: {
        type: 'number',
        description: '最大结果数量 (1-20)',
        default: 10,
        minimum: 1,
        maximum: 20,
      },
      language: {
        type: 'string',
        description: '搜索语言',
        default: 'zh',
      },
      time_range: {
        type: 'string',
        enum: ['day', 'week', 'month', 'year'],
        description: '时间范围',
      },
      include_domains: {
        type: 'array',
        items: { type: 'string' },
        description: '只搜索这些域名(白名单)',
      },
      exclude_domains: {
        type: 'array',
        items: { type: 'string' },
        description: '排除这些域名(黑名单)',
      },
      save_to_file: {
        type: 'boolean',
        description: '是否将爬取的内容保存到本地文件(需要启用 SAVE_CONTENT_ENABLED 配置)',
        default: false,
      },
    },
    required: ['query'],
  },
},
```

- `src/server.ts:148-163` (registration) — Tool execution handler in the CallToolRequestSchema handler. Routes 'web_search' requests to the executeWebSearch function with the validated input and the configured services (SearXNG, Creeper, Summarizer).
```typescript
case 'web_search': {
  const input = webSearchInputSchema.parse(args);
  result = await executeWebSearch(
    input,
    searxngService,
    creeperService,
    summarizerService,
    {
      filter: config.filter,
      filterLlm: config.filterLlm,
      mapReduce: config.mapReduce,
      maxContentLength: config.service.maxContentLength,
    }
  );
  break;
}
```
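The handler's step 5 picks between single-pass and Map-Reduce summarization by comparing total crawled content length against a configured threshold. The following is a minimal sketch of that decision plus the "map" half of the split; `splitIntoChunks` and `chooseStrategy` are hypothetical names, not functions from the source.

```typescript
// Hypothetical sketch of the Map-Reduce split: concatenated page content is
// cut into chunks of at most chunkSize characters. In the real tool each
// chunk would be summarized independently ("map") and the partial summaries
// combined into one final summary ("reduce").
function splitIntoChunks(content: string, chunkSize: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < content.length; i += chunkSize) {
    chunks.push(content.slice(i, i + chunkSize));
  }
  return chunks;
}

// Mirrors the threshold check in step 5 of the handler: small content gets a
// single summarization pass, large content goes through Map-Reduce.
function chooseStrategy(totalLength: number, threshold: number): 'single' | 'map-reduce' {
  return totalLength < threshold ? 'single' : 'map-reduce';
}
```

Chunking keeps each LLM call under the model's context limit, which is why the threshold is compared against the combined length of all successfully crawled pages rather than any single page.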