
Web Analysis MCP

by himly0302

web_search

Search the web using SearXNG, crawl pages with Creeper, and get LLM-summarized results to access current online information while avoiding token limits.

Instructions

Searches the web with SearXNG, crawls pages with Creeper, and returns LLM-summarized results. Intended for scenarios that need up-to-date information from the web.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| query | Yes | Search query (must not be empty or whitespace-only) | |
| max_results | No | Maximum number of results (1-20) | 10 |
| language | No | Search language | zh |
| time_range | No | Time range | |
| include_domains | No | Only search these domains (whitelist) | |
| exclude_domains | No | Exclude these domains (blacklist) | |
| save_to_file | No | Save crawled content to a local file (requires the SAVE_CONTENT_ENABLED setting) | false |
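
For example, a call that restricts results to recent English-language pages from one site might pass arguments like the following (all values are illustrative):

```json
{
  "query": "TypeScript 5.5 release notes",
  "max_results": 5,
  "language": "en",
  "time_range": "week",
  "include_domains": ["devblogs.microsoft.com"],
  "save_to_file": false
}
```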

Implementation Reference

  • Main handler function that executes the web_search tool. Uses SearXNG for search, Creeper for web crawling, and LLM for summarization. Implements multi-stage filtering (rule-based + LLM), content crawling, and intelligent summarization (single or Map-Reduce based on content size).
    export async function executeWebSearch(
      input: WebSearchInput,
      searxngService: SearXNGService,
      creeperService: CreeperService,
      summarizerService: SummarizerService,
      config: {
        filter: any;
        filterLlm: any;
        mapReduce: any;
        maxContentLength: number;
      }
    ): Promise<string> {
      const startTime = Date.now();
    
      logger.info('Executing web_search', {
        query: input.query,
        max_results: input.max_results,
        language: input.language,
        time_range: input.time_range
      });
    
      try {
        // 1. SearXNG search (fetch extra results for filtering)
        const searchResults = await searxngService.search(input.query, {
          language: input.language,
          timeRange: input.time_range,
          maxResults: input.max_results * 2,  // fetch twice as many for filtering
        });
    
        if (searchResults.length === 0) {
          return `No search results found for "${input.query}".`;
        }
    
        logger.info('SearXNG search completed', {
          initialResultCount: searchResults.length,
          requestedResults: input.max_results
        });
    
        // 2. Rule-based filtering
        const filter = new ResultFilter({
          maxResults: input.max_results,
          domainWhitelist: input.include_domains,
          domainBlacklist: input.exclude_domains,
          ...config.filter
        });
    
        const filteredResults = filter.filter(searchResults, input.query);
    
        logger.info('Rule filtering completed', {
          filteredResultCount: filteredResults.length,
          whitelistCount: input.include_domains?.length || 0,
          blacklistCount: input.exclude_domains?.length || 0
        });
    
        // 2.5. LLM-based smart filtering (if enabled)
        let topic: TopicCategory = 'other';
        let finalResults = filteredResults;
    
        if (config.filterLlm.enabled && filteredResults.length > 0) {
          const filterLlmService = new FilterLlmService(config.filterLlm);
    
          const { filtered: llmFiltered, topic: detectedTopic } = await filter.filterWithLlm(
            filteredResults,
            input.query,
            (query, results) => filterLlmService.filter(query, results)
          );
    
          topic = detectedTopic;
          finalResults = llmFiltered;
    
          logger.info('LLM filtering completed', {
            topic,
            keptCount: finalResults.length,
            filteredCount: filteredResults.length - finalResults.length
          });
        }
    
        if (finalResults.length === 0) {
          return `All search results were excluded by the filters. Try adjusting the domain filter settings.`;
        }
    
        // 3. Crawl page content with Creeper
        const urls = finalResults.map(r => r.url);
        const crawledPages = await creeperService.crawl(urls, input.save_to_file, input.query);
    
        // Tally successes/failures
        const successPages = crawledPages.filter(p => p.success);
        const failedPages = crawledPages.filter(p => !p.success);
    
        logger.info('Crawling completed', {
          totalUrls: urls.length,
          successCount: successPages.length,
          failureCount: failedPages.length
        });
    
        if (successPages.length === 0) {
          return `All page crawls failed. Check the Creeper configuration or page accessibility.`;
        }
    
        // 4. Compute total content size
        const totalLength = successPages.reduce((sum, p) => sum + p.content.length, 0);
    
        logger.info('Content size analysis', {
          totalLength,
          avgLength: Math.round(totalLength / successPages.length),
          threshold: config.mapReduce.threshold
        });
    
        // 5. Choose summarization strategy (topic-driven)
        let summary: string;
        logger.info('Using topic-driven summarization', { topic });
    
        if (totalLength < config.mapReduce.threshold) {
          // Single-pass summarization
          logger.info('Using single summarization');
          const combinedContent = formatForSingleSummary(successPages);
          const summaryPrompt = getSearchSummaryPromptByTopic(topic);
          const summaryResponse = await summarizerService.summarize({
            content: combinedContent,
            prompt: summaryPrompt
          });
          summary = summaryResponse.summary;
        } else {
          // Map-Reduce summarization
          logger.info('Using Map-Reduce summarization');
          summary = await mapReduceSummarize(
            successPages,
            summarizerService,
            {
              chunkSize: config.mapReduce.chunkSize,
              maxConcurrency: config.mapReduce.maxConcurrency
            }
          );
        }
    
        // 6. Format output
        const output = formatOutput(summary, successPages, failedPages);
    
        const duration = Date.now() - startTime;
        logger.info('Web search completed', {
          query: input.query,
          duration: `${duration}ms`,
          topic,
          initialResults: searchResults.length,
          filteredResults: filteredResults.length,
          finalResults: finalResults.length,
          pagesCrawled: successPages.length
        });
    
        return output;
    
      } catch (error) {
        const errorMessage = error instanceof Error ? error.message : 'Unknown error';
        logger.error('Web search failed', {
          query: input.query,
          error: errorMessage
        });
    
        return `Search failed: ${errorMessage}`;
      }
    }
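  • The Map-Reduce branch in step 5 can be sketched as follows. `chunkPages`, `mapReduceSketch`, and the `summarize` callback are illustrative stand-ins, not the project's actual `mapReduceSummarize` implementation.

```typescript
// Illustrative sketch of Map-Reduce summarization: group pages into
// chunks under a size budget, summarize each chunk (the "map" phase),
// then summarize the partial summaries into one answer (the "reduce"
// phase). All names here are hypothetical.
type Page = { url: string; content: string };

function chunkPages(pages: Page[], chunkSize: number): Page[][] {
  const chunks: Page[][] = [];
  let current: Page[] = [];
  let size = 0;
  for (const p of pages) {
    if (size + p.content.length > chunkSize && current.length > 0) {
      chunks.push(current); // close the current chunk once the budget is hit
      current = [];
      size = 0;
    }
    current.push(p);
    size += p.content.length;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

async function mapReduceSketch(
  pages: Page[],
  summarize: (text: string) => Promise<string>,
  chunkSize: number
): Promise<string> {
  const chunks = chunkPages(pages, chunkSize);
  // Map: summarize each chunk independently (concurrency-friendly).
  const partials = await Promise.all(
    chunks.map(c =>
      summarize(c.map(p => `# ${p.url}\n${p.content}`).join('\n\n'))
    )
  );
  // Reduce: merge the partial summaries into a single final summary.
  return summarize(partials.join('\n\n'));
}
```

  Chunking by character count keeps each LLM call under the model's context budget; the reduce pass then trades some detail for a single coherent summary.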
  • Input validation schema using Zod. Defines and validates parameters: query (required string), max_results (1-20, default 10), language, time_range, include/exclude domains, and save_to_file option.
    export const webSearchInputSchema = z.object({
      query: z.string()
        .min(1, 'Search query must not be empty')
        .transform(val => val.trim())
        .refine(val => val.length > 0, 'Search query must not be whitespace-only')
        .describe('Search query'),
      max_results: z.number().min(1).max(20).default(10).describe('Maximum number of results'),
      language: z.string().optional().default('zh').describe('Search language'),
      time_range: z.enum(['day', 'week', 'month', 'year']).optional().describe('Time range'),
      include_domains: z.array(z.string()).optional().describe('Only search these domains'),
      exclude_domains: z.array(z.string()).optional().describe('Exclude these domains'),
      save_to_file: z.boolean().default(false).describe('Save crawled content to a local file'),
    });
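  • The rules this schema enforces can be illustrated with a dependency-free sketch. `validateWebSearchInput` and `WebSearchArgs` are hypothetical names, not part of the project; the sketch mirrors only the trim, non-empty, range, and default behaviors described above.

```typescript
// Dependency-free sketch of the validation the Zod schema performs:
// trim the query, reject empty/whitespace-only values, enforce the
// 1-20 range on max_results, and apply the documented defaults.
// All names here are illustrative.
interface WebSearchArgs {
  query: string;
  max_results?: number;
  language?: string;
  save_to_file?: boolean;
}

function validateWebSearchInput(raw: WebSearchArgs): Required<WebSearchArgs> {
  const query = raw.query.trim();
  if (query.length === 0) {
    throw new Error('Search query must not be empty or whitespace-only');
  }
  const max_results = raw.max_results ?? 10;
  if (max_results < 1 || max_results > 20) {
    throw new Error('max_results must be between 1 and 20');
  }
  return {
    query,
    max_results,
    language: raw.language ?? 'zh',
    save_to_file: raw.save_to_file ?? false,
  };
}
```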
  • src/server.ts:86-132 (registration)
    Tool registration in MCP server's ListToolsRequestSchema handler. Defines the web_search tool with name, description, and complete JSON Schema input specification including all parameters and validation constraints.
    {
      name: 'web_search',
      description:
        'Search the web with SearXNG, crawl pages with Creeper, and return LLM-summarized results. Suitable for scenarios that require up-to-date online information.',
      inputSchema: {
        type: 'object',
        properties: {
          query: {
            type: 'string',
            description: 'Search query (must not be empty or whitespace-only)',
          },
          max_results: {
            type: 'number',
            description: 'Maximum number of results (1-20)',
            default: 10,
            minimum: 1,
            maximum: 20,
          },
          language: {
            type: 'string',
            description: 'Search language',
            default: 'zh',
          },
          time_range: {
            type: 'string',
            enum: ['day', 'week', 'month', 'year'],
            description: 'Time range',
          },
          include_domains: {
            type: 'array',
            items: { type: 'string' },
            description: 'Only search these domains (whitelist)',
          },
          exclude_domains: {
            type: 'array',
            items: { type: 'string' },
            description: 'Exclude these domains (blacklist)',
          },
          save_to_file: {
            type: 'boolean',
            description: 'Save crawled content to a local file (requires the SAVE_CONTENT_ENABLED setting)',
            default: false,
          },
        },
        required: ['query'],
      },
    },
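  With the tool registered, an MCP client invokes it via a `tools/call` JSON-RPC request; a minimal example (parameter values illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "web_search",
    "arguments": {
      "query": "latest Node.js LTS version",
      "max_results": 5
    }
  }
}
```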
  • src/server.ts:148-163 (registration)
    Tool execution handler in CallToolRequestSchema. Routes 'web_search' requests to executeWebSearch function with validated input and configured services (SearXNG, Creeper, Summarizer).
    case 'web_search': {
      const input = webSearchInputSchema.parse(args);
      result = await executeWebSearch(
        input,
        searxngService,
        creeperService,
        summarizerService,
        {
          filter: config.filter,
          filterLlm: config.filterLlm,
          mapReduce: config.mapReduce,
          maxContentLength: config.service.maxContentLength,
        }
      );
      break;
    }
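  • The whitelist/blacklist part of the stage-2 rule filtering can be sketched as follows. `filterByDomain` is a simplified illustration, not the project's `ResultFilter`, whose real logic also applies quality rules and a max-results cap.

```typescript
// Simplified illustration of whitelist/blacklist domain filtering.
// A result matches a domain when its hostname equals the domain or is
// a subdomain of it; the blacklist takes precedence over the whitelist.
type SearchResult = { url: string; title: string };

function filterByDomain(
  results: SearchResult[],
  include?: string[],
  exclude?: string[]
): SearchResult[] {
  return results.filter(r => {
    const host = new URL(r.url).hostname;
    const matches = (d: string) => host === d || host.endsWith('.' + d);
    if (exclude?.some(matches)) return false;            // blacklist wins
    if (include && include.length > 0) return include.some(matches);
    return true;                                         // no whitelist: keep
  });
}
```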
MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/himly0302/web-analysis-mcp'
