mcp-omnisearch

by spences10
# Unified MCP Search Tool Plan

## Architecture Overview

```mermaid
graph TD
    A[MCP Omnisearch Server] --> B[Tool Registry]

    B --> S[Search Tools]
    B --> AI[AI Response Tools]
    B --> P[Content Processing Tools]
    B --> E[Enhancement Tools]

    S --> S1[Tavily Search]
    S --> S2[Brave Search]
    S --> S3[Kagi Search]

    AI --> AI1[Perplexity AI]
    AI --> AI2[Kagi FastGPT]

    P --> P1[Jina AI Reader]
    P --> P2[Kagi Universal Summarizer]
    P --> P3[Tavily Extract]
    P --> P4[Firecrawl Scrape]
    P --> P5[Firecrawl Crawl]
    P --> P6[Firecrawl Map]
    P --> P7[Firecrawl Extract]
    P --> P8[Firecrawl Actions]

    E --> E1[Kagi Enrichment API]
    E --> E2[Jina AI Grounding]

    S1 --> API1[Tavily API]
    S2 --> API2[Brave API]
    S3 --> API4[Kagi API]

    AI1 --> API3[Perplexity API]
    AI2 --> API4[Kagi API]

    P1 --> API5[Jina AI API]
    P2 --> API4
    P3 --> API1
    P4 --> API6[Firecrawl API]
    P5 --> API6
    P6 --> API6
    P7 --> API6
    P8 --> API6

    E1 --> API4
    E2 --> API5

    Config[API Keys & Config] --> A
```

## Key Components

1. **Unified Server Interface**

   - Single MCP server exposing multiple search tools
   - Common parameter structure where possible
   - Provider-specific parameters where needed

2. **Tool Registry**

   - Registers all search providers with clear, detailed descriptions
   - Highlights strengths and best-use cases for each provider
   - Handles provider names with underscores by splitting from right
   - Tool names follow pattern: `provider_name + "_" + action`.
     Example: `"kagi_fastgpt_search"` splits into:
     - provider_name: `"kagi_fastgpt"`
     - action: `"search"`

3. **Provider Implementation**

   - Each search provider implemented as a separate module
   - Shared utilities for common functionality
   - Consistent error handling across providers

4. **Configuration Management**

   - Environment variable-based API key management
   - Configurable defaults for each provider

## Tool Descriptions Strategy

The key to making this work effectively is in the tool descriptions.
Each tool will have a detailed description that explains:

- What the search provider is best at
- Types of queries it handles well
- Unique features (like Jina AI's parsing capabilities)
- Limitations or constraints

Example Tool Descriptions:

Search Tools:

- **Tavily**: "Optimized for factual information with strong citation support"
- **Brave**: "Privacy-focused with good coverage of technical topics"
- **Kagi**: "High-quality search results with minimal advertising influence, focused on authoritative sources"

AI Response Tools:

- **Perplexity**: "AI-powered response generation combining real-time web search with advanced language models (GPT-4 Omni, Claude 3). Best for complex queries requiring reasoning and synthesis across multiple sources. Features contextual memory for follow-up questions."
- **Kagi FastGPT**: "Quick AI-generated answers with citations, optimized for rapid response (900ms typical start time). Runs full search underneath for enriched answers."

Content Processing Tools:

- **Jina AI Reader**: "Converts any URL to clean, LLM-friendly text. Features automatic image captioning and native PDF support. Optimized for high-quality content extraction from complex web pages."
- **Kagi Universal Summarizer**: "Instantly summarizes content of any type and length from URLs. Supports pages, videos, and podcasts with transcripts."
- **Tavily Extract**: "Extract web page content from single or multiple URLs. Efficiently converts web content into clean, processable text with configurable extraction depth and optional image extraction."
- **Firecrawl Scrape**: "Extract clean, LLM-ready data from single URLs with enhanced formatting options."
- **Firecrawl Crawl**: "Deep crawling of all accessible subpages on a website with configurable depth limits."
- **Firecrawl Map**: "Fast URL collection from websites for comprehensive site mapping."
- **Firecrawl Extract**: "Structured data extraction with AI using natural language prompts."
- **Firecrawl Actions**: "Support for page interactions (clicking, scrolling, etc.) before extraction for dynamic content."

Enhancement Tools:

- **Kagi Enrichment API**: "Provides supplementary content from specialized indexes (Teclis for web, TinyGem for news). Ideal for discovering non-mainstream results."
- **Jina AI Grounding**: "Real-time fact verification against web knowledge. Reduces hallucinations and improves content integrity through statement verification."

## Implementation Plan

1. **Phase 1: Core Structure**

   - Set up the unified MCP server framework
   - Create modular structure for providers
   - Implement configuration management

2. **Phase 2: Provider Integration**

   - Integrate each search provider
   - Develop comprehensive tool descriptions
   - Implement error handling and fallbacks

3. **Phase 3: Testing & Refinement**

   - Test with various query types
   - Refine tool descriptions based on AI selection behavior
   - Add any missing provider-specific parameters

## Folder Structure

```
src/
├── index.ts                 # Main server entry point
├── config/                  # Configuration management
│   └── env.ts               # Environment variable handling
├── providers/               # All provider implementations
│   ├── search/              # Search providers
│   │   ├── tavily/          # Tavily implementation
│   │   ├── brave/           # Brave implementation
│   │   └── kagi/            # Kagi implementation
│   ├── ai_response/         # AI response providers
│   │   ├── perplexity/      # Perplexity implementation
│   │   └── kagi_fastgpt/    # Kagi FastGPT implementation
│   ├── processing/          # Content processing providers
│   │   ├── jina_reader/     # Jina AI Reader implementation
│   │   ├── kagi_summarizer/ # Kagi Universal Summarizer implementation
│   │   ├── tavily_extract/  # Tavily Extract implementation
│   │   └── firecrawl/       # Firecrawl tools
│   │       ├── scrape/      # URL scraping implementation
│   │       ├── crawl/       # Website crawling implementation
│   │       ├── map/         # URL mapping implementation
│   │       ├── extract/     # Structured data extraction
│   │       └── actions/     # Page interaction implementation
│   └── enhancement/         # Enhancement providers
│       ├── kagi_enrichment/ # Kagi Enrichment implementation
│       └── jina_grounding/  # Jina AI Grounding implementation
├── common/                  # Shared utilities
│   ├── types.ts             # Common type definitions
│   └── utils.ts             # Shared helper functions
└── server/                  # Core server functionality
    ├── tools.ts             # Tool registration logic
    └── handlers.ts          # Request handlers
```

## Consumer Tool Selection

The consumer (AI) will have excellent guidance for tool selection through detailed provider descriptions that act as instructions. For example:

```typescript
// Example tool registration with detailed description
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: 'tavily_search',
      description: 'Search the web using Tavily Search API. Best for factual queries requiring reliable sources and citations. Provides high-quality results for technical, scientific, and academic topics. Use when you need verified information with strong citation support.',
      inputSchema: {
        // Schema details...
      },
    },
    {
      name: 'perplexity_search',
      description: 'Generate search results using Perplexity AI. Excels at complex questions requiring reasoning across multiple sources. Best for nuanced topics, emerging trends, and questions needing synthesis of information. Use for questions requiring deeper analysis rather than simple facts.',
      inputSchema: {
        // Schema details...
      },
    },
  ],
}));
```

## Best Practices

1. **Error Handling**

   - Implement consistent error handling across all providers
   - Provide clear error messages that help identify the source of issues
   - Include fallback mechanisms where appropriate

2. **Parameter Standardization**

   - Use consistent parameter names across providers
   - Standardize common parameters (e.g., always use `query` instead of mixing terms)
   - Document any provider-specific parameters clearly

3. **Logging and Monitoring**

   - Implement comprehensive logging for debugging
   - Track usage patterns and performance metrics
   - Monitor API rate limits and quotas

4. **Code Organization**

   - Keep provider implementations isolated
   - Share common utilities through the common/ directory
   - Maintain consistent coding style across all modules

5. **Documentation**

   - Provide clear documentation for each provider's capabilities
   - Include examples of ideal use cases
   - Document any provider-specific limitations or requirements

## Implementation Status & Next Steps

### Phase 1: Core Structure ✅

- ✅ Set up the unified MCP server framework
- ✅ Create modular structure for providers
- ✅ Implement configuration management
- ✅ Set up resource handlers
- ✅ Configure provider registration

### Phase 2: Provider Integration ✅

All providers have been successfully implemented:

1. **Search Providers**

   - [x] Tavily Search ✅
     - ✅ Implement search API call with proper error handling
     - ✅ Add rate limiting with retry logic
     - ✅ Add comprehensive error handling with specific error messages
     - ✅ Successfully tested with real queries
   - [x] Brave Search ✅
     - ✅ Implement search API call with proper error handling
     - ✅ Add rate limiting with retry logic
     - ✅ Add domain filtering support
     - ✅ Successfully tested implementation
   - [x] Kagi Search ✅
     - ✅ Implement search API call with proper error handling
     - ✅ Add rate limiting with retry logic
     - ✅ Add domain filtering support
     - ✅ Successfully tested with real queries

2. **AI Response Providers**

   - [x] Perplexity AI ✅
     - ✅ Implement chat completion API with proper error handling
     - ✅ Add context handling with system messages
     - ✅ Add comprehensive parameter support (top_p, top_k, penalties)
     - ✅ Implement search interface for unified access
     - ✅ Configure for sonar-pro model with online search
     - ✅ Successfully tested implementation
     - Note: Citations require elevated API access
   - [x] Kagi FastGPT ✅
     - ✅ Implement FastGPT API with proper error handling
     - ✅ Add citation handling through references
     - ✅ Successfully tested with real queries
     - Note: Required special handling in ToolRegistry for provider names containing underscores

3. **Content Processing**

   - [x] Jina AI Reader ✅
     - ✅ Implement URL processing with proper error handling
     - ✅ Add support for both JSON and text response formats
     - ✅ Successfully tested with real URLs
   - [x] Kagi Summarizer ✅
     - ✅ Implement URL summarization with proper error handling
     - ✅ Add response parsing for output and metadata
     - ✅ Add retry logic with backoff
     - ✅ Successfully tested with real URLs
     - Note: Uses POST method with JSON body, returns summary in data.output
   - [x] Tavily Extract ✅
     - ✅ Implement URL extraction with proper error handling
     - ✅ Add support for single and multiple URL processing
     - ✅ Add configurable extraction depth options
     - ✅ Successfully tested with real URLs
   - [x] Firecrawl Tools ✅
     - [x] Scrape ✅
       - ✅ Implement URL scraping with proper error handling
       - ✅ Add support for different output formats (markdown, text, HTML)
       - ✅ Add retry logic with backoff
       - ✅ Updated to use Bearer token authentication
       - ✅ Successfully tested with example.com
     - [x] Crawl ✅
       - ✅ Implement website crawling with configurable depth
       - ✅ Add support for different output formats
       - ✅ Add comprehensive error handling
       - ✅ Implement rate limiting to avoid overloading target sites
       - ✅ Updated to use Bearer token authentication
       - ✅ Successfully tested with example.com
     - [x] Map ✅
       - ✅ Implement URL mapping functionality
       - ✅ Add configurable depth options
       - ✅ Add filtering capabilities for URL patterns
       - ✅ Implement proper error handling
       - ✅ Updated to use Bearer token authentication
       - ✅ Successfully tested with example.com
     - [x] Extract ✅
       - ✅ Implement structured data extraction with AI
       - ✅ Add support for custom extraction prompts
       - ✅ Add comprehensive error handling
       - ✅ Implement retry logic
       - ✅ Updated to use Bearer token authentication
       - ✅ Successfully tested with example.com
     - [x] Actions ✅
       - ✅ Implement page interaction capabilities (click, scroll, input)
       - ✅ Add support for waiting between actions
       - ✅ Add comprehensive error handling
       - ✅ Implement proper timeout handling
       - ✅ Updated to use Bearer token authentication
       - ✅ Successfully tested with news.ycombinator.com

4. **Enhancement Tools**

   - [x] Kagi Enrichment ✅
     - ✅ Implement content enrichment with Teclis and TinyGem indexes
     - ✅ Add specialized index support for web and news content
     - ✅ Add source tracking with titles and URLs
     - ✅ Successfully tested with real content
   - [x] Jina Grounding ✅
     - ✅ Implement fact verification with g.jina.ai endpoint
     - ✅ Add confidence scoring via factuality score
     - ✅ Add source citation with URLs and key quotes
     - ✅ Successfully tested with real statements
     - ✅ Integrated with EnhancementProvider interface
     - ✅ Added comprehensive error handling

### Phase 3: Testing & Refinement (In Progress)

Systematic testing of all providers with real-world queries:

1. **Search Providers**

   - ✅ Tavily Search: Successfully tested with Rust error handling query
     - Properly implements domain filtering (docs.rs, rust-lang.org)
     - Returns relevant results with confidence scores
     - Comprehensive error handling and rate limiting
     - Code verified: Implements retry logic and query sanitization
   - ✅ Brave Search: Successfully tested with TypeScript documentation query
     - Domain filtering works using site: syntax
     - Returns clean, focused technical documentation
     - Proper timeout handling and JSON validation
     - Code verified: Implements rate limiting and retry logic
   - ✅ Kagi Search: Successfully tested with quantum computing research query
     - Returns authoritative academic sources
     - Supports both include/exclude domain filtering
     - Implements API balance tracking
     - Code verified: Comprehensive error handling and timeout management

2. **AI Response Providers**

   - ✅ Perplexity Search: Successfully tested with complex technical comparison
     - Generated comprehensive analysis of Rust vs C++ memory safety
     - Demonstrated strong synthesis across multiple sources
     - Included academic citations
     - Code verified: Implements multiple models, parameter controls, context handling
   - ✅ Kagi FastGPT: Successfully tested with current events query
     - Quick response time with well-structured output
     - Clear citation system with numbered references
     - Effective source integration
     - Code verified: Implements caching, web search, reference handling

3. **Content Processing**

   - ✅ Jina AI Reader: Successfully tested with Tokio Mutex documentation
     - Cleanly extracted technical content while preserving code blocks
     - Maintained document structure and formatting
     - Included metadata (title, word count)
     - Code verified: Implements URL validation, rate limiting, retry logic
   - ✅ Kagi Summarizer: Successfully tested with Rust documentation
     - Generated accurate, concise summaries of technical content
     - Preserved key concepts and relationships
     - Handled error cases appropriately
     - Code verified: Implements timeout handling, API balance tracking, comprehensive error handling
   - ✅ Tavily Extract: Successfully tested with multiple URLs
     - Efficiently extracted content from multiple pages
     - Properly combined results with metadata
     - Handled failed extractions gracefully
     - Code verified: Implements proper error handling, timeout management
   - Firecrawl Tools: Successfully implemented and tested
     - ✅ All providers implemented following the ProcessingProvider interface pattern
     - ✅ Comprehensive error handling and retry logic implemented
     - ✅ Support for various output formats and configurations added
     - ✅ Integrated with the existing processing provider registry
     - ✅ Updated to use Bearer token authentication
     - ✅ Updated request parameters and response handling to match API documentation
     - ✅ All providers successfully tested with example.com and news.ycombinator.com

4. **Enhancement Tools**

   - ✅ Kagi Enrichment: Successfully tested with AI/software development content
     - Retrieved relevant content from web and news sources
     - Properly filtered results by topic relevance
     - Included source tracking with titles and URLs
     - Code verified: Implements parallel endpoint querying, content filtering, HTML cleanup
   - ✅ Jina Grounding: Successfully tested with Rust language statement
     - Accurately identified factual inaccuracies
     - Provided detailed reasoning with sources
     - Included factuality scoring and verdicts
     - Code verified: Implements reference validation, token tracking, comprehensive error handling

Next Steps:

1. Monitor rate limits across all providers
2. Add comprehensive error logging
3. Update documentation with test results
4. Implement provider-specific optimizations based on test findings
5. Consider adding streaming support for Perplexity responses
6. Expand testing of Firecrawl tools with more complex websites and use cases

### Development Order

1. Start with Tavily Search as it has the most straightforward API ✅
2. Follow with Kagi Search since it's used across multiple features ✅
3. Implement Brave Search ✅
4. Add Perplexity AI for advanced query handling ✅
5. Implement Jina AI Reader and Grounding ✅
6. Add remaining Kagi features (FastGPT, Summarizer, Enrichment) ✅
7. Implement Firecrawl tools as Content Processing providers ✅
   - ✅ Start with basic Scrape functionality
   - ✅ Add Crawl and Map capabilities
   - ✅ Implement Extract with AI functionality
   - ✅ Add Actions support for interactive page handling
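
As a closing illustration, the Tool Registry naming rule described under Key Components (provider name, an underscore, then the action, split from the right so that provider names may themselves contain underscores) could be sketched roughly as follows. This is a minimal sketch under assumptions, not the project's actual registry code: `splitToolName` and `ParsedToolName` are hypothetical names introduced only for this example.

```typescript
// Hypothetical helper, not taken from the project's source.
// Assumes every tool name has the form `<provider_name>_<action>`.
interface ParsedToolName {
  providerName: string;
  action: string;
}

function splitToolName(toolName: string): ParsedToolName {
  // Split at the LAST underscore so that multi-part provider names
  // such as "kagi_fastgpt" stay intact.
  const separatorIndex = toolName.lastIndexOf('_');

  // Reject names with no separator, an empty provider, or an empty action.
  if (separatorIndex <= 0 || separatorIndex === toolName.length - 1) {
    throw new Error(`Invalid tool name: "${toolName}"`);
  }

  return {
    providerName: toolName.slice(0, separatorIndex),
    action: toolName.slice(separatorIndex + 1),
  };
}

console.log(splitToolName('kagi_fastgpt_search'));
// → { providerName: 'kagi_fastgpt', action: 'search' }
console.log(splitToolName('tavily_search'));
// → { providerName: 'tavily', action: 'search' }
```

Splitting from the left would turn `kagi_fastgpt_search` into provider `kagi` and action `fastgpt_search`. That is presumably why the status notes mention special handling in the ToolRegistry for provider names containing underscores.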