mcp-omnisearch
by spences10
# Unified MCP Search Tool Plan
## Architecture Overview
```mermaid
graph TD
A[MCP Omnisearch Server] --> B[Tool Registry]
B --> S[Search Tools]
B --> AI[AI Response Tools]
B --> P[Content Processing Tools]
B --> E[Enhancement Tools]
S --> S1[Tavily Search]
S --> S2[Brave Search]
S --> S3[Kagi Search]
AI --> AI1[Perplexity AI]
AI --> AI2[Kagi FastGPT]
P --> P1[Jina AI Reader]
P --> P2[Kagi Universal Summarizer]
E --> E1[Kagi Enrichment API]
E --> E2[Jina AI Grounding]
S1 --> API1[Tavily API]
S2 --> API2[Brave API]
S3 --> API4[Kagi API]
AI1 --> API3[Perplexity API]
AI2 --> API4[Kagi API]
P1 --> API5[Jina AI API]
P2 --> API4
E1 --> API4
E2 --> API5
Config[API Keys & Config] --> A
```
## Key Components
1. **Unified Server Interface**
- Single MCP server exposing multiple search tools
- Common parameter structure where possible
- Provider-specific parameters where needed
2. **Tool Registry**
- Registers all search providers with clear, detailed descriptions
- Highlights strengths and best-use cases for each provider
- Handles provider names that contain underscores by splitting the
tool name at its last underscore
- Tool names follow the pattern `provider_name + "_" + action`.
Example: "kagi_fastgpt_search" splits into:
- provider_name: "kagi_fastgpt"
- action: "search"
3. **Provider Implementation**
- Each search provider implemented as a separate module
- Shared utilities for common functionality
- Consistent error handling across providers
4. **Configuration Management**
- Environment variable-based API key management
- Configurable defaults for each provider
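The right-hand split described for the Tool Registry can be sketched as follows. This is a minimal illustration, not the project's actual code: `parse_tool_name` and `ParsedToolName` are hypothetical names introduced here.

```typescript
// Hypothetical helper: names are illustrative, not taken from the project.
interface ParsedToolName {
  provider_name: string;
  action: string;
}

// Split at the LAST underscore so that provider names which themselves
// contain underscores (e.g. "kagi_fastgpt") stay intact.
function parse_tool_name(tool_name: string): ParsedToolName {
  const index = tool_name.lastIndexOf('_');
  // Reject names with no underscore, or with an empty provider or action.
  if (index <= 0 || index === tool_name.length - 1) {
    throw new Error(`Invalid tool name: "${tool_name}"`);
  }
  return {
    provider_name: tool_name.slice(0, index),
    action: tool_name.slice(index + 1),
  };
}

// provider_name: "kagi_fastgpt", action: "search"
console.log(parse_tool_name('kagi_fastgpt_search'));
```

Splitting from the right only works if action names never contain underscores, which this sketch assumes.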
## Tool Descriptions Strategy
The key to making this work effectively is in the tool descriptions.
Each tool will have a detailed description that explains:
- What the search provider is best at
- Types of queries it handles well
- Unique features (like Jina AI's parsing capabilities)
- Limitations or constraints
Example Tool Descriptions:
Search Tools:
- **Tavily**: "Optimized for factual information with strong citation
support"
- **Brave**: "Privacy-focused with good coverage of technical topics"
- **Kagi**: "High-quality search results with minimal advertising
influence, focused on authoritative sources"
AI Response Tools:
- **Perplexity**: "AI-powered response generation combining real-time
web search with advanced language models (GPT-4 Omni, Claude 3).
Best for complex queries requiring reasoning and synthesis across
multiple sources. Features contextual memory for follow-up
questions."
- **Kagi FastGPT**: "Quick AI-generated answers with citations,
optimized for rapid response (900ms typical start time). Runs full
search underneath for enriched answers."
Content Processing Tools:
- **Jina AI Reader**: "Converts any URL to clean, LLM-friendly text.
Features automatic image captioning and native PDF support.
Optimized for high-quality content extraction from complex web
pages."
- **Kagi Universal Summarizer**: "Instantly summarizes content of any
type and length from URLs. Supports pages, videos, and podcasts with
transcripts."
Enhancement Tools:
- **Kagi Enrichment API**: "Provides supplementary content from
specialized indexes (Teclis for web, TinyGem for news). Ideal for
discovering non-mainstream results."
- **Jina AI Grounding**: "Real-time fact verification against web
knowledge. Reduces hallucinations and improves content integrity
through statement verification."
## Implementation Plan
1. **Phase 1: Core Structure**
- Set up the unified MCP server framework
- Create modular structure for providers
- Implement configuration management
2. **Phase 2: Provider Integration**
- Integrate each search provider
- Develop comprehensive tool descriptions
- Implement error handling and fallbacks
3. **Phase 3: Testing & Refinement**
- Test with various query types
- Refine tool descriptions based on AI selection behavior
- Add any missing provider-specific parameters
## Folder Structure
```
src/
├── index.ts                    # Main server entry point
├── config/                     # Configuration management
│   └── env.ts                  # Environment variable handling
├── providers/                  # All provider implementations
│   ├── search/                 # Search providers
│   │   ├── tavily/             # Tavily implementation
│   │   ├── brave/              # Brave implementation
│   │   └── kagi/               # Kagi implementation
│   ├── ai_response/            # AI response providers
│   │   ├── perplexity/         # Perplexity implementation
│   │   └── kagi_fastgpt/       # Kagi FastGPT implementation
│   ├── processing/             # Content processing providers
│   │   ├── jina_reader/        # Jina AI Reader implementation
│   │   └── kagi_summarizer/    # Kagi Universal Summarizer implementation
│   └── enhancement/            # Enhancement providers
│       ├── kagi_enrichment/    # Kagi Enrichment implementation
│       └── jina_grounding/     # Jina AI Grounding implementation
├── common/                     # Shared utilities
│   ├── types.ts                # Common type definitions
│   └── utils.ts                # Shared helper functions
└── server/                     # Core server functionality
    ├── tools.ts                # Tool registration logic
    └── handlers.ts             # Request handlers
```
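As a rough sketch of what `config/env.ts` might contain, the function below collects API keys from an environment object and skips any that are missing or blank. The environment variable names and the `load_api_keys` function are assumptions made for illustration; the plan itself does not list them.

```typescript
// Assumed environment variable names; the plan does not specify them.
const API_KEY_VARS = {
  tavily: 'TAVILY_API_KEY',
  brave: 'BRAVE_API_KEY',
  kagi: 'KAGI_API_KEY',
  perplexity: 'PERPLEXITY_API_KEY',
  jina: 'JINA_AI_API_KEY',
} as const;

type ProviderKey = keyof typeof API_KEY_VARS;
type ApiKeys = Partial<Record<ProviderKey, string>>;

// Hypothetical loader: takes the environment as a parameter so it can be
// exercised without touching the real process environment.
function load_api_keys(env: Record<string, string | undefined>): ApiKeys {
  const keys: ApiKeys = {};
  for (const provider of Object.keys(API_KEY_VARS) as ProviderKey[]) {
    const value = env[API_KEY_VARS[provider]]?.trim();
    // Only record providers whose key is present and non-empty.
    if (value) {
      keys[provider] = value;
    }
  }
  return keys;
}
```

In a Node.js process this would typically be called as `load_api_keys(process.env)`, and the server could then register only the providers whose keys are present.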
## Consumer Tool Selection
The consumer (AI) chooses between tools based on the detailed provider
descriptions, which effectively act as instructions. For example:
```typescript
// Example tool registration with detailed descriptions.
// `server` is assumed to be an already constructed MCP SDK `Server` instance.
import { ListToolsRequestSchema } from '@modelcontextprotocol/sdk/types.js';

server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: 'tavily_search',
      description:
        'Search the web using Tavily Search API. Best for factual queries requiring reliable sources and citations. Provides high-quality results for technical, scientific, and academic topics. Use when you need verified information with strong citation support.',
      inputSchema: {
        // Schema details...
      },
    },
    {
      name: 'perplexity_search',
      description:
        'Generate search results using Perplexity AI. Excels at complex questions requiring reasoning across multiple sources. Best for nuanced topics, emerging trends, and questions needing synthesis of information. Use for questions requiring deeper analysis rather than simple facts.',
      inputSchema: {
        // Schema details...
      },
    },
  ],
}));
```
## Best Practices
1. **Error Handling**
- Implement consistent error handling across all providers
- Provide clear error messages that help identify the source of
issues
- Include fallback mechanisms where appropriate
2. **Parameter Standardization**
- Use consistent parameter names across providers
- Standardize common parameters (e.g., always use `query` instead
of mixing terms)
- Document any provider-specific parameters clearly
3. **Logging and Monitoring**
- Implement comprehensive logging for debugging
- Track usage patterns and performance metrics
- Monitor API rate limits and quotas
4. **Code Organization**
- Keep provider implementations isolated
- Share common utilities through the common/ directory
- Maintain consistent coding style across all modules
5. **Documentation**
- Provide clear documentation for each provider's capabilities
- Include examples of ideal use cases
- Document any provider-specific limitations or requirements
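One way to get the consistent, source-identifying errors described under Error Handling is a shared error class that every provider throws. The sketch below is an assumption about how that could look: `ProviderError`, `ErrorKind`, and `is_retryable` are hypothetical names, not taken from the project.

```typescript
// Hypothetical error categories shared by all providers.
type ErrorKind = 'api_error' | 'rate_limit' | 'invalid_input';

// Hypothetical shared error type: the message always names the provider,
// so the source of a failure is visible in logs and tool responses.
class ProviderError extends Error {
  readonly kind: ErrorKind;
  readonly provider: string;

  constructor(kind: ErrorKind, provider: string, detail: string) {
    super(`[${provider}] ${kind}: ${detail}`);
    // Keep `instanceof` working even when compiled to older JS targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = 'ProviderError';
    this.kind = kind;
    this.provider = provider;
  }
}

// Only rate-limit failures are treated as worth retrying in this sketch.
function is_retryable(error: unknown): boolean {
  return error instanceof ProviderError && error.kind === 'rate_limit';
}

const example_error = new ProviderError('rate_limit', 'brave', 'quota exceeded');
console.log(example_error.message); // [brave] rate_limit: quota exceeded
```

A retry or fallback wrapper can then branch on `is_retryable` instead of parsing provider-specific error strings.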
## Implementation Status & Next Steps
### Phase 1: Core Structure ✅
- ✅ Set up the unified MCP server framework
- ✅ Create modular structure for providers
- ✅ Implement configuration management
- ✅ Set up resource handlers
- ✅ Configure provider registration
### Phase 2: Provider Integration ✅
All providers have been successfully implemented:
1. **Search Providers**
- [x] Tavily Search ✅
- ✅ Implement search API call with proper error handling
- ✅ Add rate limiting with retry logic
- ✅ Add comprehensive error handling with specific error messages
- ✅ Successfully tested with real queries
- [x] Brave Search ✅
- ✅ Implement search API call with proper error handling
- ✅ Add rate limiting with retry logic
- ✅ Add domain filtering support
- ✅ Successfully tested implementation
- [x] Kagi Search ✅
- ✅ Implement search API call with proper error handling
- ✅ Add rate limiting with retry logic
- ✅ Add domain filtering support
- ✅ Successfully tested with real queries
2. **AI Response Providers**
- [x] Perplexity AI ✅
- ✅ Implement chat completion API with proper error handling
- ✅ Add context handling with system messages
- ✅ Add comprehensive parameter support (top_p, top_k, penalties)
- ✅ Implement search interface for unified access
- ✅ Configure for sonar-pro model with online search
- ✅ Successfully tested implementation
- Note: Citations require elevated API access
- [x] Kagi FastGPT ✅
- ✅ Implement FastGPT API with proper error handling
- ✅ Add citation handling through references
- ✅ Successfully tested with real queries
- Note: Required special handling in ToolRegistry for provider
names containing underscores
3. **Content Processing**
- [x] Jina AI Reader ✅
- ✅ Implement URL processing with proper error handling
- ✅ Add support for both JSON and text response formats
- ✅ Successfully tested with real URLs
- [x] Kagi Summarizer ✅
- ✅ Implement URL summarization with proper error handling
- ✅ Add response parsing for output and metadata
- ✅ Add retry logic with backoff
- ✅ Successfully tested with real URLs
- Note: Uses POST method with JSON body, returns summary in
`data.output`
4. **Enhancement Tools**
- [x] Kagi Enrichment ✅
- ✅ Implement content enrichment with Teclis and TinyGem indexes
- ✅ Add specialized index support for web and news content
- ✅ Add source tracking with titles and URLs
- ✅ Successfully tested with real content
- [x] Jina Grounding ✅
- ✅ Implement fact verification with g.jina.ai endpoint
- ✅ Add confidence scoring via factuality score
- ✅ Add source citation with URLs and key quotes
- ✅ Successfully tested with real statements
- ✅ Integrated with EnhancementProvider interface
- ✅ Added comprehensive error handling
### Phase 3: Testing & Refinement (In Progress)
Systematic testing of all providers with real-world queries:
1. **Search Providers**
- ✅ Tavily Search: Successfully tested with Rust error handling
query
- Properly implements domain filtering (docs.rs, rust-lang.org)
- Returns relevant results with confidence scores
- Comprehensive error handling and rate limiting
- Code verified: Implements retry logic and query sanitization
- ✅ Brave Search: Successfully tested with TypeScript documentation
query
- Domain filtering works using `site:` syntax
- Returns clean, focused technical documentation
- Proper timeout handling and JSON validation
- Code verified: Implements rate limiting and retry logic
- ✅ Kagi Search: Successfully tested with quantum computing
research query
- Returns authoritative academic sources
- Supports both include/exclude domain filtering
- Implements API balance tracking
- Code verified: Comprehensive error handling and timeout
management
2. **AI Response Providers**
- ✅ Perplexity Search: Successfully tested with complex technical
comparison
- Generated comprehensive analysis of Rust vs C++ memory safety
- Demonstrated strong synthesis across multiple sources
- Included academic citations
- Code verified: Implements multiple models, parameter controls,
context handling
- ✅ Kagi FastGPT: Successfully tested with current events query
- Quick response time with well-structured output
- Clear citation system with numbered references
- Effective source integration
- Code verified: Implements caching, web search, reference
handling
3. **Content Processing**
- ✅ Jina AI Reader: Successfully tested with Tokio Mutex
documentation
- Cleanly extracted technical content while preserving code
blocks
- Maintained document structure and formatting
- Included metadata (title, word count)
- Code verified: Implements URL validation, rate limiting, retry
logic
- ✅ Kagi Summarizer: Successfully tested with Rust documentation
- Generated accurate, concise summaries of technical content
- Preserved key concepts and relationships
- Handled error cases appropriately
- Code verified: Implements timeout handling, API balance
tracking, comprehensive error handling
4. **Enhancement Tools**
- ✅ Kagi Enrichment: Successfully tested with AI/software
development content
- Retrieved relevant content from web and news sources
- Properly filtered results by topic relevance
- Included source tracking with titles and URLs
- Code verified: Implements parallel endpoint querying, content
filtering, HTML cleanup
- ✅ Jina Grounding: Successfully tested with Rust language
statement
- Accurately identified factual inaccuracies
- Provided detailed reasoning with sources
- Included factuality scoring and verdicts
- Code verified: Implements reference validation, token tracking,
comprehensive error handling
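The Brave Search result above notes that domain filtering works through `site:` syntax. A minimal sketch of how include and exclude lists could be folded into a query string is shown below. `apply_domain_filters` is a hypothetical helper, and the use of `OR` grouping and the `-site:` exclusion form are assumptions about search operator support, not confirmed provider behavior.

```typescript
// Hypothetical helper: appends site: operators to a search query.
// OR grouping and -site: exclusion are assumed operator forms.
function apply_domain_filters(
  query: string,
  include_domains: string[] = [],
  exclude_domains: string[] = [],
): string {
  const include_terms = include_domains.map((domain) => `site:${domain}`);
  // Group multiple included domains so the OR applies only to them.
  const include_clause =
    include_terms.length > 1
      ? `(${include_terms.join(' OR ')})`
      : include_terms.join('');
  const exclude_clause = exclude_domains
    .map((domain) => `-site:${domain}`)
    .join(' ');
  // Drop empty parts so no stray spaces end up in the final query.
  return [query.trim(), include_clause, exclude_clause]
    .filter((part) => part.length > 0)
    .join(' ');
}

// rust error handling (site:docs.rs OR site:rust-lang.org)
console.log(apply_domain_filters('rust error handling', ['docs.rs', 'rust-lang.org']));
```

Providers that accept include and exclude domains as separate request parameters would not need this string rewriting.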
Next Steps:
1. Monitor rate limits across all providers
2. Add comprehensive error logging
3. Update documentation with test results
4. Implement provider-specific optimizations based on test findings
5. Consider adding streaming support for Perplexity responses
### Development Order
1. Start with Tavily Search as it has the most straightforward API
2. Follow with Kagi Search since it's used across multiple features
3. Implement Brave Search
4. Add Perplexity AI for advanced query handling
5. Implement Jina AI Reader and Grounding
6. Add remaining Kagi features (FastGPT, Summarizer, Enrichment)