
Documentation Search MCP Server

by gemini2026

semantic_search

Search across documentation libraries using AI-powered semantic matching, combined with keyword and metadata ranking, to surface the most relevant results.

Instructions

Enhanced semantic search across one or more libraries with AI-powered relevance ranking.

Uses hybrid search combining:
- Vector embeddings for semantic similarity (50% weight)
- Keyword matching for precise results (30% weight)
- Source authority and metadata (20% weight)
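The weighting above can be sketched as a simple weighted sum. The constants mirror the documented 50/30/20 split; the function name is illustrative, not part of the server's API:

```python
# Illustrative weights matching the documented 50/30/20 split.
SEMANTIC_WEIGHT = 0.5
KEYWORD_WEIGHT = 0.3
METADATA_WEIGHT = 0.2

def hybrid_score(semantic: float, keyword: float, metadata: float) -> float:
    """Combine component scores (each assumed in [0, 1]) into one relevance score."""
    return (semantic * SEMANTIC_WEIGHT
            + keyword * KEYWORD_WEIGHT
            + metadata * METADATA_WEIGHT)
```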

Args:
    query: The search query.
    libraries: A single library or a list of libraries to search in.
    context: Optional context about your project or use case.
    version: Library version to search (e.g., "4.2", "stable", "latest"). Default: "latest"
    auto_detect_version: Automatically detect installed package version. Default: False
    use_vector_rerank: Enable vector-based semantic reranking for better relevance. Default: True

Returns:
    Enhanced search results with AI-powered relevance scores and metadata, ranked across all libraries.

Input Schema

Name                 Required  Description                                                   Default
query                Yes       The search query.
libraries            Yes       A single library or a list of libraries to search in.
context              No        Optional context about your project or use case.
version              No        Library version to search (e.g., "4.2", "stable", "latest").  latest
auto_detect_version  No        Automatically detect the installed package version.           False
use_vector_rerank    No        Enable vector-based semantic reranking.                       True
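As a usage sketch, an MCP client might pass arguments like the following. The exact wire format depends on the client; this dict only mirrors the input schema above, and the query values are made up:

```python
import json

# Hypothetical argument payload mirroring the input schema above.
arguments = {
    "query": "how to validate request bodies",
    "libraries": ["fastapi", "pydantic"],  # also accepts a comma-separated string
    "context": "building a REST API",
    "version": "latest",
    "auto_detect_version": False,
    "use_vector_rerank": True,
}
print(json.dumps(arguments, indent=2))
```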

Implementation Reference

  • Primary handler for the 'semantic_search' MCP tool. Decorated with @mcp.tool() for registration. Executes the core logic: performs parallel semantic searches across specified libraries using the smart_search helper, optionally applies vector reranking, sorts by relevance, and formats the response.
    async def semantic_search(
        query: str,
        libraries: LibrariesParam,
        context: Optional[str] = None,
        version: str = "latest",
        auto_detect_version: bool = False,
        use_vector_rerank: bool = True,
    ):
        """
        Enhanced semantic search across one or more libraries with AI-powered relevance ranking.
    
        Uses hybrid search combining:
        - Vector embeddings for semantic similarity (50% weight)
        - Keyword matching for precise results (30% weight)
        - Source authority and metadata (20% weight)
    
        Args:
            query: The search query.
            libraries: A single library or a list of libraries to search in.
            context: Optional context about your project or use case.
            version: Library version to search (e.g., "4.2", "stable", "latest"). Default: "latest"
            auto_detect_version: Automatically detect installed package version. Default: False
            use_vector_rerank: Enable vector-based semantic reranking for better relevance. Default: True
    
        Returns:
            Enhanced search results with AI-powered relevance scores and metadata, ranked across all libraries.
        """
        from .reranker import get_reranker
    
        await enforce_rate_limit("semantic_search")
    
        if isinstance(libraries, str):
            libraries = [lib.strip() for lib in libraries.split(",") if lib.strip()]
    
        search_tasks = [
            smart_search.semantic_search(query, lib, context) for lib in libraries
        ]
    
        try:
            results_by_library = await asyncio.gather(*search_tasks, return_exceptions=True)
    
            all_results: List[SearchResult] = []
            for res_list in results_by_library:
                if not isinstance(res_list, Exception):
                    all_results.extend(res_list)  # type: ignore
    
            # Apply vector-based reranking for better semantic relevance
            if use_vector_rerank and all_results:
                try:
                    reranker = get_reranker()
                    all_results = await reranker.rerank(
                        all_results, query, use_semantic=True
                    )
                except ImportError:
                    logger.warning(
                        "Vector search dependencies not installed. "
                        "Falling back to basic relevance sorting. "
                        "Install with: pip install documentation-search-enhanced[vector]"
                    )
                    all_results.sort(key=lambda r: r.relevance_score, reverse=True)
            else:
                # Fallback to basic relevance score sorting
                all_results.sort(key=lambda r: r.relevance_score, reverse=True)
    
            return {
                "query": query,
                "libraries_searched": libraries,
                "total_results": len(all_results),
                "vector_rerank_enabled": use_vector_rerank,
                "results": [
                    {
                        "source_library": result.source_library,
                        "title": result.title,
                        "url": result.url,
                        "snippet": (
                            result.snippet[:300] + "..."
                            if len(result.snippet) > 300
                            else result.snippet
                        ),
                        "relevance_score": result.relevance_score,
                        "content_type": result.content_type,
                        "difficulty_level": result.difficulty_level,
                        "estimated_read_time": f"{result.estimated_read_time} min",
                        "has_code_examples": result.code_snippets_count > 0,
                    }
                    for result in all_results[:10]  # Top 10 combined results
                ],
            }
        except Exception as e:
            return {"error": f"Search failed: {str(e)}", "results": []}
  • Dataclass schema defining the structure of SearchResult objects used in the semantic_search pipeline for typed result handling.
    @dataclass
    class SearchResult:
        """Enhanced search result with relevance scoring"""
    
        source_library: str
        url: str
        title: str
        snippet: str
        relevance_score: float
        content_type: str  # "tutorial", "reference", "example", "guide"
        difficulty_level: str  # "beginner", "intermediate", "advanced"
        code_snippets_count: int
        estimated_read_time: int  # in minutes
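    For illustration, here is a SearchResult instance together with response fields derived the way the handler formats them. The dataclass is restated so the sketch runs standalone, and the field values are made up:

```python
from dataclasses import dataclass

@dataclass
class SearchResult:
    """Enhanced search result with relevance scoring (restated from above)."""
    source_library: str
    url: str
    title: str
    snippet: str
    relevance_score: float
    content_type: str
    difficulty_level: str
    code_snippets_count: int
    estimated_read_time: int

result = SearchResult(
    source_library="fastapi",
    url="https://fastapi.tiangolo.com/tutorial/body/",
    title="Request Body",
    snippet="Declare a request body using Pydantic models.",
    relevance_score=0.87,
    content_type="tutorial",
    difficulty_level="beginner",
    code_snippets_count=4,
    estimated_read_time=6,
)

# Derived the same way as in the handler's response formatting:
formatted = {
    "estimated_read_time": f"{result.estimated_read_time} min",
    "has_code_examples": result.code_snippets_count > 0,
}
```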
  • Key helper method in the SmartSearch class implementing semantic query expansion, search execution via the configured search function, result enhancement (scoring, classification, estimation), and initial ranking.
    async def semantic_search(
        self, query: str, library: str, context: Optional[str] = None
    ) -> List[SearchResult]:
        """Perform semantic search with context awareness"""
    
        # Expand query with semantic understanding
        expanded_query = self.expand_query_semantically(query, library, context)
    
        # Search with expanded query
        base_query = f"site:{self.get_docs_url(library)} {expanded_query}"
    
        # Perform the actual search (using existing search infrastructure)
        raw_results = await self.perform_search(base_query)
    
        # Enhance and rank results
        enhanced_results = []
        for result in raw_results:
            enhanced_result = await self.enhance_search_result(result, query, library)
            enhanced_results.append(enhanced_result)
    
        # Sort by relevance score
        enhanced_results.sort(key=lambda x: x.relevance_score, reverse=True)
    
        return enhanced_results
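    The query-construction step above amounts to prefixing search-engine site-restriction syntax onto the expanded query. A minimal sketch, where the docs URL and query string are placeholders:

```python
def build_site_query(docs_url: str, expanded_query: str) -> str:
    """Restrict the search to one documentation site, as semantic_search does."""
    return f"site:{docs_url} {expanded_query}"

build_site_query("fastapi.tiangolo.com", "request body validation")
```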
  • Reranker used conditionally in the handler for hybrid re-ranking (vector embeddings 50%, keywords 30%, metadata 20%). Called via get_reranker() when use_vector_rerank=True.
    async def rerank(
        self,
        results: List[SearchResult],
        query: str,
        use_semantic: bool = True,
    ) -> List[SearchResult]:
        """
        Rerank search results using hybrid scoring.
    
        Args:
            results: List of search results to rerank
            query: Original search query
            use_semantic: Whether to use semantic scoring (can be disabled for speed)
    
        Returns:
            Reranked list of search results
        """
        if not results:
            return results
    
        logger.debug(f"Reranking {len(results)} results for query: {query[:50]}...")
    
        # Calculate scores for each result
        scored_results = []
        for result in results:
            score = 0.0
    
            # 1. Semantic similarity score (if enabled)
            if use_semantic:
                semantic_score = await self._calculate_semantic_score(
                    query, result.snippet + " " + result.title
                )
                score += semantic_score * self.semantic_weight
            else:
                # If semantic disabled, redistribute weight to keyword matching
                score += result.relevance_score * (
                    self.semantic_weight + self.keyword_weight
                )
    
        # 2. Keyword matching score (use existing relevance_score); when
        # semantic scoring is disabled, its share was already folded in above
        if use_semantic:
            score += result.relevance_score * self.keyword_weight
    
            # 3. Metadata scoring (authority, content quality indicators)
            metadata_score = self._calculate_metadata_score(result)
            score += metadata_score * self.metadata_weight
    
            # Store the hybrid score
            result.relevance_score = score
            scored_results.append(result)
    
        # Sort by hybrid score
        scored_results.sort(key=lambda r: r.relevance_score, reverse=True)
    
        logger.debug(
            f"Reranked results. Top score: {scored_results[0].relevance_score:.3f}"
        )
        return scored_results
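    The two scoring paths in rerank() reduce to the following standalone sketch, assuming the weights are the documented 0.5/0.3/0.2 split. When semantic scoring is disabled, its weight is folded into the keyword term so the total weight stays 1.0:

```python
from typing import Optional

SEMANTIC_W, KEYWORD_W, METADATA_W = 0.5, 0.3, 0.2

def rerank_score(relevance: float, metadata: float,
                 semantic: Optional[float] = None) -> float:
    """Mirror rerank()'s hybrid scoring; `semantic=None` means semantic disabled."""
    if semantic is None:
        # Semantic weight is redistributed to the keyword/relevance term.
        return relevance * (SEMANTIC_W + KEYWORD_W) + metadata * METADATA_W
    return semantic * SEMANTIC_W + relevance * KEYWORD_W + metadata * METADATA_W
```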
  • MCP tool registration decorator applied to the semantic_search handler function.
    @mcp.tool()
    async def semantic_search(
