search_documentation

Search Apache Spark documentation using keyword queries with full-text search, stemming, and optional section filters to find relevant results quickly.

Instructions

Search Apache Spark documentation by keyword query.

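Args:
query: Search terms to find in the documentation. Supports full-text search with stemming (e.g., "stream" matches "streaming", "streams").
section: Optional section to filter results, matched exactly against each document's section. Common sections include: 'sql-ref', 'api', 'streaming', 'mllib', 'graphx', 'structured-streaming', etc.
limit: Maximum number of results to return (default: 10, max: 50). Out-of-range values are clamped to the range 1 to 50 rather than rejected.

Returns: JSON-formatted search results with title, URL, path, section, snippet, and relevance score. If nothing matches, the JSON object contains only a message and an empty results list.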
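Whether "stream" really matches "streaming" depends on how the FTS5 index was created, which this page does not show. The sketch below assumes the index uses SQLite FTS5's built-in `porter` tokenizer (an assumption, not confirmed by the source) and demonstrates the effect on a throwaway in-memory table named `docs`.

```python
import sqlite3

# Assumption: the real documents_fts table is created with tokenize='porter';
# its definition is not part of this excerpt. 'docs' is a throwaway demo table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, content, tokenize='porter')")
conn.executemany(
    "INSERT INTO docs (title, content) VALUES (?, ?)",
    [
        ("Structured Streaming", "Streaming queries process unbounded data."),
        ("Streams", "A DStream is a sequence of RDDs, i.e. a set of streams."),
        ("MLlib", "Machine learning pipelines and feature transformers."),
    ],
)


def matching_titles(query: str) -> list[str]:
    """Return titles of rows whose stemmed tokens match the stemmed query."""
    rows = conn.execute("SELECT title FROM docs WHERE docs MATCH ? ORDER BY rowid", (query,))
    return [title for (title,) in rows]


# "stream", "streaming" and "streams" all reduce to the stem "stream".
print(matching_titles("stream"))  # ['Structured Streaming', 'Streams']
```

With FTS5's default `unicode61` tokenizer, which only case-folds and splits text, the same query would not match "streaming".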

Input Schema

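Name	Required	Description	Default
`query`	Yes	Search terms; full-text search with stemming	
`section`	No	Optional section filter (e.g. 'sql-ref', 'streaming')	None
`limit`	No	Maximum number of results; clamped to the range 1 to 50	10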
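DocumentDatabase.search() passes `query` unchanged into an FTS5 `MATCH` clause, so the string is parsed as FTS5 query syntax: uppercase `OR`, `NOT`, and `NEAR` act as operators, and punctuation such as the dot in `spark.sql` makes SQLite raise `sqlite3.OperationalError`. A minimal sketch of one common mitigation, quoting each term as an FTS5 string; the helper `to_fts5_query` is hypothetical and not part of this server.

```python
import sqlite3


def to_fts5_query(user_input: str) -> str:
    """Hypothetical helper: quote every whitespace-separated term as an FTS5 string.

    Inside an FTS5 string a literal double quote is written twice. Quoted
    terms separated by whitespace are combined with an implicit AND.
    """
    return " ".join('"' + term.replace('"', '""') + '"' for term in user_input.split())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, content)")
conn.execute("INSERT INTO docs VALUES (?, ?)", ("SparkSession", "Use spark.sql to run SQL queries."))


def search(query: str) -> list[str]:
    return [title for (title,) in conn.execute("SELECT title FROM docs WHERE docs MATCH ?", (query,))]


try:
    search("spark.sql")  # raw input: '.' is not allowed in an FTS5 bareword
except sqlite3.OperationalError as exc:
    print("raw query rejected:", exc)

# The quoted string is tokenized into the phrase "spark sql", which matches.
print(search(to_fts5_query("spark.sql")))  # ['SparkSession']
```

Quoting trades intentional operators (for example `spark*` prefix queries) for robustness; the tokenizer, including any stemming, still applies inside quoted strings.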

Output Schema

Name	Required	Description	Default
`result`	Yes	JSON-formatted search results (string)	
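The `result` string comes in two shapes from `_search_documentation_impl`: a full object with `query`, `section_filter`, `result_count`, and `results`, or, when nothing matches, only `message` and an empty `results` list. A client-side sketch that handles both; the function name `summarize` and the sample payloads are illustrative placeholders, not output captured from the server.

```python
import json


def summarize(result: str) -> list[str]:
    """Turn a search_documentation result string into one line per hit."""
    payload = json.loads(result)
    hits = payload.get("results", [])
    if not hits:
        # No-match shape: {"message": "No results found for query: '...'", "results": []}
        return [payload.get("message", "No results")]
    return [f"{hit['title']} ({hit['relevance_score']}) -> {hit['url']}" for hit in hits]


# Illustrative payloads in the two shapes; all values are placeholders.
found = json.dumps({
    "query": "window",
    "section_filter": "sql-ref",
    "result_count": 1,
    "results": [{
        "title": "Window Functions",
        "url": "https://example.invalid/sql-ref/window.html",
        "path": "sql-ref/window.md",
        "section": "sql-ref",
        "snippet": "...<mark>window</mark> functions...",
        "relevance_score": 12.3456,
    }],
})
empty = json.dumps({"message": "No results found for query: 'foo'", "results": []})

print(summarize(found))
print(summarize(empty))
```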

Implementation Reference

src/mcp_spark_documentation/server.py:39-86 (handler)

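Core implementation of the search_documentation tool. It clamps `limit` to the range 1 to 50, calls DocumentDatabase.search() with the query, optional section filter, and limit, and formats the hits as indented JSON. When nothing matches, it instead returns a compact JSON object containing a message and an empty `results` list. This is the business logic behind the tool; the registered entry point only delegates to it.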

```python
import json

# get_database() is provided elsewhere in the package (not shown in this excerpt).


def _search_documentation_impl(
    query: str,
    section: str | None = None,
    limit: int = 10,
) -> str:
    """Core implementation of search_documentation.

    Args:
        query: Search terms to find in the documentation.
        section: Optional section to filter results.
        limit: Maximum number of results to return.

    Returns:
        JSON-formatted search results.
    """
    db = get_database()

    # Validate and cap limit
    limit = min(max(1, limit), 50)

    results = db.search(query, section=section, limit=limit)

    if not results:
        return json.dumps(
            {
                "message": f"No results found for query: '{query}'",
                "results": [],
            }
        )

    output = {
        "query": query,
        "section_filter": section,
        "result_count": len(results),
        "results": [
            {
                "title": r.title,
                "url": r.url,
                "path": r.path,
                "section": r.section,
                "snippet": r.snippet,
                "relevance_score": round(r.score, 4),
            }
            for r in results
        ],
    }

    return json.dumps(output, indent=2)
```
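The `limit` line clamps out-of-range values instead of rejecting them, and each score is rounded to four decimals before serialization. A tiny sketch reproducing both expressions in isolation; `clamp_limit` is a hypothetical name, not a function in the server.

```python
def clamp_limit(limit: int, upper: int = 50) -> int:
    """Same expression as in _search_documentation_impl: force limit into [1, upper]."""
    return min(max(1, limit), upper)


for requested in (-5, 0, 10, 50, 200):
    print(requested, "->", clamp_limit(requested))
# -5 -> 1, 0 -> 1, 10 -> 10, 50 -> 50, 200 -> 50

# SQLite's bm25() yields negative numbers; the database layer applies abs()
# and the tool then rounds the value to 4 decimal places.
print(round(abs(-7.123456789), 4))  # 7.1235
```

Clamping means a caller asking for 200 results silently receives at most 50; raising a validation error would be the stricter alternative.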

src/mcp_spark_documentation/server.py:122-142 (registration)

Registration of the search_documentation tool via the @mcp.tool() decorator. The function is the public-facing MCP tool entry point that delegates to _search_documentation_impl.

```python
@mcp.tool()
def search_documentation(
    query: str,
    section: str | None = None,
    limit: int = 10,
) -> str:
    """Search Apache Spark documentation by keyword query.

    Args:
        query: Search terms to find in the documentation. Supports
               full-text search with stemming (e.g., "stream" matches
               "streaming", "streams").
        section: Optional section to filter results. Common sections include:
                 'sql-ref', 'api', 'streaming', 'mllib', 'graphx',
                 'structured-streaming', etc.
        limit: Maximum number of results to return (default: 10, max: 50).

    Returns:
        JSON-formatted search results with title, URL, snippet, and relevance score.
    """
    return _search_documentation_impl(query, section, limit)
```

src/mcp_spark_documentation/database.py:106-197 (helper)

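The DocumentDatabase.search() method performs the FTS5 full-text search in SQLite. It matches the query against the `documents_fts` index, optionally restricts hits to one section, and ranks them with `bm25()` using column weights 5.0, 2.0, and 1.0. Because `bm25()` returns more negative values for better matches, results are sorted in ascending order and each score is converted with `abs()` before being returned as SearchResult objects. The excerpt also includes the neighboring get_document(), clear(), and get_document_count() methods.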

```python
# DocumentDatabase methods (excerpt from database.py, shown without the class indentation)


def search(self, query: str, section: str | None = None, limit: int = 10) -> list[SearchResult]:
    """Search documents using FTS5.

    Args:
        query: Search query string.
        section: Optional section filter.
        limit: Maximum number of results.

    Returns:
        List of SearchResult instances ordered by relevance.
    """
    with self._get_connection() as conn:
        # Build query with optional section filter
        sql = """
            SELECT
                d.path,
                d.title,
                d.url,
                d.section,
                snippet(documents_fts, 2, '<mark>', '</mark>', '...', 64) as snippet,
                bm25(documents_fts, 5.0, 2.0, 1.0) as score
            FROM documents_fts
            JOIN documents d ON documents_fts.rowid = d.id
            WHERE documents_fts MATCH ?
        """
        params: list[str | int] = [query]

        if section:
            sql += " AND d.section = ?"
            params.append(section)

        sql += " ORDER BY score LIMIT ?"
        params.append(limit)

        cursor = conn.execute(sql, params)
        results = []
        for row in cursor.fetchall():
            results.append(
                SearchResult(
                    path=row["path"],
                    title=row["title"],
                    url=row["url"],
                    section=row["section"],
                    snippet=row["snippet"],
                    score=abs(row["score"]),  # BM25 returns negative scores
                )
            )
        return results

def get_document(self, path: str) -> Document | None:
    """Retrieve a document by path.

    Args:
        path: Relative path to the document.

    Returns:
        Document instance or None if not found.
    """
    with self._get_connection() as conn:
        cursor = conn.execute(
            "SELECT * FROM documents WHERE path = ?",
            (path,),
        )
        row = cursor.fetchone()
        if row:
            return Document(
                path=row["path"],
                title=row["title"],
                description=row["description"],
                section=row["section"],
                content=row["content"],
                url=row["url"],
            )
        return None

def clear(self) -> None:
    """Clear all documents from the database."""
    with self._get_connection() as conn:
        conn.execute("DELETE FROM documents")
        conn.commit()

def get_document_count(self) -> int:
    """Return the total number of indexed documents.

    Returns:
        Count of documents in the database.
    """
    with self._get_connection() as conn:
        cursor = conn.execute("SELECT COUNT(*) FROM documents")
        result = cursor.fetchone()
        return int(result[0]) if result else 0
```

src/mcp_spark_documentation/models.py:27-36 (schema)
SearchResult dataclass used as the return type from the database search, containing path, title, url, snippet, score, and section fields.
```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    """Represents a search result."""

    path: str
    title: str
    url: str
    snippet: str
    score: float
    section: str
```
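Since `SearchResult` is a plain dataclass, the per-hit dictionary built by hand in `_search_documentation_impl` could also be derived with `dataclasses.asdict()`. A sketch of that alternative; the helper `to_output_dict` is hypothetical. It renames `score` to `relevance_score` and rounds it as the tool does, although the resulting key order differs from the hand-written dict.

```python
from dataclasses import asdict, dataclass


@dataclass
class SearchResult:
    """Represents a search result (same fields as models.py)."""

    path: str
    title: str
    url: str
    snippet: str
    score: float
    section: str


def to_output_dict(result: SearchResult) -> dict[str, object]:
    """Hypothetical helper: dataclass -> per-hit dict with a rounded relevance_score."""
    data = asdict(result)
    data["relevance_score"] = round(data.pop("score"), 4)
    return data


hit = SearchResult(
    "sql-ref/window.md", "Window Functions", "https://example.invalid/window",
    "...<mark>window</mark>...", 12.345678, "sql-ref",
)
print(to_output_dict(hit))
```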

MCP Spark Documentation Server
