search_documentation
Search Apache Spark documentation by keyword, with full-text search, stemming, and an optional section filter.
Instructions
Search Apache Spark documentation by keyword query.
Args:
- query: Search terms to find in the documentation. Supports full-text search with stemming (e.g., "stream" matches "streaming" and "streams").
- section: Optional section used to filter results; it must match a document's section exactly. Common sections include 'sql-ref', 'api', 'streaming', 'mllib', 'graphx', and 'structured-streaming'.
- limit: Maximum number of results to return (default: 10, max: 50). Values outside 1 to 50 are clamped into that range.
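Returns: JSON-formatted search results. Each result includes a title, URL, path, section, snippet, and relevance score; when nothing matches, the JSON contains a message and an empty results list.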
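The stemming behavior described for query can be reproduced with SQLite's FTS5 porter tokenizer. This is a minimal sketch that assumes the index is built with tokenize='porter unicode61'; the source says only that stemming is supported and does not show the tokenizer configuration. The table names stemmed and plain and the sample rows are hypothetical; the plain table is a control without stemming.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two otherwise identical FTS5 tables: one with the porter stemmer, one without.
conn.execute(
    "CREATE VIRTUAL TABLE stemmed USING fts5(title, content, tokenize = 'porter unicode61')"
)
conn.execute("CREATE VIRTUAL TABLE plain USING fts5(title, content, tokenize = 'unicode61')")

# Sample rows only; not real Spark documentation content.
ROWS = [
    ("Structured Streaming", "Streaming queries process data incrementally."),
    ("DStreams", "Discretized streams are the basic abstraction."),
    ("MLlib", "Machine learning pipelines and feature transformers."),
]
for table in ("stemmed", "plain"):
    conn.executemany(f"INSERT INTO {table} (title, content) VALUES (?, ?)", ROWS)


def matching_titles(table: str, term: str) -> list[str]:
    """Titles of rows in `table` whose indexed text matches `term`."""
    rows = conn.execute(
        f"SELECT title FROM {table} WHERE {table} MATCH ? ORDER BY rowid", (term,)
    ).fetchall()
    return [title for (title,) in rows]


# With stemming, "stream" matches both "streaming" and "streams";
# the plain unicode61 table matches neither.
print(matching_titles("stemmed", "stream"))
print(matching_titles("plain", "stream"))
```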
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
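| query | Yes | Search terms; supports full-text search with stemming | |
| section | No | Section to filter results by (exact match), e.g. 'sql-ref', 'streaming', 'mllib' | None |
| limit | No | Maximum number of results to return (clamped to 1 to 50) | 10 |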
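The implementation does not reject an out-of-range limit; it clamps it with min(max(1, limit), 50). A small sketch of that rule, with clamp_limit as a hypothetical helper name:

```python
def clamp_limit(limit: int) -> int:
    """Same bounds check as _search_documentation_impl: keep 1 <= limit <= 50."""
    return min(max(1, limit), 50)


for requested in (-5, 0, 10, 75):
    print(requested, "->", clamp_limit(requested))
```

A caller that asks for 0 or a negative number still gets at least one result (if any match), and requests above 50 are silently reduced rather than raising an error.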
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
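| result | Yes | JSON-formatted string containing the search results, or a "No results found" message with an empty results list | |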
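Because result is a single JSON string with two possible shapes, a client should parse it and read the results list rather than assume result_count is present. A minimal sketch; the summarize function and both sample payloads are hypothetical, shaped after the two return paths in _search_documentation_impl, with illustrative field values.

```python
import json


def summarize(result: str) -> list[tuple[str, str]]:
    """Extract (title, url) pairs from a search_documentation result string."""
    payload = json.loads(result)
    # The no-results payload carries only "message" and an empty "results"
    # list, so read "results" instead of relying on "result_count".
    return [(item["title"], item["url"]) for item in payload.get("results", [])]


# Hand-written payloads; values are illustrative, not real search output.
empty = json.dumps({"message": "No results found for query: 'xyz'", "results": []})
hit = json.dumps(
    {
        "query": "broadcast join",
        "section_filter": "sql-ref",
        "result_count": 1,
        "results": [
            {
                "title": "Join Hints",
                "url": "https://example.org/join-hints.html",
                "path": "sql-ref/join-hints.md",
                "section": "sql-ref",
                "snippet": "...a <mark>broadcast</mark> <mark>join</mark>...",
                "relevance_score": 3.1416,
            }
        ],
    },
    indent=2,
)

print(summarize(empty))
print(summarize(hit))
```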
Implementation Reference
- Core implementation of the search_documentation tool. It clamps limit to the range 1 to 50, calls DocumentDatabase.search() with the query, optional section filter, and limit, and serializes the results as JSON. The tool handler delegates to this function.
```python
import json


def _search_documentation_impl(
    query: str,
    section: str | None = None,
    limit: int = 10,
) -> str:
    """Core implementation of search_documentation.

    Args:
        query: Search terms to find in the documentation.
        section: Optional section to filter results.
        limit: Maximum number of results to return.

    Returns:
        JSON-formatted search results.
    """
    db = get_database()  # module-level helper defined elsewhere in the server

    # Validate and cap limit to the range 1..50
    limit = min(max(1, limit), 50)

    results = db.search(query, section=section, limit=limit)

    if not results:
        return json.dumps(
            {
                "message": f"No results found for query: '{query}'",
                "results": [],
            }
        )

    output = {
        "query": query,
        "section_filter": section,
        "result_count": len(results),
        "results": [
            {
                "title": r.title,
                "url": r.url,
                "path": r.path,
                "section": r.section,
                "snippet": r.snippet,
                "relevance_score": round(r.score, 4),
            }
            for r in results
        ],
    }

    return json.dumps(output, indent=2)
```
- src/mcp_spark_documentation/server.py:122-142 (registration): Registration of the search_documentation tool via the @mcp.tool() decorator. The function is the public-facing MCP tool entry point and delegates to _search_documentation_impl.
```python
@mcp.tool()
def search_documentation(
    query: str,
    section: str | None = None,
    limit: int = 10,
) -> str:
    """Search Apache Spark documentation by keyword query.

    Args:
        query: Search terms to find in the documentation. Supports full-text
            search with stemming (e.g., "stream" matches "streaming", "streams").
        section: Optional section to filter results. Common sections include:
            'sql-ref', 'api', 'streaming', 'mllib', 'graphx', 'structured-streaming', etc.
        limit: Maximum number of results to return (default: 10, max: 50).

    Returns:
        JSON-formatted search results with title, URL, snippet, and relevance score.
    """
    return _search_documentation_impl(query, section, limit)
```
- The DocumentDatabase.search() method performs the full-text search using SQLite FTS5. It accepts a query, an optional section filter, and a limit, and returns SearchResult objects ranked by BM25 relevance.
```python
def search(self, query: str, section: str | None = None, limit: int = 10) -> list[SearchResult]:
    """Search documents using FTS5.

    Args:
        query: Search query string.
        section: Optional section filter.
        limit: Maximum number of results.

    Returns:
        List of SearchResult instances ordered by relevance.
    """
    with self._get_connection() as conn:
        # Build query with optional section filter
        sql = """
            SELECT
                d.path,
                d.title,
                d.url,
                d.section,
                snippet(documents_fts, 2, '<mark>', '</mark>', '...', 64) as snippet,
                bm25(documents_fts, 5.0, 2.0, 1.0) as score
            FROM documents_fts
            JOIN documents d ON documents_fts.rowid = d.id
            WHERE documents_fts MATCH ?
        """
        params: list[str | int] = [query]

        if section:
            sql += " AND d.section = ?"
            params.append(section)

        # bm25() is lower-is-better, so ascending order ranks the best match first
        sql += " ORDER BY score LIMIT ?"
        params.append(limit)

        cursor = conn.execute(sql, params)
        results = []
        for row in cursor.fetchall():
            results.append(
                SearchResult(
                    path=row["path"],
                    title=row["title"],
                    url=row["url"],
                    section=row["section"],
                    snippet=row["snippet"],
                    score=abs(row["score"]),  # BM25 returns negative scores
                )
            )
        return results


def get_document(self, path: str) -> Document | None:
    """Retrieve a document by path.

    Args:
        path: Relative path to the document.

    Returns:
        Document instance or None if not found.
    """
    with self._get_connection() as conn:
        cursor = conn.execute(
            "SELECT * FROM documents WHERE path = ?",
            (path,),
        )
        row = cursor.fetchone()
        if row:
            return Document(
                path=row["path"],
                title=row["title"],
                description=row["description"],
                section=row["section"],
                content=row["content"],
                url=row["url"],
            )
        return None


def clear(self) -> None:
    """Clear all documents from the database."""
    with self._get_connection() as conn:
        conn.execute("DELETE FROM documents")
        conn.commit()


def get_document_count(self) -> int:
    """Return the total number of indexed documents.

    Returns:
        Count of documents in the database.
    """
    with self._get_connection() as conn:
        cursor = conn.execute("SELECT COUNT(*) FROM documents")
        result = cursor.fetchone()
        return int(result[0]) if result else 0
```
- SearchResult dataclass used as the return type of the database search, containing path, title, url, snippet, score, and section fields.
```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    """Represents a search result."""

    path: str
    title: str
    url: str
    snippet: str
    score: float
    section: str
```