# Usage Guide
This guide provides detailed instructions for using the MCP Spark Documentation Server.
## Installation
### Prerequisites
- Python 3.12 or later
- [uv](https://docs.astral.sh/uv/) package manager
- Git
- Docker (optional, for containerised deployment)
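You can quickly confirm the prerequisites are in place before proceeding:
```bash
# Verify each prerequisite is installed and on PATH
python3 --version   # should report 3.12 or later
uv --version
git --version
docker --version    # optional; only needed for containerised deployment
```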
### Local Development Setup
1. Clone the repository:
```bash
git clone https://github.com/martoc/mcp-spark-documentation.git
cd mcp-spark-documentation
```
2. Initialise the development environment:
```bash
make init
```
3. Build the documentation index:
```bash
make index
```
## Indexing Documentation
### Initial Indexing
Index the Spark documentation from the default `master` branch:
```bash
uv run spark-docs-index index
```
### Rebuilding the Index
Clear the existing index and rebuild from scratch:
```bash
uv run spark-docs-index index --rebuild
```
### Indexing a Specific Branch
Index documentation from a specific Git branch:
```bash
uv run spark-docs-index index --branch branch-3.5
```
### Index Statistics
View the number of indexed documents:
```bash
uv run spark-docs-index stats
```
## Running the MCP Server
### Local Development
Run the server directly using uv:
```bash
make run
# or
uv run mcp-spark-documentation
```
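To confirm the server is speaking MCP over stdio, you can pipe a single JSON-RPC `initialize` request to it. This is a rough smoke test, not part of the project's tooling; the `protocolVersion` shown is one published MCP revision and your client may negotiate a different one:
```bash
# Send an MCP initialize request and read the JSON-RPC response from stdout
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke-test","version":"0.0.0"}}}' \
  | uv run mcp-spark-documentation
```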
### Using Docker
Build and run the server in a Docker container:
```bash
make docker-build
make docker-run
```
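If you want the index to survive container restarts, you can mount the local data directory into the container. This is a sketch: the in-container path `/app/data` is an assumption about the image layout, so check the Dockerfile for the actual location:
```bash
# Mount the local data directory so data/spark_docs.db persists across runs
docker run -i --rm -v "$(pwd)/data:/app/data" mcp-spark-documentation
```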
## MCP Client Configuration
### Claude Code
Add to your project's `.mcp.json`:
```json
{
  "mcpServers": {
    "spark-documentation": {
      "command": "uv",
      "args": ["run", "mcp-spark-documentation"],
      "cwd": "/path/to/mcp-spark-documentation"
    }
  }
}
```
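Alternatively, recent Claude Code releases can register a server from the command line. The exact flags may vary by version, so treat this as a sketch and check `claude mcp add --help`; run it from the repository root so relative paths resolve:
```bash
# Hypothetical CLI registration of the server with Claude Code
claude mcp add spark-documentation -- uv run mcp-spark-documentation
```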
### Claude Desktop
Add to your Claude Desktop configuration file (`claude_desktop_config.json`). This assumes the `mcp-spark-documentation` image has already been built with `make docker-build`:
```json
{
  "mcpServers": {
    "spark-documentation": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "mcp-spark-documentation"]
    }
  }
}
```
## Using the Tools
### Searching Documentation
Ask your MCP client to search the Spark documentation with natural-language prompts such as:
```
Search for "structured streaming watermarks"
Search for "dataframe join" in section "sql-ref"
Search for "machine learning pipeline" with limit 20
```
Example response:
```json
{
  "query": "structured streaming watermarks",
  "section_filter": null,
  "result_count": 5,
  "results": [
    {
      "title": "Structured Streaming Programming Guide",
      "url": "https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html",
      "path": "structured-streaming-programming-guide.md",
      "section": "root",
      "snippet": "...watermarks allow Spark to automatically track...",
      "relevance_score": 12.5432
    }
  ]
}
```
### Reading Documentation
Ask your MCP client to retrieve the full content of a specific page:
```
Read documentation at path "structured-streaming-programming-guide.md"
```
Example response:
```json
{
  "path": "structured-streaming-programming-guide.md",
  "title": "Structured Streaming Programming Guide",
  "description": "Overview of Structured Streaming in Apache Spark",
  "section": "root",
  "url": "https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html",
  "content": "# Structured Streaming Programming Guide\n\n..."
}
```
## Common Sections
The Spark documentation is organised into several sections:
- **sql-ref**: SQL reference and syntax
- **api**: API documentation (Scala, Java, Python, R)
- **streaming**: Spark Streaming (DStreams)
- **structured-streaming**: Structured Streaming
- **mllib**: Machine Learning Library
- **graphx**: Graph processing
- **configuration**: Configuration and tuning
- **deployment**: Deployment guides
Use these section names with the `section` parameter to filter search results.
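For example, to search only the machine-learning documentation:
```
Search for "logistic regression pipeline" in section "mllib"
```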
## Development Workflow
### Code Quality Checks
Run all code quality checks:
```bash
make build
```
This runs:
- Linter (ruff)
- Type checker (mypy)
- Tests with coverage (pytest)
### Individual Checks
```bash
make lint # Run linter only
make typecheck # Run type checker only
make test # Run tests only
make format # Format code
```
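The make targets presumably wrap direct uv invocations along these lines; the exact flags are assumptions, so consult the Makefile for the authoritative commands:
```bash
uv run ruff check .    # lint
uv run ruff format .   # format
uv run mypy .          # type check
uv run pytest          # tests
```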
### Updating Dependencies
Update the lock file:
```bash
make generate
```
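This presumably regenerates `uv.lock`; if you prefer to call uv directly, the equivalent is along the lines of:
```bash
# Re-resolve and upgrade all pinned versions in uv.lock
uv lock --upgrade
```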
## Troubleshooting
### Index Build Fails
If the index build fails, try:
1. Check your internet connection
2. Verify Git is installed and accessible
3. Try rebuilding with a different branch:
```bash
uv run spark-docs-index index --rebuild --branch master
```
### No Search Results
If searches return no results:
1. Verify the index is built:
```bash
uv run spark-docs-index stats
```
2. Rebuild the index if necessary:
```bash
uv run spark-docs-index index --rebuild
```
### Database Location
The default database location is `data/spark_docs.db`. To use a custom location:
```bash
uv run spark-docs-index index --database /path/to/custom.db
```
## Performance Considerations
- **Initial indexing**: May take several minutes depending on network speed
- **Sparse checkout**: Only the `docs/` directory is cloned, reducing download size
- **Search performance**: FTS5 with BM25 ranking provides fast, relevant results (see the query sketch below)
- **Memory usage**: Minimal during operation; database is SQLite-based
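For the curious, FTS5's BM25 ranking can be inspected directly with the `sqlite3` CLI. The sketch below assumes an FTS5 table named `documents`; the actual schema may differ, so inspect it with `.schema` first:
```bash
# bm25() returns lower (more negative) scores for better matches,
# so ordering ascending lists the most relevant rows first
sqlite3 data/spark_docs.db \
  "SELECT title, bm25(documents) AS score
   FROM documents
   WHERE documents MATCH 'watermark'
   ORDER BY score
   LIMIT 5;"
```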