MCP Spark Documentation Server

by martoc

License: MIT | Python 3.12 | MCP

An MCP (Model Context Protocol) server that provides search and retrieval tools for Apache Spark documentation. This server enables AI assistants like Claude to search and read Spark documentation directly.

Features

  • Full-text search using SQLite FTS5 with BM25 ranking and Porter stemming

  • Section filtering to narrow search results by documentation category

  • Sparse checkout for efficient cloning of only the docs directory from apache/spark

  • Docker support for portable deployment across projects

  • STDIO transport for seamless MCP client integration
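To illustrate the first two features, here is a minimal sketch of FTS5 search with BM25 ranking, Porter stemming, and an optional section filter, using Python's built-in sqlite3 module. The table name, column layout, sample rows, and the search helper are assumptions made for illustration; the server's actual schema may differ. It also assumes your SQLite build includes FTS5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: path and section are stored but not tokenized;
# title and body are indexed with the Porter stemmer over unicode61.
conn.execute(
    "CREATE VIRTUAL TABLE docs USING fts5("
    "path UNINDEXED, section UNINDEXED, title, body, "
    "tokenize='porter unicode61')"
)

# Illustrative sample rows, not the real index contents.
conn.executemany(
    "INSERT INTO docs (path, section, title, body) VALUES (?, ?, ?, ?)",
    [
        ("sql-ref-syntax-qry-select.md", "sql-ref", "SELECT",
         "Queries retrieve rows from one or more tables."),
        ("streaming-programming-guide.md", "streaming", "Streaming guide",
         "Streaming queries process data incrementally."),
        ("tuning.md", "tuning", "Tuning Spark",
         "Memory tuning and data serialization."),
    ],
)


def search(query, section=None, limit=10):
    """Return (path, section, title) rows, best BM25 match first."""
    sql = "SELECT path, section, title FROM docs WHERE docs MATCH ?"
    params = [query]
    if section is not None:
        # Section filtering narrows the full-text matches by category.
        sql += " AND section = ?"
        params.append(section)
    # bm25() returns lower values for better matches, so sort ascending.
    sql += " ORDER BY bm25(docs) LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()


# Stemming lets the singular "query" match the indexed word "queries".
print(search("query"))
print(search("query", section="streaming"))
```

Because of stemming, a search for "query" also matches pages that only contain "queries"; the section argument then restricts those matches to one category.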

Quick Start

Using Docker

# Build the Docker image (includes pre-indexed documentation)
make docker-build

# Test the server
make docker-run

Using uv (Local Development)

# Initialise the environment
make init

# Build the documentation index
make index

# Run the server
make run

Configuration

Claude Code / Claude Desktop

Add the following to your project's .mcp.json or to your global settings:

{
  "mcpServers": {
    "spark-documentation": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "martoc/mcp-spark-documentation:latest"]
    }
  }
}

For a locally built Docker image:

{
  "mcpServers": {
    "spark-documentation": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "mcp-spark-documentation"]
    }
  }
}

For local development without Docker:

{
  "mcpServers": {
    "spark-documentation": {
      "command": "uv",
      "args": ["run", "mcp-spark-documentation"],
      "cwd": "/path/to/mcp-spark-documentation"
    }
  }
}
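If you already have a .mcp.json with other servers in it, you may prefer to merge the entry rather than paste it by hand. The following is a small sketch of one way to do that; the add_server helper is hypothetical and not part of this project, and it only assumes the mcpServers layout shown in the examples above.

```python
import json
from pathlib import Path


def add_server(config_path, name, command, args, cwd=None):
    """Add or replace one entry under "mcpServers", keeping other entries.

    Hypothetical helper for illustration; not shipped with the server.
    """
    path = Path(config_path)
    # Start from the existing file if there is one, else an empty config.
    config = json.loads(path.read_text()) if path.exists() else {}
    entry = {"command": command, "args": list(args)}
    if cwd is not None:
        entry["cwd"] = cwd
    config.setdefault("mcpServers", {})[name] = entry
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Example call (writes to the given file, so it is left commented out):
# add_server(
#     ".mcp.json",
#     "spark-documentation",
#     "docker",
#     ["run", "-i", "--rm", "martoc/mcp-spark-documentation:latest"],
# )
```

Running the helper twice with the same name overwrites that entry and leaves any other configured servers untouched.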

MCP Tools

| Tool | Description |
| --- | --- |
| search_documentation | Search Spark documentation by keyword query with optional section filtering |
| read_documentation | Retrieve the full content of a specific documentation page |

search_documentation

Search Apache Spark documentation using full-text search with stemming support.

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| query | string | Yes | - | Search terms (supports stemming) |
| section | string | No | None | Filter by section (e.g., sql-ref, streaming, mllib) |
| limit | integer | No | 10 | Maximum results (1-50) |

Common Sections: sql-ref, api, streaming, mllib, graphx, structured-streaming, configuration, tuning
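MCP clients normally build tool calls for you, but it can help to see what such a call might look like on the wire. The sketch below builds a JSON-RPC 2.0 tools/call request for search_documentation, using the parameters, default, and limit range described above. The build_search_request helper is hypothetical, and a real session would first need the MCP initialisation handshake, which is omitted here.

```python
import json


def build_search_request(request_id, query, section=None, limit=10):
    """Build a tools/call request for search_documentation.

    Hypothetical helper; validation mirrors the documented parameters.
    """
    if not query:
        raise ValueError("query is required")
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")
    arguments = {"query": query, "limit": limit}
    if section is not None:
        arguments["section"] = section
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search_documentation", "arguments": arguments},
    }


# Over STDIO, each message is sent as a single line of JSON.
request = build_search_request(1, "window functions", section="sql-ref", limit=5)
print(json.dumps(request))
```

Omitting section searches every category, and omitting limit falls back to the documented default of 10.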

read_documentation

Retrieve the full content of a documentation page.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| path | string | Yes | Relative path to document (from search results) |
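A typical workflow is to search first and then pass a returned path to read_documentation. As a rough sketch, the request for that second step could be built as shown below. The build_read_request helper and the example path are assumptions for illustration, and the client-side check for a relative path is a precaution added here, not documented server behaviour.

```python
import json
from pathlib import PurePosixPath


def build_read_request(request_id, path):
    """Build a tools/call request for read_documentation.

    Hypothetical helper; rejects paths that are clearly not relative.
    """
    parts = PurePosixPath(path)
    if not path or parts.is_absolute() or ".." in parts.parts:
        raise ValueError("path must be a relative document path")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "read_documentation", "arguments": {"path": path}},
    }


# "tuning.md" stands in for a path taken from a search result.
print(json.dumps(build_read_request(2, "tuning.md")))
```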

CLI Commands

# Build/rebuild the documentation index
uv run spark-docs-index index
uv run spark-docs-index index --rebuild
uv run spark-docs-index index --branch master

# Show index statistics
uv run spark-docs-index stats

Development

make init       # Initialise development environment
make build      # Run full build (lint, typecheck, test)
make test       # Run tests with coverage
make format     # Format code
make lint       # Run linter
make typecheck  # Run type checker

Licence

This project is licensed under the MIT Licence - see the LICENSE file for details.
