Best Apache Spark MCP Servers
Apache Spark is an open-source unified analytics engine for large-scale data processing. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs.
Why this server?
Provides tools for searching and retrieving Apache Spark documentation, enabling full-text keyword searches with section filtering and access to the full content of documentation pages.
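The listing for this server describes SQLite FTS5 with BM25 ranking as the search backend. The following is a minimal sketch of how keyword search with section filtering might look using Python's built-in `sqlite3` module. It is not the server's actual code: the table name `docs`, the column layout, the sample `PAGES`, and the helper names `build_index` and `search` are all assumptions made for illustration, and it requires an SQLite build with FTS5 enabled.

```python
import sqlite3

# Hypothetical sample pages: (section, title, body). Not real Spark doc text.
PAGES = [
    ("sql", "Performance Tuning",
     "Tune shuffle partitions and caching for Spark SQL queries."),
    ("streaming", "Structured Streaming Guide",
     "Streaming queries process data incrementally; shuffle partitions affect state."),
    ("mllib", "MLlib Guide",
     "Machine learning pipelines and feature transformers."),
]


def build_index(pages):
    """Create an in-memory FTS5 index from (section, title, body) tuples."""
    conn = sqlite3.connect(":memory:")
    # 'section' is stored but not tokenized, so it is only usable as a filter.
    conn.execute(
        "CREATE VIRTUAL TABLE docs USING fts5(section UNINDEXED, title, body)"
    )
    conn.executemany(
        "INSERT INTO docs (section, title, body) VALUES (?, ?, ?)", pages
    )
    return conn


def search(conn, query, section=None, limit=5):
    """Return (title, section, score) rows, best match first.

    FTS5's bm25() yields lower (more negative) values for better matches,
    so ascending order puts the most relevant page first.
    """
    sql = "SELECT title, section, bm25(docs) AS score FROM docs WHERE docs MATCH ?"
    params = [query]
    if section is not None:
        sql += " AND section = ?"
        params.append(section)
    sql += " ORDER BY score LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()
```

Calling `search(build_index(PAGES), "shuffle", section="sql")` would restrict matches to the hypothetical `sql` section; omitting `section` searches every page.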
License: A (MIT) | Quality: A | Maintenance: A
Provides full-text search and retrieval tools for Apache Spark documentation using SQLite FTS5 with BM25 ranking. It enables AI assistants to efficiently search, filter by section, and read specific Spark documentation pages.
Why this server?
Provides read-only access to Apache Spark data through SQL models, allowing for querying live data via natural language questions without requiring SQL knowledge. Tools include listing available tables, retrieving column information, and executing SQL SELECT queries against Spark.
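The three tools named above (list tables, describe columns, run SELECT queries) follow a common read-only pattern. The sketch below illustrates that pattern using Python's built-in `sqlite3` as a stand-in for a Spark connection; it does not reflect the CData server's implementation. The function names, the `orders` demo table, and the simple prefix check are assumptions for illustration, and a production guard would need a real SQL parser or database-level read-only permissions.

```python
import sqlite3


def make_demo_db():
    """Build a small in-memory database standing in for a Spark catalog."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "EU", 10.0), (2, "EU", 20.0), (3, "US", 5.0)],
    )
    return conn


def list_tables(conn):
    """Return the names of all tables, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [r[0] for r in rows]


def get_columns(conn, table):
    """Return (column_name, declared_type) pairs for a known table."""
    if table not in list_tables(conn):
        raise ValueError(f"unknown table: {table}")
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [(r[1], r[2]) for r in rows]


def run_select(conn, sql):
    """Execute a single SELECT statement and reject anything else.

    This is a deliberately conservative check: it also rejects SELECTs
    that contain a semicolon inside a string literal.
    """
    statement = sql.strip().rstrip(";").strip()
    if not statement.lower().startswith("select") or ";" in statement:
        raise ValueError("only single SELECT statements are allowed")
    return conn.execute(statement).fetchall()
```

An assistant-facing tool layer would typically call `list_tables` and `get_columns` first to ground a natural-language question, then pass the generated query through `run_select`.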
License: A (MIT) | Quality: - | Maintenance: D
Apache Spark MCP Server by CData
Why this server?
Offers information about Duyet's expertise with Apache Spark through CV resources and tools, enabling discussions about data engineering projects.
License: F | Quality: B | Maintenance: C
An experimental Model Context Protocol server that enables AI assistants to access information about Duyet, including his CV, blog posts, and GitHub activity through natural language queries.
Why this server?
Uses Apache Spark to write Parquet and ORC files to MinIO storage as part of data processing pipelines.
License: A | Quality: - | Maintenance: D
MCP server with 32 tools for ETL ingestion, AI-generated data quality rules, AI transformations, vector search, and natural-language SQL. Works across Postgres, MongoDB, Kafka, S3/MinIO, HashiCorp Vault, and five vector stores (Qdrant, Weaviate, Milvus, Chroma, pgvector).
Why this server?
Provides searchable documentation for Apache Spark as part of the data engineering knowledge base.
License: A (MIT) | Quality: - | Maintenance: D
Provides AI assistants with searchable access to documentation from 170+ curated repositories and 1000+ popular GitHub projects across 20+ categories including trading, AI/ML, DevOps, and web development.
Why this server?
Provides tools to optimize Apache Spark code, including automatic optimization of PySpark code and performance analysis with execution metrics.
License: F | Quality: - | Maintenance: D
An MCP server that optimizes Apache Spark code using Claude AI, providing intelligent code optimization suggestions and performance analysis.
Why this server?
Supports Apache Spark environments for job execution and data processing.
License: F | Quality: - | Maintenance: D
Enables LLMs to interact with Hopsworks for platform management, feature store operations, model lifecycle, jobs, and integrations.
Why this server?
Provides query optimization and data discovery capabilities for Apache Spark by exposing logical and physical query plans, catalog and table information to AI systems.
License: - | Quality: - | Maintenance: C
A server implementation of MCP for Apache Spark that provides query plans and catalog information to AI systems for query optimization and data discovery.