Supports containerized deployment of the StreamSets MCP server using Docker with environment variable configuration
Built using Python 3.8+ as the runtime environment for the MCP server implementation
Includes Swagger API specifications for documenting the StreamSets Control Hub API endpoints
Uses YAML configuration files for MCP server registry settings and tool metadata
StreamSets MCP Server
A comprehensive Model Context Protocol (MCP) server that provides seamless integration with StreamSets Control Hub APIs, enabling complete data pipeline management and creation through conversational AI.
Features
Pipeline Management (Read Operations)
Job Management: List, start, stop, and monitor job execution
Pipeline Operations: Browse, search, and analyze pipeline configurations
Connection Management: Manage data connections and integrations
Metrics & Analytics: Comprehensive performance and usage analytics
Enterprise Integration: Deployment management, security audits, and alerts
Pipeline Building (Write Operations)
Interactive Pipeline Creation: Build pipelines through conversation
Stage Library: Access to 25+ StreamSets stages (Origins, Processors, Destinations, Executors)
Visual Flow Management: Connect stages with data and event streams
Persistent Sessions: Pipeline builders persist across conversations
Smart Validation: Automatic validation of pipeline logic and connections
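As a rough illustration of what "validation of pipeline logic and connections" could involve, the sketch below checks a pipeline described as a stage map plus a list of connections. The function name, the data shapes, and the specific rules are assumptions made for this example; the README does not describe the checks the server actually performs.

```python
from typing import Dict, List, Tuple


def validate_pipeline(
    stages: Dict[str, str], connections: List[Tuple[str, str]]
) -> List[str]:
    """Return a list of problems; an empty list means these checks passed.

    `stages` maps a stage name to its kind ("origin", "processor",
    "destination" or "executor"); `connections` holds (source, target) pairs.
    Both shapes are hypothetical.
    """
    problems: List[str] = []
    has_input = set()

    # Every connection endpoint must refer to a declared stage.
    for source, target in connections:
        for name in (source, target):
            if name not in stages:
                problems.append(f"connection references unknown stage '{name}'")
        has_input.add(target)

    if "origin" not in stages.values():
        problems.append("pipeline has no origin stage")

    # Origins start the flow; every other stage needs something feeding it.
    for name, kind in stages.items():
        if kind == "origin" and name in has_input:
            problems.append(f"origin '{name}' must not receive input")
        if kind != "origin" and name not in has_input:
            problems.append(f"stage '{name}' has no incoming connection")

    return problems
```

Returning a list of messages instead of raising on the first failure lets a conversational client report every issue in one reply.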
API Coverage
44 Tools covering 9 StreamSets Services:
Job Runner API (11 tools) - Job lifecycle management
Pipeline Repository API (7 tools) - Pipeline CRUD operations
Connection API (4 tools) - Data connection management
Provisioning API (5 tools) - Infrastructure and deployment
Notification API (2 tools) - Alert and notification management
Topology API (1 tool) - System topology information
Metrics APIs (7 tools) - Performance and usage analytics
Security API (1 tool) - Security audit trails
Pipeline Builder (6 tools) - Interactive pipeline creation
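The per-service counts above can be cross-checked against the headline figure with a few lines of Python. The dictionary name is illustrative and not part of the server.

```python
# Tool counts copied from the API Coverage list; the variable name is illustrative.
TOOLS_PER_SERVICE = {
    "Job Runner API": 11,
    "Pipeline Repository API": 7,
    "Connection API": 4,
    "Provisioning API": 5,
    "Notification API": 2,
    "Topology API": 1,
    "Metrics APIs": 7,
    "Security API": 1,
    "Pipeline Builder": 6,
}

total_tools = sum(TOOLS_PER_SERVICE.values())
service_count = len(TOOLS_PER_SERVICE)
print(f"{total_tools} tools across {service_count} services")
# prints "44 tools across 9 services"
```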
Pipeline Builder Capabilities
Create Complete Data Pipelines
Persistent Pipeline Sessions
Cross-Conversation: Continue building pipelines across multiple conversations
Auto-Save: All changes automatically saved to disk
Session Management: List, view, and delete pipeline builder sessions
Storage Location:
~/.streamsets_mcp/pipeline_builders/
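A minimal sketch of listing and deleting sessions in that directory follows. It assumes one `<session_id>.pkl` file per session; the file naming and both function names are hypothetical, since the README only states the directory and that sessions are saved to disk.

```python
from pathlib import Path
from typing import List

# Default location named in this README.
DEFAULT_DIR = Path.home() / ".streamsets_mcp" / "pipeline_builders"


def list_sessions(storage_dir: Path = DEFAULT_DIR) -> List[str]:
    """Return session IDs, assuming one '<session_id>.pkl' file per session."""
    if not storage_dir.is_dir():
        return []
    return sorted(p.stem for p in storage_dir.glob("*.pkl"))


def delete_session(session_id: str, storage_dir: Path = DEFAULT_DIR) -> bool:
    """Delete one session file; return False if it did not exist."""
    path = storage_dir / f"{session_id}.pkl"
    if not path.is_file():
        return False
    path.unlink()
    return True
```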
Installation
Prerequisites
Python 3.8+
StreamSets Control Hub account with API credentials
Claude Desktop (for MCP integration)
Setup
Clone the repository
git clone https://github.com/yourusername/streamsets-mcp-server.git
cd streamsets-mcp-server
Install dependencies
pip install -r requirements.txt
Configure environment variables
export STREAMSETS_HOST_PREFIX="https://your-instance.streamsets.com"
export STREAMSETS_CRED_ID="your-credential-id"
export STREAMSETS_CRED_TOKEN="your-auth-token"
Test the server
python streamsets_server.py
Docker Deployment
Setup for MCP Integration
Manual Testing
Claude Desktop Integration
Option 1: Direct Python (Local Development)
Option 2: Docker with Persistence (Production)
Usage Examples
Job Management
Pipeline Operations
Metrics & Analytics
Configuration
Environment Variables
Required (StreamSets Authentication)
STREAMSETS_HOST_PREFIX - StreamSets Control Hub URL
STREAMSETS_CRED_ID - API Credential ID
STREAMSETS_CRED_TOKEN - Authentication Token
Optional (Pipeline Builder Persistence)
PIPELINE_STORAGE_PATH - Custom storage directory for pipeline builders
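The variables above could be read and checked at startup along these lines. This is a sketch, not the server's actual startup code: `load_settings`, `REQUIRED_VARS` and `OPTIONAL_VARS` are names invented for the example, and only the four environment variable names come from this README.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED_VARS = (
    "STREAMSETS_HOST_PREFIX",
    "STREAMSETS_CRED_ID",
    "STREAMSETS_CRED_TOKEN",
)
OPTIONAL_VARS = ("PIPELINE_STORAGE_PATH",)


def load_settings(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Collect settings, failing fast when a required variable is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    settings: Dict[str, Optional[str]] = {name: env[name] for name in REQUIRED_VARS}
    # Optional values are reported as None when unset or empty.
    for name in OPTIONAL_VARS:
        settings[name] = env.get(name) or None
    return settings
```

Accepting the environment as a parameter keeps the check testable without touching the real process environment.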
Pipeline Builder Storage
Pipeline builders are automatically persisted across conversations and container restarts:
Storage Locations (Priority Order)
Custom Path: PIPELINE_STORAGE_PATH environment variable
Docker Volume: /data/pipeline_builders (when running in Docker)
Default Path: ~/.streamsets_mcp/pipeline_builders/
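The priority order can be sketched as a small resolver. How the server detects that it is running in Docker is not stated here, so this example assumes that the presence of the `/data/pipeline_builders` directory is the signal; `resolve_storage_dir` and its parameters are hypothetical.

```python
import os
from pathlib import Path
from typing import Callable, Mapping, Optional

DOCKER_PATH = Path("/data/pipeline_builders")


def resolve_storage_dir(
    env: Optional[Mapping[str, str]] = None,
    is_dir: Callable[[Path], bool] = Path.is_dir,
    home: Optional[Path] = None,
) -> Path:
    """Pick the storage directory: custom path, then Docker volume, then default."""
    env = os.environ if env is None else env

    # 1. An explicit PIPELINE_STORAGE_PATH always wins.
    custom = env.get("PIPELINE_STORAGE_PATH")
    if custom:
        return Path(custom).expanduser()

    # 2. Assumption: an existing /data/pipeline_builders means a Docker volume is mounted.
    if is_dir(DOCKER_PATH):
        return DOCKER_PATH

    # 3. Fall back to the per-user default.
    base = Path.home() if home is None else home
    return base / ".streamsets_mcp" / "pipeline_builders"
```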
Configuration Options
Format: Pickle files for session persistence
Management: Automatic file management with error handling
Fallback: Memory-only mode if no writable storage available
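The three options above (pickle files, error handling, memory-only fallback) could fit together roughly as follows. `SessionStore`, its methods, and the `<session_id>.pkl` file naming are assumptions for illustration and are not taken from the server's source.

```python
import logging
import os
import pickle
from pathlib import Path
from typing import Any, Dict, Optional

logger = logging.getLogger("pipeline_store")


class SessionStore:
    """Keeps sessions in memory and mirrors them to pickle files when possible."""

    def __init__(self, storage_dir: Optional[Path]) -> None:
        self._memory: Dict[str, Any] = {}
        self.storage_dir = self._usable_dir(storage_dir)
        if self.storage_dir is None:
            logger.warning("Persistence disabled; running in memory-only mode")

    @staticmethod
    def _usable_dir(storage_dir: Optional[Path]) -> Optional[Path]:
        """Return the directory if it can be created and written to, else None."""
        if storage_dir is None:
            return None
        try:
            storage_dir.mkdir(parents=True, exist_ok=True)
        except OSError:
            return None
        return storage_dir if os.access(storage_dir, os.W_OK) else None

    def save(self, session_id: str, session: Any) -> None:
        self._memory[session_id] = session
        if self.storage_dir is None:
            return
        try:
            with open(self.storage_dir / f"{session_id}.pkl", "wb") as fh:
                pickle.dump(session, fh)
        except (OSError, pickle.PicklingError) as exc:
            logger.error("Could not persist session %s: %s", session_id, exc)

    def load(self, session_id: str) -> Optional[Any]:
        if session_id in self._memory:
            return self._memory[session_id]
        if self.storage_dir is None:
            return None
        path = self.storage_dir / f"{session_id}.pkl"
        if not path.is_file():
            return None
        try:
            with open(path, "rb") as fh:
                session = pickle.load(fh)
        except (OSError, pickle.UnpicklingError, EOFError) as exc:
            logger.error("Could not load session %s: %s", session_id, exc)
            return None
        self._memory[session_id] = session
        return session
```

Because `pickle.load` can execute arbitrary code, a store like this should only read files the server itself wrote.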
Docker Persistence
When using Docker, pipeline builders persist in named volumes.
Troubleshooting
No Persistence: Check storage directory permissions
Docker Issues: Ensure volume mounts are configured correctly
Memory Mode: Server logs will indicate if persistence is disabled
Documentation
API Reference: See CLAUDE.md for detailed tool documentation
Stage Library: Built-in documentation for 25+ StreamSets stages
Configuration: custom.yaml for MCP server registry
Swagger Specs: API specifications in /swagger/ directory
Development
Project Structure
Adding New Tools
Define tool function with @mcp.tool() decorator
Add comprehensive error handling and logging
Update custom.yaml with tool metadata
Document in CLAUDE.md
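The first two steps might look like the sketch below. It assumes the server uses the `FastMCP` class from the official MCP Python SDK (`mcp.server.fastmcp`); the README names FastMCP but not the import path. The tool itself, `format_job_summary`, is hypothetical, and the stub class exists only so the example runs without the SDK installed. Updating custom.yaml and CLAUDE.md is not shown.

```python
import logging

logger = logging.getLogger("streamsets_mcp_example")

try:
    # FastMCP from the official MCP Python SDK (`pip install mcp`); assumed import path.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("streamsets-example")
except ImportError:
    # Stand-in so this sketch runs without the SDK; it registers nothing.
    class _StubMCP:
        def tool(self):
            def decorator(fn):
                return fn

            return decorator

    mcp = _StubMCP()


@mcp.tool()
def format_job_summary(job_id: str, status: str) -> str:
    """Hypothetical tool: format a one-line job summary for the assistant."""
    try:
        if not job_id.strip():
            raise ValueError("job_id must not be empty")
        return f"Job {job_id.strip()} is {status.strip().upper() or 'UNKNOWN'}"
    except Exception as exc:
        # Log the failure and return a readable message instead of raising.
        logger.error("format_job_summary failed: %s", exc)
        return f"Error: {exc}"
```

Returning an error string keeps the conversation going when a call fails, which is one common way to handle tool errors; the server's own convention may differ.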
Testing
Contributing
Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
StreamSets for the comprehensive Control Hub APIs
Anthropic for the Model Context Protocol framework
FastMCP for the Python MCP server implementation
Support
For issues and questions:
Create an issue on GitHub
Check the documentation in CLAUDE.md
Review the API specifications in /swagger/
Transform your data pipeline workflows with conversational AI!
Enables complete StreamSets Control Hub integration through conversational AI, allowing users to manage data pipelines, monitor jobs, and interactively build new pipelines with 44 tools across 9 StreamSets services. Features persistent pipeline builder sessions that let users create complete ETL workflows through natural language conversations.