ML Lab MCP

A comprehensive MCP (Model Context Protocol) server for ML model training, fine-tuning, and experimentation. Transform your AI assistant into a full ML engineering environment.

Features

Unified Credential Management

  • Encrypted vault for API keys (Lambda Labs, RunPod, Mistral, OpenAI, Together AI, etc.)

  • PBKDF2 key derivation with AES encryption

  • Never stores credentials in plaintext

Dataset Management

  • Register datasets from local files OR client-provided content (JSONL, CSV, Parquet)

  • Upload datasets directly without server filesystem access

  • Automatic schema inference and statistics

  • Train/val/test splitting

  • Template-based transformations
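The registration flow above can be sketched in a few lines: read records, infer a naive schema, and cut train/val/test splits. This is an illustrative sketch only, not ML Lab's actual implementation; the `infer_schema` and `split_dataset` helpers are hypothetical names.

```python
import random

def infer_schema(records):
    """Infer a naive field -> type-name mapping from sample records."""
    schema = {}
    for rec in records:
        for key, value in rec.items():
            schema.setdefault(key, type(value).__name__)
    return schema

def split_dataset(records, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle deterministically, then cut into train/val/test lists."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    return (shuffled[n_val + n_test:],        # train
            shuffled[:n_val],                 # val
            shuffled[n_val:n_val + n_test])   # test

records = [{"prompt": f"q{i}", "completion": f"a{i}"} for i in range(100)]
print(infer_schema(records))  # {'prompt': 'str', 'completion': 'str'}
train, val, test = split_dataset(records)
print(len(train), len(val), len(test))  # 80 10 10
```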

Experiment Tracking

  • SQLite-backed experiment storage

  • Version control and comparison

  • Fork experiments with config modifications

  • Full metrics history

Multi-Backend Training

  • Local: transformers + peft + trl for local GPU training

  • Mistral API: Native fine-tuning for Mistral models

  • Together AI: Hosted fine-tuning service

  • OpenAI: GPT model fine-tuning

Cloud GPU Provisioning

  • Lambda Labs: H100, A100 instances

  • RunPod: Spot and on-demand GPUs

  • Automatic price comparison across providers

  • Smart routing based on cost and availability
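The price-comparison logic above amounts to filtering offers by availability and taking the cheapest. A minimal sketch, with made-up prices and a hypothetical `cheapest_available` helper (real quotes come from the provider APIs):

```python
def cheapest_available(offers, gpu_type):
    """Pick the lowest-priced offer for a GPU type that has capacity."""
    candidates = [o for o in offers
                  if o["gpu"] == gpu_type and o["available"] > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda o: o["usd_per_hour"])

# Illustrative numbers only.
offers = [
    {"provider": "lambda_labs", "gpu": "H100", "usd_per_hour": 2.49, "available": 3},
    {"provider": "runpod",      "gpu": "H100", "usd_per_hour": 2.79, "available": 5},
    {"provider": "runpod",      "gpu": "A100", "usd_per_hour": 1.64, "available": 0},
]
best = cheapest_available(offers, "H100")
print(best["provider"])  # lambda_labs
```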

Remote VPS Support

  • Use any SSH-accessible machine (Hetzner, Hostinger, OVH, home server, university cluster)

  • Automatic environment setup

  • Dataset sync via rsync

  • Training runs in tmux (persistent across disconnects)

  • Amortized hourly cost calculation from monthly fees
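The amortized hourly cost is simply the flat monthly fee spread over the month's hours. A sketch, assuming a 30-day month (720 hours):

```python
def amortized_hourly_cost(monthly_fee_usd, hours_per_month=720):
    """Spread a flat monthly VPS fee over a 30-day month (24 * 30 = 720 h)."""
    return monthly_fee_usd / hours_per_month

print(round(amortized_hourly_cost(200), 2))  # 0.28
```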

Cost Estimation

  • Pre-training cost estimates across all providers

  • Real-time pricing queries

  • Time estimates based on model and dataset size
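A back-of-the-envelope version of such an estimate: total tokens processed divided by throughput gives time, time multiplied by the hourly rate gives cost. The `estimate_run` helper and all numbers below are hypothetical illustrations, not ML Lab's estimator.

```python
def estimate_run(num_tokens, epochs, tokens_per_second, usd_per_hour):
    """Rough training time (hours) and cost (USD) from throughput and price."""
    total_tokens = num_tokens * epochs
    hours = total_tokens / tokens_per_second / 3600
    return hours, hours * usd_per_hour

# Hypothetical dataset size, throughput, and pricing.
hours, cost = estimate_run(num_tokens=10_000_000, epochs=3,
                           tokens_per_second=4000, usd_per_hour=2.49)
print(f"{hours:.1f} h, ${cost:.2f}")  # 2.1 h, $5.19
```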

Ollama Integration

  • Deploy fine-tuned GGUF models to Ollama

  • Pull models from Ollama registry

  • Chat/inference testing directly from MCP

  • Model management (list, delete, copy)

Open WebUI Integration

  • Create model presets with system prompts

  • Knowledge base management (RAG)

  • Chat through Open WebUI (applies configs + knowledge)

  • Seamless Ollama ↔ Open WebUI workflow

Installation

pip install ml-lab-mcp

# With training dependencies
pip install ml-lab-mcp[training]

# With cloud provider support
pip install ml-lab-mcp[cloud]

# Everything
pip install ml-lab-mcp[training,cloud,dev]

Quick Start

1. Initialize and Create Vault

ml-lab init
ml-lab vault create

2. Add Provider Credentials

ml-lab vault unlock
ml-lab vault add --provider lambda_labs --api-key YOUR_KEY
ml-lab vault add --provider mistral --api-key YOUR_KEY

3. Configure with Claude Code / Claude Desktop

Add to your MCP configuration:

{
  "mcpServers": {
    "ml-lab": {
      "command": "ml-lab",
      "args": ["serve"]
    }
  }
}

MCP Tools

Credentials

| Tool | Description |
|------|-------------|
| creds_create_vault | Create encrypted credential vault |
| creds_unlock | Unlock vault with password |
| creds_add | Add provider credentials |
| creds_list | List configured providers |
| creds_test | Verify credentials work (Lambda Labs, GCP, OpenAI supported) |

Datasets

| Tool | Description |
|------|-------------|
| dataset_register | Register a dataset from a local file |
| dataset_register_content | Register a dataset from client-provided content (CSV, JSON, JSONL, Parquet) |
| dataset_list | List all datasets |
| dataset_inspect | View schema and statistics |
| dataset_preview | Preview samples |
| dataset_split | Create train/val/test splits |
| dataset_transform | Apply template transformations |

Experiments

| Tool | Description |
|------|-------------|
| experiment_create | Create new experiment |
| experiment_list | List experiments |
| experiment_get | Get experiment details |
| experiment_compare | Compare multiple experiments |
| experiment_fork | Fork with modifications |

Training

| Tool | Description |
|------|-------------|
| train_estimate | Estimate cost/time across providers |
| train_launch | Start training run |
| train_status | Check run status |
| train_stop | Stop training |

Infrastructure

| Tool | Description |
|------|-------------|
| infra_list_gpus | List available GPUs with pricing |
| infra_provision | Provision cloud instance |
| infra_terminate | Terminate instance |

Remote VPS

| Tool | Description |
|------|-------------|
| vps_register | Register a VPS (host, user, key, GPU info, monthly cost) |
| vps_list | List all registered VPS machines |
| vps_status | Check VPS status (online, GPU, running jobs) |
| vps_unregister | Remove a VPS from registry |
| vps_setup | Install training dependencies on VPS |
| vps_sync | Sync dataset to VPS |
| vps_run | Run command on VPS |
| vps_logs | Get training logs from VPS |

Ollama

| Tool | Description |
|------|-------------|
| ollama_status | Check Ollama status (running, version, GPU) |
| ollama_list | List models in Ollama |
| ollama_pull | Pull model from registry |
| ollama_deploy | Deploy GGUF to Ollama |
| ollama_chat | Chat with a model |
| ollama_delete | Delete a model |

Open WebUI

| Tool | Description |
|------|-------------|
| owui_status | Check Open WebUI connection |
| owui_list_models | List model configurations |
| owui_create_model | Create model preset (system prompt, params) |
| owui_delete_model | Delete model configuration |
| owui_list_knowledge | List knowledge bases |
| owui_create_knowledge | Create knowledge base |
| owui_add_knowledge_file | Add file to knowledge base |
| owui_chat | Chat through Open WebUI |

Security

| Tool | Description |
|------|-------------|
| security_audit_log | View recent audit log entries |
| security_audit_summary | Get audit activity summary |
| security_tailscale_status | Check Tailscale VPN connection |
| security_ssh_key_rotate | Rotate SSH key for a VPS |
| creds_expiry_check | Check credential expiry status |
| creds_rotate | Rotate credentials for a provider |

Codex Integration (Executor LLM)

| Tool | Description |
|------|-------------|
| codex_status | Check if Codex CLI is available |
| codex_analyze_error | Have Codex analyze errors and suggest fixes |
| codex_generate_training_script | Generate training script from experiment config |
| codex_fix_code | Have Codex fix issues in training code |
| codex_optimize_config | Optimize training config for quality/speed/memory |
| codex_debug_training | Debug training issues from logs |
| codex_run | Run arbitrary task with Codex |

Architecture: the design separates intelligence from execution:

  • Planner LLM (Claude): Reasoning, architecture, tradeoffs

  • Executor LLM (Codex): Precise code edits + CLI work

  • World tools (ML Lab): Data, training, infra, deployment

Deep Thinking Analysis (Ollama Reasoning Models)

| Tool | Description |
|------|-------------|
| thinking_analyze | Run deep analysis with reasoning models (DeepSeek R1, QwQ) |
| thinking_schedule | Schedule automated analysis (after training, daily, weekly) |
| thinking_reports | List and retrieve analysis reports |

Analysis Types:

  • training - Analyze loss curves, convergence, hyperparameters

  • experiment - Compare experiments, suggest improvements

  • activity - Review audit logs for patterns/anomalies

  • cost - Cost efficiency analysis and recommendations

  • dataset - Data quality and training suitability

Schedule Frequencies:

  • after_training - Auto-analyze when training completes

  • hourly / daily / weekly - Periodic reviews

Reports stored in ~/.cache/ml-lab/reports/ for historical reference.

Example Workflow

User: Fine-tune Llama 3.1 8B on my customer support data

Claude: I'll help you fine-tune Llama 3.1 8B. Let me first check your data.

[Uses dataset_register to load ./support_data.jsonl]

Found 15,000 samples. Let me estimate the costs:

[Uses train_estimate]

| Provider      | Time   | Cost   | Availability |
|---------------|--------|--------|--------------|
| vps:hetzner-01| ~4h    | $0.28* | Idle         |
| Local (4090)  | ~12h   | $0     | Ready        |
| Lambda H100   | ~2h    | $5.50  | 3 available  |
| Together API  | ~4h    | $12    | Queued       |

*amortized from $200/mo

Which would you prefer?

User: Use Lambda Labs

Claude: [Uses experiment_create, then train_launch with backend=lambda]

Training started! Run ID: abc123
I'll monitor progress and let you know when it completes.

Architecture

src/ml_lab/
├── server.py           # MCP server entry point (61 tools)
├── credentials.py      # Encrypted credential vault
├── cli.py              # Command-line interface
├── backends/
│   ├── base.py         # Training backend interface
│   ├── local.py        # Local GPU training
│   ├── mistral_api.py  # Mistral fine-tuning API
│   ├── together_api.py # Together AI API
│   ├── openai_api.py   # OpenAI fine-tuning API
│   └── vertex_api.py   # Google Vertex AI (Gemini)
├── cloud/
│   ├── base.py         # Cloud provider interface
│   ├── lambda_labs.py  # Lambda Labs integration
│   ├── runpod.py       # RunPod integration
│   ├── modal_provider.py # Modal integration
│   └── remote_vps.py   # Generic SSH VPS support (+ Tailscale)
├── storage/
│   ├── datasets.py     # Dataset management
│   └── experiments.py  # Experiment tracking
├── inference/
│   ├── ollama.py       # Ollama integration
│   ├── openwebui.py    # Open WebUI integration
│   └── thinking.py     # Deep thinking analysis (DeepSeek R1, QwQ)
├── integrations/
│   └── codex.py        # Codex CLI integration (executor LLM)
├── security/
│   └── audit.py        # Audit logging
└── evals/
    └── benchmarks.py   # Evaluation suite

Security

  • Credentials encrypted with Fernet (AES-128-CBC)

  • PBKDF2-SHA256 key derivation (480,000 iterations)

  • Vault file permissions set to 600 (owner read/write only)

  • API keys never logged or transmitted unencrypted

  • Audit logging: All sensitive operations logged to ~/.cache/ml-lab/audit.log

  • Credential expiry: Automatic tracking with rotation reminders

  • Tailscale support: Optional VPN requirement for VPS connections

  • SSH key rotation: Automated rotation with rollback on failure
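The key-derivation scheme described above can be sketched with the standard library alone: PBKDF2-SHA256 at 480,000 iterations yields a 32-byte key, which Fernet expects urlsafe-base64 encoded. This sketch covers only the derivation step; ML Lab's vault then encrypts with Fernet (from the `cryptography` package), which is omitted here. The `derive_vault_key` name is hypothetical.

```python
import base64
import hashlib
import os

def derive_vault_key(password: str, salt: bytes, iterations: int = 480_000) -> bytes:
    """PBKDF2-SHA256 -> 32-byte key, urlsafe-base64 encoded as Fernet expects."""
    raw = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return base64.urlsafe_b64encode(raw)

salt = os.urandom(16)  # stored alongside the vault; not secret
key = derive_vault_key("correct horse battery staple", salt)
print(len(key))  # 44 (base64 of 32 bytes)
```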

Supported Providers

Compute Providers

  • Lambda Labs (H100, A100, A10)

  • RunPod (H100, A100, RTX 4090)

  • Modal (serverless GPU functions)

Fine-Tuning APIs

  • Mistral AI (Mistral, Mixtral, Codestral)

  • Together AI (Llama, Mistral, Qwen)

  • OpenAI (GPT-4o, GPT-3.5)

  • Google Vertex AI (Gemini 1.5 Pro, Gemini 1.5 Flash)

Model Hubs

  • Hugging Face Hub

  • Replicate

  • Ollama (local GGUF models)

Contributing

Contributions welcome! Please read CONTRIBUTING.md for guidelines.

License

PolyForm Noncommercial 1.0.0 - free for personal use, contact for commercial licensing.

See LICENSE for details.
