# Installation

## macOS Installation Guide

### Prerequisites

1. **Install uv** (Python package manager):
   ```bash
   brew install uv
   ```

### Install the Vast.ai MCP Server

1. **Clone the repository**:
   ```bash
   git clone https://github.com/your-repo/vastai-mcp.git
   cd vastai-mcp
   ```

2. **Install dependencies using uv**:
   ```bash
   uv sync
   ```

3. **Install the server as a tool**:
   ```bash
   uv tool install -e .  # Install from current directory
   ```

### Configuration

1. **Get your Vast.ai API key**:
   - Log in to [console.vast.ai](https://console.vast.ai)
   - Go to Account > API Keys
   - Create or copy your API key

2. **Configure MCP client** (for Claude Desktop or other MCP clients):

   Update your MCP configuration file (`~/.cursor/mcp.json` for Cursor):

   ```json
   {
     "mcpServers": {
       "vast-ai": {
         "command": "uv",
         "args": [
           "run",
           "vast-mcp-server"
         ],
         "env": {
           "VAST_API_KEY": "your_vast_api_key_here",
           "SSH_KEY_FILE": "~/.ssh/id_rsa",
           "SSH_KEY_PUBLIC_FILE": "~/.ssh/id_rsa.pub"
         }
       }
     }
   }
   ```

### Verify Installation

1. **Test the server directly**:
   ```bash
   uv tool run vast-mcp-server --help
   ```

### SSH Key Setup (Recommended)

For full functionality, ensure you have SSH keys set up:

1. **Generate an SSH key pair** (if you don't have one):
   ```bash
   ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
   ```

2. **Verify the key files exist**:
   ```bash
   ls -la ~/.ssh/id_rsa*
   ```
   You should see both `id_rsa` (private) and `id_rsa.pub` (public) files.

### Troubleshooting

- **Permission denied**: Make sure your SSH key has the correct permissions:
  ```bash
  chmod 600 ~/.ssh/id_rsa
  chmod 644 ~/.ssh/id_rsa.pub
  ```
- **API key issues**: Verify that your API key is correct and has the proper permissions on Vast.ai.
- **Network issues**: Ensure you can reach `console.vast.ai` from your network.

# Vast.ai MCP Server Usage Guide

This document describes how to use the Vast.ai MCP (Model Context Protocol) server to interact with the Vast.ai GPU cloud platform.
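The server resolves its API key from the `VAST_API_KEY` environment variable or from a `~/.vastai_api_key` file (see API Key Setup below). A minimal sketch of that lookup, assuming the environment variable takes precedence over the key file; the helper name `resolve_vast_api_key` is hypothetical and not part of the server's API:

```python
import os
from pathlib import Path

def resolve_vast_api_key(env=os.environ, key_file=Path.home() / ".vastai_api_key"):
    """Return the Vast.ai API key, preferring VAST_API_KEY over ~/.vastai_api_key."""
    key = env.get("VAST_API_KEY", "").strip()
    if key:
        return key
    if key_file.exists():
        return key_file.read_text().strip()
    raise RuntimeError(
        "No Vast.ai API key found; set VAST_API_KEY or create ~/.vastai_api_key"
    )
```

With this precedence, an exported `VAST_API_KEY` overrides the key file, which is convenient for switching accounts in a single shell session.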
## Available Tools

This server provides **24 tools** for managing Vast.ai GPU instances:

### 1. show_user_info()

Show current user information and account balance.

**Returns:**
- Username, email, account balance, user ID
- SSH key information (if available)
- Total spent amount

### 2. show_instances(owner: str = "me")

Show the user's instances (running, stopped, etc.).

**Parameters:**
- `owner` (optional): Owner of the instances to show (default: "me")

**Returns:**
- List of all instances with their details:
  - Instance ID and status
  - Label and machine ID
  - GPU type and specifications
  - Hourly cost
  - Docker image
  - Public IP address (if available)
  - Creation date

### 3. search_offers(query: str = "", limit: int = 20, order: str = "score-")

Search for available GPU offers/machines to rent.

**Parameters:**
- `query` (optional): Search query in key=value format (e.g., "gpu_name=RTX_4090 num_gpus=2")
- `limit` (optional): Maximum number of results to return (default: 20)
- `order` (optional): Sort order; append '-' for descending (default: "score-")

**Returns:**
- List of available offers with:
  - Offer ID
  - GPU specifications (name, count)
  - CPU and RAM details
  - Storage space
  - Hourly cost
  - Location and reliability score
  - CUDA version
  - Internet speeds

**Example queries:**
- `"gpu_name=RTX_4090"` - Search for RTX 4090 GPUs
- `"num_gpus=2 cpu_ram>=32"` - Search for dual-GPU setups with 32GB+ RAM

### 4. create_instance(offer_id: int, image: str, disk: float = 10.0, ssh: bool = False, jupyter: bool = False, direct: bool = False, env: str = "", label: str = "", bid_price: float = None)

Create a new instance from an offer.
**Parameters:**
- `offer_id`: ID of the offer to rent (from search_offers)
- `image`: Docker image to run (e.g., "pytorch/pytorch:latest")
- `disk` (optional): Disk size in GB (default: 10.0)
- `ssh` (optional): Enable SSH access (default: False)
- `jupyter` (optional): Enable Jupyter notebook (default: False)
- `direct` (optional): Use direct connections (default: False)
- `env` (optional): Environment variables as a dict (default: None)
- `label` (optional): Label for the instance
- `bid_price` (optional): Bid price for interruptible instances

**Returns:**
- Success message with instance ID or error details

**Example:**
```python
create_instance(
    offer_id=12345,
    image="pytorch/pytorch:latest",
    disk=40.0,
    ssh=True,
    direct=True,
    env={"JUPYTER_ENABLE_LAB": "yes"},
    label="My PyTorch Training"
)
```

### 5. destroy_instance(instance_id: int)

Destroy an instance, completely removing it from the system. You do not need to stop it first.

**Parameters:**
- `instance_id`: ID of the instance to destroy

**Returns:**
- Success/failure message

### 6. start_instance(instance_id: int)

Start a stopped instance.

**Parameters:**
- `instance_id`: ID of the instance to start

**Returns:**
- Success/failure message

### 7. stop_instance(instance_id: int)

Stop a running instance (without destroying it).

**Parameters:**
- `instance_id`: ID of the instance to stop

**Returns:**
- Success/failure message

### 8. search_volumes(query: str = "", limit: int = 20)

Search for available storage volume offers.

**Parameters:**
- `query` (optional): Search query in key=value format
- `limit` (optional): Maximum number of results to return (default: 20)

**Returns:**
- List of available volume offers with:
  - Volume offer ID
  - Storage capacity
  - Cost per GB per month
  - Location and reliability
  - Disk bandwidth
  - Internet speeds

### 9. label_instance(instance_id: int, label: str)

Set a label on an instance for easier identification.
**Parameters:**
- `instance_id`: ID of the instance to label
- `label`: Label text to set

**Returns:**
- Success/failure message

### 10. launch_instance_workflow(gpu_name: str, num_gpus: int, image: str, region: str = "", disk: float = 16.0, ssh: bool = True, jupyter: bool = False, direct: bool = True, label: str = "")

Launch the top instance from the search offers matching the given parameters (a streamlined alternative to create_instance).

**Parameters:**
- `gpu_name`: Name of the GPU model (e.g., "RTX_4090")
- `num_gpus`: Number of GPUs required
- `image`: Docker image to run
- `region` (optional): Geographical region preference
- `disk` (optional): Disk size in GB (default: 16.0)
- `ssh` (optional): Enable SSH access (default: True)
- `jupyter` (optional): Enable Jupyter notebook (default: False)
- `direct` (optional): Use direct connections (default: True)
- `label` (optional): Label for the instance

**Returns:**
- Success message with instance details or error

**Example:**
```python
launch_instance_workflow(
    gpu_name="RTX_4090",
    num_gpus=2,
    image="pytorch/pytorch:latest",
    region="North_America",
    disk=40.0,
    ssh=True,
    direct=True,
    label="My Training Job"
)
```

### 11. prepay_instance(instance_id: int, amount: float)

Deposit credits into a reserved instance for discounted rates.

**Parameters:**
- `instance_id`: ID of the instance to prepay
- `amount`: Amount of credits to deposit

**Returns:**
- Details about the discount rate and coverage period

### 12. reboot_instance(instance_id: int)

Reboot an instance (stop/start) without losing GPU priority.

**Parameters:**
- `instance_id`: ID of the instance to reboot

**Returns:**
- Success/failure message

### 13. recycle_instance(instance_id: int)

Recycle an instance (destroy/create from a newly pulled image) without losing GPU priority.

**Parameters:**
- `instance_id`: ID of the instance to recycle

**Returns:**
- Success/failure message

### 14. show_instance(instance_id: int)

Show detailed information about a specific instance.
**Parameters:**
- `instance_id`: ID of the instance to show

**Returns:**
- Detailed instance information including:
  - Status and specifications
  - Connection details (IP, SSH, Jupyter)
  - Cost and runtime information
  - Configuration details

### 15. logs(instance_id: int, tail: str = "1000", filter_text: str = "", daemon_logs: bool = False)

Get logs for an instance.

**Parameters:**
- `instance_id`: ID of the instance to get logs for
- `tail` (optional): Number of lines from the end of the logs (default: "1000")
- `filter_text` (optional): Grep filter for log entries
- `daemon_logs` (optional): Get daemon system logs instead of container logs

**Returns:**
- Instance log text or a status message

### 16. attach_ssh(instance_id: int)

Attach an SSH key to an instance for secure access.

**Parameters:**
- `instance_id`: ID of the instance to attach the SSH key to

**Returns:**
- Success/failure message

**Examples:**
```python
# Attach the SSH key from the configured public key file
attach_ssh(12345)
```

**Notes:**
- Uses the SSH public key file configured in the SSH_KEY_PUBLIC_FILE environment variable
- Only public SSH keys are accepted (not private keys)
- The SSH key must start with an 'ssh-' prefix (e.g., ssh-rsa, ssh-ed25519)
- After attaching, you can SSH to the instance using the corresponding private key

### 17. search_templates()

Search for available templates on Vast.ai.

**Parameters:**
- None

**Returns:**
- List of available templates with:
  - Template ID and name
  - Docker image
  - Description (if available)
  - Environment variables
  - Run type configuration
  - SSH and Jupyter settings

**Example:**
```python
# Get all available templates
search_templates()
```

**Notes:**
- Templates are pre-configured environments that simplify instance creation
- Templates may include specific Docker images, environment setups, and startup scripts

### 18. execute_command(instance_id: int, command: str)

Execute a (constrained) remote command, only available on stopped instances.
Use SSH to run commands on running instances.

**Parameters:**
- `instance_id`: ID of the instance to execute the command on
- `command`: Command to execute (limited to ls, rm, du)

**Returns:**
- Command output or a status message

**Available commands:**
- `ls`: List directory contents
- `rm`: Remove files or directories
- `du`: Summarize disk usage for a set of files

**Examples:**
```python
# List directory contents
execute_command(12345, "ls -l -o -r")

# Remove files
execute_command(12345, "rm -r home/delete_this.txt")

# Check disk usage
execute_command(12345, "du -d2 -h")
```

**Notes:**
- Only works on stopped instances
- For running instances, use ssh_execute_command instead
- Limited to specific safe commands for security

### 19. ssh_execute_command(remote_host: str, remote_user: str, remote_port: int, command: str)

Execute a command on a remote host via SSH.

**Parameters:**
- `remote_host`: The hostname or IP address of the remote server
- `remote_user`: The username to connect as (e.g., 'root', 'ubuntu', 'ec2-user')
- `remote_port`: The SSH port number (typically 22 or a custom port like 34608)
- `command`: The command to execute on the remote host

**Returns:**
- Command output with exit status, stdout, and stderr

**Example:**
```python
# Execute a command on a running instance
ssh_execute_command(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    command="nvidia-smi"
)
```

**Notes:**
- Works with any SSH-accessible server, not just Vast.ai instances
- Uses the SSH private key file configured in the SSH_KEY_FILE environment variable
- Automatically handles different SSH key types (RSA, Ed25519, ECDSA, DSS)
- Returns detailed output including the exit status and both stdout and stderr

### 20. ssh_execute_background_command(remote_host: str, remote_user: str, remote_port: int, command: str, task_name: str = None)

Execute a long-running command in the background on a remote host via SSH using nohup.
**Parameters:**
- `remote_host`: The hostname or IP address of the remote server
- `remote_user`: The username to connect as (e.g., 'root', 'ubuntu', 'ec2-user')
- `remote_port`: The SSH port number (typically 22 or a custom port like 34608)
- `command`: The command to execute in the background
- `task_name` (optional): Optional name for the task (for easier identification)

**Returns:**
- Task information including the task ID, process ID, and log file path

**Example:**
```python
# Start a long-running training job
ssh_execute_background_command(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    command="python train.py --epochs 100",
    task_name="training_job"
)
```

**Notes:**
- Returns task_id and process_id for monitoring
- Creates log files on the remote server to capture output
- Use ssh_check_background_task to monitor progress
- Use ssh_kill_background_task to stop the task if needed

### 21. ssh_check_background_task(remote_host: str, remote_user: str, remote_port: int, task_id: str, process_id: int, tail_lines: int = 50)

Check the status of a background SSH task and get its output.

**Parameters:**
- `remote_host`: The hostname or IP address of the remote server
- `remote_user`: The username to connect as
- `remote_port`: The SSH port number
- `task_id`: The task ID returned by ssh_execute_background_command
- `process_id`: The process ID returned by ssh_execute_background_command
- `tail_lines` (optional): Number of recent log lines to show (default: 50)

**Returns:**
- Status report with process status, log output, and progress information

**Example:**
```python
# Check on a background task
ssh_check_background_task(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    task_id="training_job_a1b2c3d4",
    process_id=12345,
    tail_lines=100
)
```

**Notes:**
- Shows whether the task is still running or has completed
- Displays recent log output from the task
- Provides a total log line count as a progress indication

### 22. ssh_kill_background_task(remote_host: str, remote_user: str, remote_port: int, task_id: str, process_id: int)

Kill a running background SSH task.

**Parameters:**
- `remote_host`: The hostname or IP address of the remote server
- `remote_user`: The username to connect as
- `remote_port`: The SSH port number
- `task_id`: The task ID returned by ssh_execute_background_command
- `process_id`: The process ID returned by ssh_execute_background_command

**Returns:**
- Status of the kill operation and cleanup results

**Example:**
```python
# Stop a background task
ssh_kill_background_task(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    task_id="training_job_a1b2c3d4",
    process_id=12345
)
```

**Notes:**
- Attempts graceful termination first, then force-kills if necessary
- Automatically cleans up temporary log and PID files
- Safe to call even if the process has already completed

### 23. disable_sudo_password(remote_host: str, remote_user: str, remote_port: int)

Disable the sudo password requirement for the sudo group on a remote host via SSH. This function safely modifies the sudoers file to allow passwordless sudo access for users in the sudo group.
**Parameters:**
- `remote_host`: The hostname or IP address of the remote server
- `remote_user`: The username to connect as (e.g., 'root', 'ubuntu', 'ec2-user')
- `remote_port`: The SSH port number (typically 22 or a custom port like 34608)

**Returns:**
- Detailed status of the sudoers modification including:
  - Previous and new sudo configuration
  - Validation results
  - Backup file location

**Example:**
```python
# Disable the sudo password on a running instance
disable_sudo_password(
    remote_host="ssh1.vast.ai",
    remote_user="root",
    remote_port=26378
)
```

**Safety Features:**
- Creates an automatic backup of the sudoers file before modification
- Validates the sudoers syntax with `visudo -c` after changes
- Automatically restores the backup if validation fails
- Shows the before/after configuration for verification

**Notes:**
- Modifies sudoers to: `%sudo ALL=(ALL) NOPASSWD: ALL`
- Requires the connecting user to have sudo privileges
- Backup files are timestamped for safety
- Works with any SSH-accessible Linux system
- Test the change with `sudo -l` after execution

### 24. configure_mcp_rules(auto_attach_ssh: bool = None, auto_label: bool = None, wait_for_ready: bool = None, label_prefix: str = None)

Configure MCP automation rules that control automatic behaviors during instance creation.
**Parameters:**
- `auto_attach_ssh` (optional): Enable/disable automatic SSH key attachment for SSH/Jupyter instances
- `auto_label` (optional): Enable/disable automatic instance labeling
- `wait_for_ready` (optional): Enable/disable waiting for instance readiness after creation
- `label_prefix` (optional): Set the prefix for automatic instance labels

**Returns:**
- Current configuration status and any changes made

**Example:**
```python
# Configure MCP rules
configure_mcp_rules(
    auto_attach_ssh=True,
    auto_label=True,
    label_prefix="my-project",
    wait_for_ready=True
)

# View the current configuration
configure_mcp_rules()
```

**Notes:**
- These rules affect the behavior of create_instance and launch_instance_workflow
- Auto-attach SSH applies only when SSH or Jupyter is enabled
- Auto-labeling creates timestamped labels when no label is provided
- Wait-for-ready monitors the instance status until it becomes "running"

## Configuration

### API Key Setup

The server requires a Vast.ai API key. You can configure it in several ways:

1. **Environment variable:**
   ```bash
   export VAST_API_KEY="your_api_key_here"
   ```

2. **API key file:** Create `~/.vastai_api_key` containing your API key

3. **Hardcoded (for development):** The current server has a hardcoded API key for testing purposes

### Running the Server

```bash
# Run with default settings (localhost:8000)
python vast_mcp_server.py

# Run with a custom host and port
python vast_mcp_server.py --host 0.0.0.0 --port 9000
```

## Common Workflows

### 1. Basic Instance Creation Workflow

```python
# 1. Check your account
show_user_info()

# 2. Search for available offers
search_offers("gpu_name=RTX_4090", limit=10)

# 3. Create an instance from an offer
create_instance(
    offer_id=12345,
    image="pytorch/pytorch:latest",
    disk=20.0,
    ssh=True,
    direct=True
)

# 4. Check the instance status
show_instances()
```

### 2. Instance Management

```python
# View all instances
show_instances()

# Stop an instance
stop_instance(instance_id=67890)

# Start it again later
start_instance(instance_id=67890)

# Permanently destroy it when done
destroy_instance(instance_id=67890)
```

### 3. Finding Storage

```python
# Search for storage volumes
search_volumes("disk_space>=100", limit=5)
```

### 4. Advanced Instance Management

```python
# Launch an instance with specific GPU requirements (streamlined approach)
launch_instance_workflow(
    gpu_name="RTX_4090",
    num_gpus=2,
    image="pytorch/pytorch:latest",
    region="North_America",
    disk=40.0,
    ssh=True,
    direct=True,
    label="Training Job"
)

# Get detailed information about an instance
show_instance(instance_id=12345)

# Set a label for easier identification
label_instance(instance_id=12345, label="Production Model")

# Get instance logs
logs(instance_id=12345, tail="500", filter_text="error")

# Reboot the instance without losing GPU priority
reboot_instance(instance_id=12345)
```

### 5. Instance Monitoring and Maintenance

```python
# Monitor instance logs with filtering
logs(instance_id=12345, filter_text="WARNING|ERROR", tail="100")

# Check instance details
show_instance(instance_id=12345)

# Recycle the instance to update to the latest image
recycle_instance(instance_id=12345)

# Prepay for discounted rates
prepay_instance(instance_id=12345, amount=50.0)
```

### 6. SSH Access Management

```python
# Create an instance with SSH enabled
create_instance(
    offer_id=12345,
    image="ubuntu:22.04",
    ssh=True,
    direct=True,
    label="SSH Server"
)

# Attach your SSH key for access
attach_ssh(instance_id=67890)

# Get instance details, including SSH connection info
show_instance(instance_id=67890)

# Monitor the instance through its logs
logs(instance_id=67890, tail="50")
```

### 7. Template Browsing

```python
# Browse available templates
search_templates()
```

### 8. Instance Command Execution

```python
# For stopped instances, use the constrained execute_command
stop_instance(instance_id=12345)

# Execute safe commands on the stopped instance
execute_command(instance_id=12345, command="ls -la /workspace")
execute_command(instance_id=12345, command="du -sh /workspace")
execute_command(instance_id=12345, command="rm -rf /tmp/old_files")

# For running instances, use SSH commands
start_instance(instance_id=12345)

# Get the instance connection details
instance_details = show_instance(instance_id=12345)
# Extract the SSH host and port from the output

# Execute commands via SSH on the running instance
ssh_execute_command(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    command="nvidia-smi"
)

# Check system resources
ssh_execute_command(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    command="df -h && free -h && ps aux"
)
```

### 9. Background Task Management

```python
# Start a long-running training job in the background
task_info = ssh_execute_background_command(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    command="python train.py --epochs 100 --batch-size 32",
    task_name="pytorch_training"
)

# Extract task_id and process_id from the task_info output
# Format: "Task ID: pytorch_training_a1b2c3d4" and "Process ID: 12345"

# Monitor progress periodically
ssh_check_background_task(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    task_id="pytorch_training_a1b2c3d4",
    process_id=12345,
    tail_lines=100
)

# Stop the task if needed
ssh_kill_background_task(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    task_id="pytorch_training_a1b2c3d4",
    process_id=12345
)
```

### 10. Complete ML Training Workflow

```python
# 1. Find and create a GPU instance
search_offers("gpu_name=RTX_4090", limit=5)
create_instance(
    offer_id=12345,
    image="pytorch/pytorch:latest",
    disk=50.0,
    ssh=True,
    direct=True,
    env={},
    label="ML Training"
)

# 2. Get the connection details
instance_details = show_instance(instance_id=67890)

# 3. Set up the environment
ssh_execute_command(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    command="pip install wandb tensorboard"
)

# 4. Upload your training code (assumed already done)

# 5. Start training in the background
training_task = ssh_execute_background_command(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    command="cd /workspace && python train.py --config config.yaml",
    task_name="main_training"
)

# 6. Monitor training progress
ssh_check_background_task(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    task_id="main_training_a1b2c3d4",
    process_id=12345,
    tail_lines=50
)

# 7. Check GPU utilization
ssh_execute_command(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    command="nvidia-smi"
)

# 8. When training is complete, save the results
ssh_execute_command(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    command="tar -czf model_results.tar.gz /workspace/outputs"
)

# 9. Clean up
destroy_instance(instance_id=67890)
```

## Query Syntax

When searching for offers or volumes, you can use these operators:

- `=` or `==` - Equal to
- `!=` - Not equal to
- `>` - Greater than
- `>=` - Greater than or equal to
- `<` - Less than
- `<=` - Less than or equal to

**Example queries:**
- `"gpu_name=RTX_4090 num_gpus>=2"` - RTX 4090 with 2 or more GPUs
- `"cpu_ram>64 reliability2>=99"` - High RAM and reliability
- `"dph_total<=1.0"` - Cost under $1/hour

## Error Handling

All methods include error handling and return descriptive error messages if:

- The API key is missing or invalid
- Network connectivity issues occur
- Invalid parameters are provided
- The Vast.ai API returns errors

Check the server logs for detailed error information during development.
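When scripting the background-task workflow (workflow 9 above), the `task_id` and `process_id` have to be pulled out of the `ssh_execute_background_command` output before they can be passed to `ssh_check_background_task` or `ssh_kill_background_task`. A minimal parsing sketch, assuming the output contains lines in the format noted above ("Task ID: …" and "Process ID: …"); the helper name `parse_task_info` is hypothetical:

```python
import re

def parse_task_info(task_info: str):
    """Extract (task_id, process_id) from ssh_execute_background_command output.

    Assumes lines like "Task ID: pytorch_training_a1b2c3d4" and
    "Process ID: 12345" appear somewhere in the output.
    """
    task_match = re.search(r"Task ID:\s*(\S+)", task_info)
    pid_match = re.search(r"Process ID:\s*(\d+)", task_info)
    if not task_match or not pid_match:
        raise ValueError("Could not find task/process IDs in output")
    return task_match.group(1), int(pid_match.group(1))

sample = "Task started.\nTask ID: pytorch_training_a1b2c3d4\nProcess ID: 12345"
print(parse_task_info(sample))  # ('pytorch_training_a1b2c3d4', 12345)
```

The extracted pair can then be fed directly into periodic `ssh_check_background_task` calls rather than copied by hand from the tool output.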
