| get_config | Get the current srunx configuration including resource defaults and environment settings. |
| list_ssh_profiles | List all configured SSH connection profiles for remote SLURM clusters. Shows profile names, hostnames, and configured mount points. |
| submit_job | Submit a SLURM job. Args:
command: Shell command to execute (e.g. "python train.py --epochs 100")
name: Job name for identification in SLURM queue
nodes: Number of compute nodes to allocate
gpus_per_node: Number of GPUs per node (0 for CPU-only)
ntasks_per_node: Number of tasks per node
cpus_per_task: Number of CPUs per task
memory_per_node: Memory per node (e.g. "32GB", "64G")
time_limit: Wall time limit (e.g. "4:00:00", "1-00:00:00")
partition: SLURM partition name (e.g. "gpu", "cpu")
nodelist: Specific nodes to use (e.g. "node001,node002")
conda: Conda environment name to activate before running
venv: Path to Python virtual environment to activate
env_vars: Additional environment variables as key-value pairs
log_dir: Directory for stdout/stderr log files
work_dir: Working directory for the job (defaults to cwd)
use_ssh: If true, submit via SSH to remote SLURM cluster
|
| list_jobs | List the current user's SLURM jobs in the queue. Args:
use_ssh: If true, query jobs via SSH on remote cluster
|
| get_job_status | Get the status of a specific SLURM job. Args:
job_id: SLURM job ID to check
use_ssh: If true, query via SSH on remote cluster
|
| cancel_job | Cancel a running or pending SLURM job. Args:
job_id: SLURM job ID to cancel
use_ssh: If true, cancel via SSH on remote cluster
|
| get_job_logs | Get stdout/stderr logs for a SLURM job. Args:
job_id: SLURM job ID
job_name: Optional job name to help locate log files
use_ssh: If true, fetch logs via SSH from remote cluster
|
| get_resources | Get current GPU and node resource availability on the SLURM cluster. Args:
partition: Specific partition to check (None for all partitions)
use_ssh: If true, query resources via SSH on remote cluster
|
| sync_files | Sync files between local machine and remote SLURM cluster using rsync. Can sync using a configured mount point (profile_name + mount_name),
or using explicit paths (local_path + remote_path).
Args:
profile_name: SSH profile name (uses current profile if not specified)
mount_name: Mount point name from the SSH profile to sync
local_path: Local directory path (alternative to mount_name)
remote_path: Remote directory path (alternative to mount_name)
dry_run: If true, show what would be transferred without actually syncing
|
| create_workflow | Create a SLURM workflow YAML file. Generates a YAML workflow definition that can be executed with run_workflow.
Each job in the workflow can depend on other jobs, forming a DAG.
Args:
name: Workflow name for identification
jobs: List of job definitions. Each job dict should contain:
- name (required): Job identifier
- command (required for regular jobs): Command as string or list of strings
- script_path (required for shell jobs): Path to shell script
- depends_on: List of job names this job depends on (e.g. ["preprocess"])
Supports dependency types: "afterok:job_a", "after:job_a", "afterany:job_a"
- retry: Number of retry attempts on failure (default 0)
- retry_delay: Seconds between retries (default 60)
- resources: Dict with nodes, gpus_per_node, ntasks_per_node,
cpus_per_task, memory_per_node, time_limit, partition, nodelist
- environment: Dict with conda, venv, env_vars, container
- log_dir: Log directory path
- work_dir: Working directory path
output_path: File path to write the YAML workflow (e.g. "workflow.yaml")
args: Optional template variables for Jinja2 templating in job definitions
default_project: Default SSH project/mount name for file syncing
|
| validate_workflow | Validate a workflow YAML file for correctness. Checks for valid YAML syntax, correct job structure, dependency resolution,
and circular dependency detection.
Args:
yaml_path: Path to the YAML workflow file to validate
|
| run_workflow | Execute a SLURM workflow from a YAML file. Jobs are executed in dependency order: independent jobs run in parallel,
dependent jobs wait for their prerequisites to complete.
Args:
yaml_path: Path to the YAML workflow file
from_job: Start execution from this job (skip earlier jobs)
to_job: Stop execution at this job (skip later jobs)
single_job: Execute only this specific job, ignoring dependencies
dry_run: If true, show what would be executed without actually running
args: Optional mapping merged over the YAML ``args`` section before
Jinja rendering. ``python:`` prefix values are rejected.
sweep: Optional sweep spec: ``{"matrix": {...}, "fail_fast": bool,
"max_parallel": int}``. When present, the request goes through
``SweepOrchestrator`` and the response contains
``sweep_run_id``.
mount: Optional mount name from the active SSH profile. When
provided, the run is routed through the configured cluster
adapter with mount-aware path translation for ``work_dir`` /
``log_dir``. When omitted (default), the run stays on the
local SLURM client — same behaviour as pre-5a.
|
| list_workflows | List workflow YAML files in a directory. Scans the directory for YAML files that contain a valid srunx workflow
structure (must have 'name' and 'jobs' keys).
Args:
directory: Directory to search for workflow files (default: current directory)
|
| get_workflow | Read and parse a workflow YAML file, returning its full structure. Args:
yaml_path: Path to the YAML workflow file
|
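
The workflow tools above describe jobs that form a DAG through `depends_on`, with validation covering dependency resolution and circular dependencies, and execution running independent jobs in parallel. The sketch below illustrates that idea with Python's standard `graphlib`. It is not srunx's implementation: the helper names `dep_name` and `execution_batches` are hypothetical, and the job dicts only borrow the `name`, `command`, and `depends_on` keys listed for `create_workflow`.

```python
from graphlib import CycleError, TopologicalSorter

# Dependency-type prefixes named in the create_workflow description.
DEP_TYPES = ("afterok", "after", "afterany")


def dep_name(entry: str) -> str:
    """Strip an optional "type:" prefix, e.g. "afterok:job_a" -> "job_a"."""
    kind, sep, rest = entry.partition(":")
    return rest if sep and kind in DEP_TYPES else entry


def execution_batches(jobs: list[dict]) -> list[list[str]]:
    """Group job names into batches; jobs within a batch do not depend on each other."""
    # Map each job name to the set of job names it must wait for.
    graph = {
        job["name"]: {dep_name(d) for d in job.get("depends_on", [])}
        for job in jobs
    }

    # Dependency resolution: every referenced job must be defined.
    unknown = {d for deps in graph.values() for d in deps} - set(graph)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")

    sorter = TopologicalSorter(graph)
    try:
        sorter.prepare()  # raises CycleError on a circular dependency
    except CycleError as exc:
        raise ValueError(f"circular dependency: {exc.args[1]}") from exc

    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all prerequisites are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical job definitions, shaped like the `jobs` argument above.
jobs = [
    {"name": "download", "command": "python download.py"},
    {"name": "preprocess", "command": "python prep.py"},
    {"name": "train", "command": "python train.py",
     "depends_on": ["download", "preprocess"]},
    {"name": "evaluate", "command": "python eval.py",
     "depends_on": ["afterok:train"]},
]
print(execution_batches(jobs))
# [['download', 'preprocess'], ['train'], ['evaluate']]
```

Here `download` and `preprocess` have no prerequisites and land in the same batch, which mirrors the "independent jobs run in parallel" behaviour described for `run_workflow`. The sketch treats all three dependency types identically for ordering purposes; a real scheduler would also need to distinguish them by the exit state of the prerequisite job.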