Migrating from Torque to Slurm

How is Slurm different from Torque?

Slurm differs from Torque in a number of ways, including the commands used to submit and monitor jobs, the syntax used to request resources, and the way environment variables are handled. Some of the key differences are:

  • Slurm rejects at submission time any job whose requested resources exceed what the job owner has access to, regardless of whether those resources are currently allocated to other jobs. Torque accepts and queues such a job, but it will never run.
  • What Torque calls queues, Slurm calls partitions.
  • Resources in Slurm are requested per “task” (process).
  • In Slurm, environment variables of the submitting process are passed to the job by default (see the sketch after this list).
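
The last two points show up directly in a batch script. Below is a minimal sketch of a Slurm submission script illustrating per-task resource requests and the default environment export; the partition name, script name, and executable are illustrative assumptions, not values from this guide.

```bash
#!/bin/bash
#SBATCH --job-name=demo          # job name (placeholder)
#SBATCH --partition=general      # a Slurm "partition" is what Torque calls a queue (name is an assumption)
#SBATCH --ntasks=4               # resources are requested per task/process
#SBATCH --cpus-per-task=2        # cores allocated to each task
#SBATCH --time=00:30:00          # wall time limit

# The submitting shell's environment is exported by default (--export=ALL).
# To start from a clean environment instead, submit with: sbatch --export=NONE demo.sh
srun ./my_program                # placeholder executable, launched once per task
```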

Submit and Manage Jobs

| Info | Torque Command | Slurm Command |
|------|----------------|---------------|
| Submit a job | qsub <job script> | sbatch <job script> |
| Delete a job | qdel <job ID> | scancel <job ID> |
| Hold a job | qhold <job ID> | scontrol hold <job ID> |
| Release a job | qrls <job ID> | scontrol release <job ID> |
| Start an interactive job | qsub -I <options> | salloc <options> or srun --pty <options> |
| Start an interactive job with X forwarding | qsub -I -X <options> | srun --x11 <options> |

<job script> needs to be replaced by the name of your job submission script.
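
For example, the Slurm commands in the table above might be used as follows; the script name, job ID, and resource values are placeholders:

```bash
sbatch my_job.sh                           # submit a batch job (prints the job ID)
scancel 123456                             # delete a job
scontrol hold 123456                       # hold a queued job
scontrol release 123456                    # release the held job
salloc --nodes=1 --time=01:00:00           # interactive allocation for 1 node, 1 hour
srun --nodes=1 --time=01:00:00 --pty bash  # interactive shell on a compute node
srun --x11 --pty bash                      # interactive shell with X11 forwarding
```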

Job Submission Options

| Option | Torque (qsub) | Slurm (sbatch) |
|--------|---------------|----------------|
| Script directive | #PBS | #SBATCH |
| Job name | -N <name> | --job-name=<name> or -J <name> |
| Queue | -q <queue> | --partition=<queue> |
| Wall time limit | -l walltime=<hh:mm:ss> | --time=<hh:mm:ss> |
| Node count | -l nodes=<count> | --nodes=<count> or -N <count> |
| Process count per node | -l ppn=<count> | --ntasks-per-node=<count> |
| Core count (per process) | n/a | --cpus-per-task=<cores> |
| Memory limit | -l mem=<limit> | --mem=<limit> (memory per node, in MB) |
| Minimum memory per processor | -l pmem=<limit> | --mem-per-cpu=<limit> |
| Request GPUs | -l gpus=<count> | --gres=gpu:<count> |
| Request specific nodes | -l nodes=<node>[,node2[,...]] | -w, --nodelist=<node>[,node2[,...]] or -F, --nodefile=<node file> |
| Request node feature | -l nodes=<count>:ppn=<count>:<feature> | --constraint=<feature> |
| Job array | -t <array indices> | --array=<indexes> or -a <indexes> |
| Standard output file | -o <file path> | --output=<file path> (path must exist) |
| Standard error file | -e <file path> | --error=<file path> (path must exist) |
| Combine stdout/stderr to stdout | -j oe | --output=<combined out and err file path> |
| Copy environment | -V | --export=ALL (default); use --export=NONE to not export the environment |
| Copy environment variable | -v <variable[=value][,variable2=value2[,...]]> | --export=<variable[=value][,variable2=value2[,...]]> |
| Job dependency (after) | -W depend=after:jobID[:jobID...] | --dependency=after:jobID[:jobID...] |
| Job dependency (afterok) | -W depend=afterok:jobID[:jobID...] | --dependency=afterok:jobID[:jobID...] |
| Job dependency (afternotok) | -W depend=afternotok:jobID[:jobID...] | --dependency=afternotok:jobID[:jobID...] |
| Job dependency (afterany) | -W depend=afterany:jobID[:jobID...] | --dependency=afterany:jobID[:jobID...] |
| Request event notification | -m <events> | --mail-type=<events> |
| Email address | -M <email address> | --mail-user=<email address> |
| Defer job until the specified time | -a <date/time> | --begin=<date/time> |
| Node exclusive job | qsub -n | --exclusive |

<indexes> is replaced by a range (0-15), a list (0,6,16-32), or a step function (0-15:4).
Multiple mail-type requests may be specified in a comma-separated list, for example --mail-type=BEGIN,END,NONE,FAIL,REQUEUE.
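
As a worked example of the options above, the sketch below shows a simple Slurm batch script with the rough Torque equivalent of each directive in the trailing comments; all names and resource values are illustrative assumptions.

```bash
#!/bin/bash
#SBATCH --job-name=myjob              # #PBS -N myjob
#SBATCH --partition=batch             # #PBS -q batch          (partition name is an assumption)
#SBATCH --nodes=2                     # #PBS -l nodes=2:ppn=8
#SBATCH --ntasks-per-node=8
#SBATCH --time=04:00:00               # #PBS -l walltime=04:00:00
#SBATCH --mem=16000                   # #PBS -l mem=<limit>    (Slurm value is per node, in MB)
#SBATCH --output=myjob.%j.out         # #PBS -o myjob.out      (%j expands to the Slurm job ID)
#SBATCH --mail-type=END,FAIL          # #PBS -m ae
#SBATCH --mail-user=user@example.edu  # #PBS -M user@example.edu

srun ./myprog                         # placeholder executable
```

Job dependencies can be chained in the same spirit; sbatch --parsable prints only the job ID, which makes it easy to capture (the script names are placeholders):

```bash
jobid=$(sbatch --parsable step1.sh)          # submit the first job and capture its ID
sbatch --dependency=afterok:$jobid step2.sh  # run only if step1 finishes successfully
```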

Monitor Jobs

| Info | Torque Command | Slurm Command |
|------|----------------|---------------|
| Job status (all) | qstat or showq | squeue |
| Job status (by job) | qstat <job ID> | squeue -j <job ID> |
| Job status (by user) | qstat -u <user> | squeue -u <user> |
| Job status (only own jobs) | qstat_me | squeue --me or squeue --me -l |
| Job status (detailed) | qstat -f <job ID> or checkjob <job ID> | scontrol show job -dd <job ID> |
| Show expected start time | showstart <job ID> | squeue -j <job ID> --start |
| Monitor or review a job’s resource usage | qstat -f <job ID> | sacct -j <job ID> --format JobID,jobname,NTasks,nodelist,CPUTime,ReqMem,Elapsed |
| View job batch script | n/a | scontrol write batch_script <job ID> [filename] |
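
A typical monitoring sequence with the Slurm commands above might look like this; the job ID is a placeholder:

```bash
squeue --me                          # all of my pending and running jobs
squeue -j 123456 --start             # estimated start time of a pending job
scontrol show job -dd 123456         # detailed information for one job
sacct -j 123456 --format=JobID,JobName,Elapsed,CPUTime,ReqMem   # accounting data, also for finished jobs
```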

View Resources on the Cluster

| Info | Torque Command | Slurm Command |
|------|----------------|---------------|
| Queue list / info | qstat -q [queue] | scontrol show partition [queue] |
| Node list | pbsnodes -a or mdiag -n -v | scontrol show nodes |
| Node details | pbsnodes <node> | scontrol show node <node> |
| Cluster status | qstat -B | sinfo |
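
For example, to inspect what is available before submitting (the partition and node names are assumptions):

```bash
sinfo                                # summary of partitions and node states
sinfo -N -l                          # one line per node with more detail
scontrol show partition general      # limits and defaults for a single partition
scontrol show node node001           # details for one node
```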

Job States

| Code | State | Meaning |
|------|-------|---------|
| CA | Canceled | Job was canceled |
| CD | Completed | Job completed |
| CF | Configuring | Job resources being configured |
| CG | Completing | Job is completing |
| F | Failed | Job terminated with non-zero exit code |
| NF | Node Fail | Job terminated due to failure of node(s) |
| PD | Pending | Job is waiting for compute node(s) |
| R | Running | Job is running on compute node(s) |