Migrating from Torque to Slurm
How is Slurm different from Torque?
Slurm differs from Torque in a number of ways, including the commands used to submit and monitor jobs, the syntax used to request resources, and the way environment variables are handled. Specific differences include:
- Slurm will not accept a job whose requested resources exceed the set of resources the job owner has access to, regardless of whether those resources are currently allocated to other jobs. Torque will queue such a job, but it will never run.
- What Torque calls queues, Slurm calls partitions.
- Resources in Slurm are assigned per “task”/process.
- In Slurm, the environment variables of the submitting shell are passed to the job by default (see the example below).
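For example, assuming a job script named job.sh (a placeholder name), the default export behavior and two common ways to override it look like this:

```bash
# Default: the job inherits the full environment of the submitting shell
sbatch job.sh

# Submit with a clean environment instead of inheriting the caller's
# (only Slurm's own SLURM_* variables are set inside the job)
sbatch --export=NONE job.sh

# Propagate only the named variables; values may be set inline
sbatch --export=OMP_NUM_THREADS=4,SCRATCH_DIR=/tmp/$USER job.sh
```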
Submit and Manage Jobs
Job Submission Options
Option | Torque (qsub) | Slurm (sbatch) |
---|---|---|
Script directive | #PBS | #SBATCH |
Job name | -N <name> | --job-name=<name> or -J <name> |
Queue | -q <queue> | --partition=<queue> |
Wall time limit | -l walltime=<hh:mm:ss> | --time=<hh:mm:ss> |
Node count | -l nodes=<count> | --nodes=<count> or -N <count> |
Process count per node | -l ppn=<count> | --ntasks-per-node=<count> |
Core count (per process) | | --cpus-per-task=<cores> |
Memory limit | -l mem=<limit> | --mem=<limit> (memory per node, in megabytes by default) |
Minimum memory per processor | -l pmem=<limit> | --mem-per-cpu=<memory> |
Request GPUs | -l gpus=<count> | --gres=gpu:<count> |
Request specific nodes | -l nodes=<node>[,node2[,...]] | -w, --nodelist=<node>[,node2[,...]] or -F, --nodefile=<node file> |
Request node feature | -l nodes=<count>:ppn=<count>:<feature> | --constraint=<feature> |
Job array | -t <array indices> | --array=<indexes> or -a <indexes>, where <indexes> is a range (0-15), a list (0,6,16-32), or a step function (0-15:4) |
Standard output file | -o <file path> | --output=<file path> (the directory in the path must already exist) |
Standard error file | -e <file path> | --error=<file path> (the directory in the path must already exist) |
Combine stdout/stderr to stdout | -j oe | --output=<combined out and err file path> (Slurm combines the two streams by default when only --output is given) |
Copy environment | -V | --export=ALL (the default); use --export=NONE to not export the environment |
Copy environment variable | -v <variable[=value][,variable2=value2[,...]]> | --export=<variable[=value][,variable2=value2[,...]]> |
Job dependency | -W depend=<type>:jobID[:jobID...], where <type> is after, afterok, afternotok, or afterany | --dependency=<type>:jobID[:jobID...], where <type> is after, afterok, afternotok, or afterany |
Request event notification | -m <events> | --mail-type=<events>; multiple events may be given as a comma-separated list, e.g. --mail-type=BEGIN,END,FAIL,REQUEUE |
Email address | -M <email address> | --mail-user=<email address> |
Defer job until the specified time | -a <date/time> | --begin=<date/time> |
Node exclusive job | qsub -n | --exclusive |
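Putting several of these directives together, a short Torque script and a plausible Slurm translation are sketched below. The job name, the queue/partition name (general), the resource amounts, the email address, and the hello executable are illustrative placeholders, not site-specific values.

```bash
#!/bin/bash
# Torque/PBS version
#PBS -N hello_test
#PBS -q general
#PBS -l nodes=1:ppn=4
#PBS -l walltime=00:30:00
#PBS -l mem=4gb
#PBS -j oe
#PBS -M user@example.edu
#PBS -m abe

cd "$PBS_O_WORKDIR"   # Torque starts jobs in $HOME, so change to the submit directory
./hello
```

```bash
#!/bin/bash
# Slurm translation
#SBATCH --job-name=hello_test
#SBATCH --partition=general
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:30:00
#SBATCH --mem=4G
#SBATCH --mail-user=user@example.edu
#SBATCH --mail-type=BEGIN,END,FAIL

# No "cd" needed: Slurm starts the job in the directory it was submitted from,
# and stdout/stderr are combined into slurm-<jobid>.out by default.
./hello
```

The submission step is analogous: qsub script.pbs under Torque becomes sbatch script.sh under Slurm.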
Common Job Commands
Info | Torque Command | Slurm Command |
---|---|---|
Submit a job | qsub <job script> | sbatch <job script> |
Delete a job | qdel <job ID> | scancel <job ID> |
Hold a job | qhold <job ID> | scontrol hold <job ID> |
Release a job | qrls <job ID> | scontrol release <job ID> |
Start an interactive job | qsub -I <options> | salloc <options> or srun --pty <options> |
Start an interactive job with X forwarding | qsub -I -X <options> | srun --x11 <options> |
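For example, an interactive session that Torque users would start with qsub -I can be requested in either of the following ways; the partition name general, the resource amounts, and the time limit are placeholders for whatever your cluster provides:

```bash
# Request an allocation, then work inside it (salloc opens a shell with
# SLURM_* variables set; use srun inside it to run commands on the nodes)
salloc --partition=general --nodes=1 --ntasks=4 --time=01:00:00

# Or open an interactive shell directly on a compute node
srun --partition=general --nodes=1 --ntasks=1 --time=01:00:00 --pty bash -i

# Interactive shell with X11 forwarding (requires X11 support in the Slurm
# installation and an X-forwarded SSH session to the login node)
srun --partition=general --ntasks=1 --time=01:00:00 --x11 --pty bash -i
```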
Monitor Jobs
Info | Torque Command | Slurm Command |
---|---|---|
Job status (all) | qstat or showq | squeue |
Job status (by job) | qstat <job ID> | squeue -j <job ID> |
Job status (by user) | qstat -u <user> | squeue -u <user> |
Job status (only own jobs) | qstat_me | squeue --me or squeue --me -l |
Job status (detailed) | qstat -f <job ID> or checkjob <job ID> | scontrol show job -dd <job ID> |
Show expected start time | showstart <job ID> | squeue -j <job ID> --start |
Monitor or review a job's resource usage | qstat -f <job ID> | sacct -j <job ID> --format=JobID,JobName,NTasks,NodeList,CPUTime,ReqMem,Elapsed |
View job batch script | | scontrol write batch_script <job ID> [filename] |
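As a concrete example, the commands below check on a hypothetical job 123456 while it is queued or running and then review its resource usage from the accounting database (sacct requires job accounting to be enabled on the cluster):

```bash
# Current state and, while pending, the scheduler's estimated start time
squeue -j 123456
squeue -j 123456 --start

# Detailed scheduler view of the job: nodes, working directory, script, etc.
scontrol show job -dd 123456

# Accounting summary of resources requested and used
sacct -j 123456 --format=JobID,JobName,NTasks,NodeList,CPUTime,ReqMem,Elapsed,State
```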
View Resources on the Cluster
Info | Torque Command | Slurm Command |
---|---|---|
Queue list / info | qstat -q [queue] | scontrol show partition [queue] |
Node list | pbsnodes -a or mdiag -n -v | scontrol show nodes |
Node details | pbsnodes <node> | scontrol show node <node> |
Cluster status | qstat -B | sinfo |
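For example, to get an overview of the cluster's partitions and nodes (the partition name general and the node name node001 are placeholders):

```bash
# Summary of partitions, their time limits, and node availability
sinfo

# Per-node listing for a single partition
sinfo --partition=general --Node --long

# Full details for one node: CPUs, memory, features, GRES, current state
scontrol show node node001
```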
Job States
Code | State | Meaning |
---|---|---|
CA | Canceled | Job was canceled |
CD | Completed | Job completed |
CF | Configuring | Job resources being configured |
CG | Completing | Job is completing |
F | Failed | Job terminated with non-zero exit code |
NF | Node Fail | Job terminated due to failure of node(s) |
PD | Pending | Job is waiting for compute node(s) |
R | Running | Job is running on compute node(s) |
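These codes appear in the ST column of squeue output. As a small illustration (the format string is just one possible choice), the command below lists your own jobs with their short code, long state name, and the reason a pending job is still waiting:

```bash
# Job ID, short state code, long state name, and pending reason for your jobs
squeue --me --format="%.10i %.3t %.12T %.20r"
```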