Slurm Cheatsheet

A compact reference for Slurm commands and useful options, with examples.

Job submission

salloc - Obtain a job allocation for interactive use
sbatch - Submit a batch script for later execution
srun - Obtain a job allocation and run an application

-A, --account=<account> - Account to be charged for resources used
-a, --array=<index> - Job array specification (sbatch only)
-b, --begin=<time> - Initiate job after specified time
-C, --constraint=<features> - Required node features
--cpu-bind=<type> - Bind tasks to specific CPUs (srun only)
-c, --cpus-per-task=<count> - Number of CPUs required per task
-d, --dependency=<state:jobid> - Defer job until specified jobs reach specified state
-m, --distribution=<method[:method]> - Specify distribution methods for remote processes
-e, --error=<filename> - File in which to store job error messages (sbatch and srun only)
-x, --exclude=<name> - Specify host names to exclude from job allocation
--exclusive - Reserve all CPUs and GPUs on allocated nodes
--export=<name=value> - Export specified environment variables (e.g., all, none)
--gpus-per-task=<list> - Number of GPUs required per task
-J, --job-name=<name> - Job name
-l, --label - Prepend task ID to output (srun only)
--mail-type=<type> - E-mail notification type (e.g., begin, end, fail, requeue, all)
--mail-user=<address> - E-mail address
--mem=<size>[units] - Memory required per allocated node (e.g., 16GB)
--mem-per-cpu=<size>[units] - Memory required per allocated CPU (e.g., 2GB)
-w, --nodelist=<hostnames> - Specify host names to include in job allocation
-N, --nodes=<count> - Number of nodes required for the job
-n, --ntasks=<count> - Number of tasks to be launched
--ntasks-per-node=<count> - Number of tasks to be launched per node
-o, --output=<filename> - File in which to store job output (sbatch and srun only)
-p, --partition=<names> - Partition in which to run the job
--signal=[B:]<num>[@time] - Signal job when approaching time limit
-t, --time=<time> - Limit for job run time


# Request interactive job on debug node with 4 CPUs
salloc -p debug -c 4

# Request interactive job with V100 GPU
salloc -p gpu --ntasks=1 --gpus-per-task=v100:1

# Submit batch job
sbatch batch.job
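
# Submit batch job that starts only after job 111111 completes successfully
sbatch --dependency=afterok:111111 batch.job

The options above can also be embedded in the script itself as #SBATCH directives. A minimal sketch of what batch.job might contain (the account, partition, and program name are placeholders):

#!/bin/bash
#SBATCH --account=<account>
#SBATCH --partition=main
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16GB
#SBATCH --time=1:00:00

./my_program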

Job management

squeue - View information about jobs in scheduling queue

-A, --account=<account_list> - Filter by accounts (comma-separated list)
-o, --format=<options> - Output format to display
-j, --jobs=<job_id_list> - Filter by job IDs (comma-separated list)
-l, --long - Show more available information
--me - Filter by your own jobs
-n, --name=<job_name_list> - Filter by job names (comma-separated list)
-p, --partition=<partition_list> - Filter by partitions (comma-separated list)
-P, --priority - Sort jobs by priority
--start - Show the expected start time and resources to be allocated for pending jobs
-t, --states=<state_list> - Filter by states (comma-separated list)
-u, --user=<user_list> - Filter by users (comma-separated list)


# View your own job queue
squeue --me

# View own job queue with estimated start times for pending jobs
squeue --me --start

# View job queue on specified partition in long format
squeue -lp epyc-64
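
The -o option takes format specifiers; a sketch using the standard %i (job ID), %j (name), %T (state), %M (elapsed time), and %D (node count) codes, with arbitrary field widths:

# View own jobs with a custom output format
squeue --me -o "%.10i %.20j %.10T %.10M %.6D"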

scancel - Signal or cancel jobs, job arrays, or job steps

-A, --account=<account> - Restrict to the specified account
-n, --name=<job_name> - Restrict to jobs with specified name
-w, --nodelist=<hostnames> - Restrict to jobs using the specified host names (comma-separated list)
-p, --partition=<partition> - Restrict to the specified partition
-t, --state=<state> - Restrict to jobs in the specified state
-u, --user=<username> - Restrict to the specified user


# Cancel specific job
scancel 111111

# Cancel all your own jobs
scancel -u $USER

# Cancel your own jobs on specified partition
scancel -u $USER -p oneweek

# Cancel your own jobs in specified state
scancel -u $USER -t pending
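
As its name suggests, scancel can also deliver a signal to a job without canceling it:

# Send signal USR1 to a specific job
scancel --signal=USR1 111111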

sprio - View job scheduling priorities

-o, --format=<options> - Output format to display
-j, --jobs=<job_id_list> - Filter by job IDs (comma-separated list)
-l, --long - Show more available information
-n, --norm - Show the normalized priority factors
-p, --partition=<partition_list> - Filter by partitions (comma-separated list)
-u, --user=<user_list> - Filter by users (comma-separated list)


# View normalized job priorities for your own jobs
sprio -nu $USER

# View normalized job priorities for specified partition
sprio -nlp gpu
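
Specific jobs can be inspected with the -j filter; for example:

# View priority details for specific jobs in long format
sprio -lj 111111,222222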

Job accounting

sacct - View job accounting data

-A, --account=<account_list> - Filter by accounts (comma-separated list)
-X, --allocations - Show job allocations, but not job steps
-a, --allusers - Show jobs for all users
-E, --endtime=<time> - End of reporting period
-o, --format=<options> - Output format to display
-j, --jobs=<job_id_list> - Filter by job IDs (comma-separated list)
--name=<job_name_list> - Filter by job names (comma-separated list)
-N, --nodelist=<hostnames> - Filter by host names (comma-separated list)
-r, --partition=<partition_list> - Filter by partitions (comma-separated list)
-S, --starttime=<time> - Start of reporting period
-s, --state=<state_list> - Filter by states (comma-separated list)
-u, --user=<user_list> - Filter by users (comma-separated list)


# View accounting data for specific job with custom format
sacct -j 111111 --format=jobid,jobname,submit,exitcode,elapsed,reqnodes,reqcpus,reqmem

# View compact accounting data for your own jobs for specified time range
sacct -X -S 2022-07-01 -E 2022-07-14
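
The state filter combines with a time window; for example:

# View compact accounting data for your own failed jobs in specified time range
sacct -X -s failed -S 2022-07-01 -E 2022-07-14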

sacctmgr - View or modify account information

sacctmgr show associations
sacctmgr show user <username>

cluster=<clusters> - Filter by clusters (e.g., condo, discovery)
format=<options> - Output format to display
user=<user_list> - Filter by users (comma-separated list)


# View your own associations with custom format
sacctmgr show associations user=$USER format=cluster,account,user,qos
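
The user form shown above works the same way; for example:

# View your own user record
sacctmgr show user $USER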

sreport - Generate reports from accounting data

sreport cluster accountutilizationbyuser
sreport cluster userutilizationbyaccount
sreport job sizesbyaccount
sreport user topusage

-T, --tres=<resource_list> - Resources to report (e.g., cpu, gpu, mem, billing, all)
clusters=<clusters> - Filter by clusters (e.g., condo, discovery)
end=<time> - End of reporting period
format=<options> - Output format to display
start=<time> - Start of reporting period
accounts=<account_list> - Filter by accounts (comma-separated list)
users=<user_list> - Filter by users (comma-separated list)
nodes=<hostnames> - Filter by host names (comma-separated list) (job reports only)
partitions=<partition_list> - Filter by partitions (comma-separated list) (job reports only)
printjobcount - Print number of jobs run instead of time used (job reports only)


# Report account utilization for specified user and time range
sreport cluster accountutilizationbyuser start=2022-07-01 end=2022-07-14 users=$USER

# Report account utilization by user for specified account and time range
sreport cluster userutilizationbyaccount start=2022-07-01 end=2022-07-14 accounts=ttrojan_123

# Report job sizes for specified partition
sreport job sizesbyaccount partitions=epyc-64 printjobcount

# Report top users for specified account and time range
sreport user topusage start=2022-07-01 end=2022-07-14 accounts=ttrojan_123
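
By default sreport reports CPU time; other resources can be selected with -T, using the resource names listed above (gpu shown here):

# Report GPU utilization for specified user and time range
sreport -T gpu cluster accountutilizationbyuser start=2022-07-01 end=2022-07-14 users=$USER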

Partition and node information

sinfo - View information about nodes and partitions

-o, --format=<options> - Output format to display
-l, --long - Show more available information
-N, --Node - Show information in a node-oriented format
-n, --nodes=<hostnames> - Filter by host names (comma-separated list)
-p, --partition=<partition_list> - Filter by partitions (comma-separated list)
-t, --states=<state_list> - Filter by node states (comma-separated list)
-s, --summarize - Show summary information


# View all partitions and nodes by state
sinfo

# Summarize node states by partition
sinfo -s

# View nodes in idle state
sinfo --states=idle

# View nodes for specified partition in long, node-oriented format
sinfo -lNp epyc-64
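
The -o option takes format specifiers; a sketch using the standard %P (partition), %a (availability), %l (time limit), %D (node count), and %t (state) codes:

# View partition availability with a custom output format
sinfo -o "%P %a %l %D %t"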

scontrol - View or modify configuration and state

scontrol show partition <partition>
scontrol show node <hostname>
scontrol show job <job_id>

-d, --details - Show more details
-o, --oneliner - Show information on one line

scontrol hold <job_list>
scontrol release <job_list>
scontrol show hostnames


# View information for specified partition
scontrol show partition epyc-64

# View information for specified node
scontrol show node b22-01

# View detailed information for running job
scontrol -d show job 111111

# View hostnames for current job, one name per line (run within a job)
scontrol show hostnames
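
To pause scheduling of a pending job and later resume it:

# Hold a pending job, then release it
scontrol hold 111111
scontrol release 111111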

Output environment variables

SLURM_ARRAY_TASK_COUNT - Number of tasks in job array
SLURM_CPUS_PER_TASK - Number of CPUs requested per task
SLURM_JOB_ACCOUNT - Account used for job
SLURM_JOB_NODELIST - List of nodes allocated to job
SLURM_JOB_NUM_NODES - Number of nodes allocated to job
SLURM_JOB_PARTITION - Partition used for job
SLURM_NTASKS - Number of job tasks
SLURM_PROCID - MPI rank of current process
SLURM_SUBMIT_DIR - Directory from which job was submitted
SLURM_TASKS_PER_NODE - Number of job tasks per node


# Specify OpenMP threads
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Specify MPI tasks
srun -n $SLURM_NTASKS ./mpi_program
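
The two patterns combine for hybrid MPI/OpenMP programs; a sketch, assuming both --ntasks and --cpus-per-task were requested (./hybrid_program is a placeholder):

# Specify both MPI tasks and OpenMP threads
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun -n $SLURM_NTASKS ./hybrid_program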

Custom CARC Slurm commands

myaccount - View own account information
acctusage - View account usage information
nodeinfo - View partition and node states
gpuinfo - View GPU states
cqueue - View jobs in scheduling queue
myqueue - View own jobs in scheduling queue
jobhist - View compact history of own jobs
jobinfo - View detailed information about jobs
