Slurm Cheatsheet
A compact reference for Slurm commands and useful options, with examples.
Job submission
salloc - Obtain a job allocation for interactive use (docs)
sbatch - Submit a batch script for later execution (docs)
srun - Obtain a job allocation and run an application (docs)
Option | Description |
---|---|
-A, --account=<account> | Account to be charged for resources used |
-a, --array=<index> | Job array specification (sbatch only) |
-b, --begin=<time> | Initiate job after specified time |
-C, --constraint=<features> | Required node features |
--cpu-bind=<type> | Bind tasks to specific CPUs (srun only) |
-c, --cpus-per-task=<count> | Number of CPUs required per task |
-d, --dependency=<state:jobid> | Defer job until specified jobs reach specified state |
-m, --distribution=<method[:method]> | Specify distribution methods for remote processes |
-e, --error=<filename> | File in which to store job error messages (sbatch and srun only) |
-x, --exclude=<name> | Specify host names to exclude from job allocation |
--exclusive | Reserve all CPUs and GPUs on allocated nodes |
--export=<name=value> | Export specified environment variables (e.g., all, none) |
--gres=<list> | Generic resources required per node (e.g., GPUs) |
-J, --job-name=<name> | Job name |
-l, --label | Prepend task ID to output (srun only) |
--mail-type=<type> | E-mail notification type (e.g., begin, end, fail, requeue, all) |
--mail-user=<address> | E-mail address |
--mem=<size>[units] | Memory required per allocated node (e.g., 16GB) |
--mem-per-cpu=<size>[units] | Memory required per allocated CPU (e.g., 2GB) |
-w, --nodelist=<hostnames> | Specify host names to include in job allocation |
-N, --nodes=<count> | Number of nodes required for the job |
-n, --ntasks=<count> | Number of tasks to be launched |
--ntasks-per-node=<count> | Number of tasks to be launched per node |
-o, --output=<filename> | File in which to store job output (sbatch and srun only) |
-p, --partition=<names> | Partition in which to run the job |
--signal=[B:]<num>[@time] | Signal job when approaching time limit |
-t, --time=<time> | Limit for job run time |
Examples:
# Request interactive job on debug node with 4 CPUs
salloc -p debug -c 4
# Request interactive job with V100 GPU
salloc -p gpu --gres=gpu:v100:1
# Submit batch job
sbatch batch.job
Job management
squeue - View information about jobs in scheduling queue (docs)
Option | Description |
---|---|
-A, --account=<account_list> | Filter by accounts (comma-separated list) |
-o, --format=<options> | Output format to display |
-j, --jobs=<job_id_list> | Filter by job IDs (comma-separated list) |
-l, --long | Show more available information |
--me | Filter by your own jobs |
-n, --name=<job_name_list> | Filter by job names (comma-separated list) |
-p, --partition=<partition_list> | Filter by partitions (comma-separated list) |
-P, --priority | Sort jobs by priority |
--start | Show the expected start time and resources to be allocated for pending jobs |
-t, --states=<state_list> | Filter by states (comma-separated list) |
-u, --user=<user_list> | Filter by users (comma-separated list) |
Examples:
# View your own job queue with estimated start times
squeue --me
# View own job queue with estimated start times for pending jobs
squeue --me --start
# View job queue on specified partition in long format
squeue -lp epyc-64
scancel - Signal or cancel jobs, job arrays, or job steps (docs)
Option | Description |
---|---|
-A, --account=<account> | Restrict to the specified account |
-n, --name=<job_name> | Restrict to jobs with specified name |
-w, --nodelist=<hostnames> | Restrict to jobs using the specified host names (comma-separated list) |
-p, --partition=<partition> | Restrict to the specified partition |
-u, --user=<username> | Restrict to the specified user |
Examples:
# Cancel specific job
scancel 111111
# Cancel all your own jobs
scancel -u $USER
# Cancel your own jobs on specified partition
scancel -u $USER -p oneweek
# Cancel your own jobs in specified state
scancel -u $USER -t pending
sprio - View job scheduling priorities (docs)
Option | Description |
---|---|
-o, --format=<options> | Output format to display |
-j, --jobs=<job_id_list> | Filter by job IDs (comma-separated list) |
-l, --long | Show more available information |
-n, --norm | Show the normalized priority factors |
-p, --partition=<partition_list> | Filter by partitions (comma-separated list) |
-u, --user=<user_list> | Filter by users (comma-separated list) |
Examples:
# View normalized job priorities for your own jobs
sprio -nu $USER
# View normalized job priorities for specified partition
sprio -nlp gpu
Job accounting
sacct - View job accounting data (docs)
Option | Description |
---|---|
-A, --account=<account_list> | Filter by accounts (comma-separated list) |
-X, --allocations | Show job allocations, but not job steps |
-a, --allusers | Show jobs for all users |
-E, --endtime=<time> | End of reporting period |
-o, --format=<options> | Output format to display |
-j, --jobs=<job_id_list> | Filter by job IDs (comma-separated list) |
--name=<job_name_list> | Filter by job names (comma-separated list) |
-N, --nodelist=<hostnames> | Filter by host names (comma-separated list) |
-r, --partition=<partition_list> | Filter by partitions (comma-separated list) |
-S, --starttime=<time> | Start of reporting period |
-s, --state=<state_list> | Filter by states (comma-separated list) |
-u, --user=<user_list> | Filter by users (comma-separated list) |
Examples:
# View accounting data for specific job with custom format
sacct -j 111111 --format=jobid,jobname,submit,exitcode,elapsed,reqnodes,reqcpus,reqmem
# View compact accounting data for your own jobs for specified time range
sacct -X -S 2022-07-01 -E 2022-07-14
sacctmgr - View or modify account information (docs)
sacctmgr show associations
sacctmgr show user <username>
Option | Description |
---|---|
cluster=<clusters> | Filter by clusters (e.g., condo, discovery) |
format=<options> | Output format to display |
user=<user_list> | Filter by users (comma-separated list) |
Examples:
# View your own associations with custom format
sacctmgr show associations user=$USER format=cluster,account,user,qos
sreport - Generate reports from accounting data (docs)
sreport cluster accountutilizationbyuser
sreport cluster userutilizationbyaccount
sreport job sizesbyaccount
sreport user topusage
Option | Description |
---|---|
-T, --tres=<resource_list> | Resources to report (e.g., cpu, gpu, mem, billing, all) |
clusters=<clusters> | Filter by clusters (e.g., condo, discovery) |
end=<time> | End of reporting period |
format=<options> | Output format to display |
start=<time> | Start of reporting period |
accounts=<account_list> | Filter by accounts (comma-separated list) |
users=<user_list> | Filter by users (comma-separated list) |
nodes=<hostnames> | Filter by host names (comma-separated list) (job reports only) |
partitions=<partition_list> | Filter by partitions (comma-separated list) (job reports only) |
printjobcount | Print number of jobs ran instead of time used (job reports only) |
Examples:
# Report account utilization for specified user and time range
sreport cluster accountutilizationbyuser start=2022-07-01 end=2022-07-14 users=$USER
# Report account utilization by user for specified account and time range
sreport cluster userutilizationbyaccount start=2022-07-01 end=2022-07-14 accounts=ttrojan_123
# Report job sizes for specified partition
sreport job sizesbyaccount partitions=epyc-64 printjobcount
# Report top users for specified account and time range
sreport user topusage start=2022-07-01 end=2022-07-14 accounts=ttrojan_123
Partition and node information
sinfo - View information about nodes and partitions (docs)
Option | Description |
---|---|
-o, --format=<options> | Output format to display |
-l, --long | Show more available information |
-N, --Node | Show information in a node-oriented format |
-n, --nodes=<hostnames> | Filter by host names (comma-separated list) |
-p, --partition=<partition_list> | Filter by partitions (comma-separated list) |
-t, --states=<state_list> | Filter by node states (comma-separated list) |
-s, --summarize | Show summary information |
Examples:
# View all partitions and nodes by state
sinfo
# Summarize node states by partition
sinfo -s
# View nodes in idle state
sinfo --states=idle
# View nodes for specified partition in long, node-oriented format
sinfo -lNp epyc-64
scontrol - View or modify configuration and state (docs)
scontrol show partition <partition>
scontrol show node <hostname>
scontrol show job <job_id>
Option | Description |
---|---|
-d, --details | Show more details |
-o, --oneliner | Show information on one line |
scontrol hold <job_list>
scontrol release <job_list>
scontrol show hostnames
Examples:
# View information for specified partition
scontrol show partition epyc-64
# View information for specified node
scontrol show node b22-01
# View detailed information for running job
scontrol show job 111111 -d
# View hostnames for job (one name per line)
scontrol show hostnames
Output environment variables
Variable | Description |
---|---|
SLURM_ARRAY_TASK_COUNT | Number of tasks in job array |
SLURM_ARRAY_TASK_ID | Job array task ID |
SLURM_CPUS_PER_TASK | Number of CPUs requested per task |
SLURM_JOB_ACCOUNT | Account used for job |
SLURM_JOB_ID | Job ID |
SLURM_JOB_NAME | Job Name |
SLURM_JOB_NODELIST | List of nodes allocated to job |
SLURM_JOB_NUM_NODES | Number of nodes allocated to job |
SLURM_JOB_PARTITION | Partition used for job |
SLURM_NTASKS | Number of job tasks |
SLURM_PROCID | MPI rank of current process |
SLURM_SUBMIT_DIR | Directory from which job was submitted |
SLURM_TASKS_PER_NODE | Number of job tasks per node |
Examples:
# Specify OpenMP threads
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# Specify MPI tasks
srun -n $SLURM_NTASKS ./mpi_program
Custom CARC Slurm commands
myaccount - View own account information
nodeinfo - View partition and node states
gpuinfo - View GPU states
cqueue - View jobs in scheduling queue
myqueue - View own jobs in scheduling queue
jobhist - View compact history of own jobs
jobinfo - View detailed information about jobs