GPU Programming

Last updated March 06, 2024

Some programs can take advantage of the unique hardware architecture in a graphics processing unit (GPU). GPUs can be used for specialized scientific computing work, including 3D modelling and machine learning. CARC’s Discovery cluster offers a few different models of GPUs for use with your jobs. In addition, Condo Cluster Program users participating in the traditional purchase model have the option to include GPUs in their dedicated resources.

0.0.1 Requesting GPU resources

On Discovery, most GPU nodes are available on the gpu partition. Some GPU nodes are also available on the main and debug partitions. Enter the nodeinfo command for more information.

To request a GPU on the gpu partition for a batch job, first add the following line to your Slurm job script:

#SBATCH --partition=gpu

Alternatively, specify the main or debug partition, where other GPUs may be available.

Remember to add one of the following options to your Slurm job script to request the type and number of GPUs you would like to use:

#SBATCH --gpus-per-task=<number>


#SBATCH --gpus-per-task=<gpu_type>:<number>


where:
  • <number> is the number of GPUs requested per task, and
  • <gpu_type> is a GPU model.

Note that requesting more than 1 GPU does not guarantee that your job will use more than 1 GPU. Your program may need to be modified to make use of additional GPUs.

If using more than 1 GPU and 1 task, you can also use the options --gpus-per-node and --gpus-per-socket if desired. You may also want to use the --gpu-bind option to bind tasks to specific GPUs in order to improve performance; for example, --gpu-bind=single:1 to bind each task to a single, unique GPU.
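
As a rough sketch of the multi-GPU case, the job script below requests two tasks and two GPUs on one node and binds each task to its own GPU with the --gpu-bind option described above. The a40 type, the two-task layout, and the helper name report_gpu are illustrative assumptions. Outside a Slurm job, the script falls back to printing that the variables are unset.

```shell
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --gpus-per-node=a40:2

# Hypothetical helper: report which GPU(s) the current task can see.
# Slurm sets CUDA_VISIBLE_DEVICES for tasks inside a GPU job; elsewhere it is unset.
report_gpu() {
    echo "task ${SLURM_PROCID:-unset} sees GPU(s): ${CUDA_VISIBLE_DEVICES:-unset}"
}

if [ -n "${SLURM_JOB_ID:-}" ] && command -v srun >/dev/null 2>&1; then
    # Inside a job: launch one copy per task, each bound to a single, unique GPU.
    export -f report_gpu
    srun --gpu-bind=single:1 bash -c report_gpu
else
    # Fallback so the sketch still runs outside of Slurm.
    report_gpu
fi
```

If the binding works as intended, each task should report a different GPU index.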

For Discovery nodes, use the chart below to determine which GPU type to specify:

GPU type   GPU model           Partitions    Max number of GPUs per node
a100       NVIDIA Tesla A100   gpu           2
a40        NVIDIA Tesla A40    gpu           2
v100       NVIDIA Tesla V100   gpu           2
p100       NVIDIA Tesla P100   gpu, debug    2
k40        NVIDIA Tesla K40    main, debug   2

Also note that some A100 GPUs have 40 GB of GPU memory and some have 80 GB of GPU memory. To request a specific A100 model, add one of the following options:

#SBATCH --constraint=a100-40gb


#SBATCH --constraint=a100-80gb
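
For instance, a job script that asks for one 80 GB A100 might combine the GPU type and the constraint as in the sketch below. The nvidia-smi query at the end is only a suggested way to confirm which model was allocated; the variable name gpu_query is an assumption made for illustration.

```shell
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --ntasks=1
#SBATCH --gpus-per-task=a100:1
#SBATCH --constraint=a100-80gb

# Fields to report for each allocated GPU (illustrative variable name).
gpu_query="name,memory.total"

# Print the GPU model and total memory if nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu="$gpu_query" --format=csv
else
    echo "nvidia-smi not found (probably not on a GPU node)"
fi
```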

On Endeavour, there may be different GPU types or more than 2 GPUs per node, depending on what the condo group has selected.

For interactive jobs, use similar options with the salloc command:

salloc --partition=gpu --ntasks=1 --gpus-per-task=<gpu_type>:<number>

To see a list of currently available GPUs, enter noderes -f -g.

The maximum number of GPUs that can be used at one time per user, in one job or across multiple jobs, is 36.

There are a few commands you can use for more detailed node and GPU information. The lscpu command provides information about CPUs, and the nvidia-smi command and its various options provide information about GPUs. After running module load nvhpc, you can also use the nvaccelinfo command to view information about GPUs. In addition, after running module load gcc/11.3.0 hwloc, you can use the lstopo command to view a node’s topology.

System Unit (SU) charges

Each job will subtract from your project’s allocated System Units (SUs) depending on the types of resources you request:

Resource reserved for 1 minute   SUs charged
1 CPU                            1
4 GB memory                      1
1 A100 or A40 GPU                8
1 V100 or P100 GPU               4
1 K40 GPU                        2
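
To make the arithmetic concrete, the sketch below estimates the SU cost of a job from the rates in the table. It assumes that charges are additive across resource types and that memory is billed in whole 4 GB units, rounded up; neither detail is stated here, so treat the result as an estimate. The function name su_estimate is hypothetical.

```shell
#!/bin/bash
# su_estimate CPUS MEM_GB GPUS GPU_RATE MINUTES
# GPU_RATE is the per-GPU, per-minute charge from the table (8, 4, or 2).
# Assumptions: charges add up across resources, and memory is billed
# in whole 4 GB units, rounded up.
su_estimate() {
    local cpus=$1 mem_gb=$2 gpus=$3 gpu_rate=$4 minutes=$5
    local mem_units=$(( (mem_gb + 3) / 4 ))
    echo $(( (cpus + mem_units + gpus * gpu_rate) * minutes ))
}

# 4 CPUs, 16 GB memory, and 1 A40 GPU reserved for 60 minutes:
# (4 + 4 + 8) SUs per minute * 60 minutes
su_estimate 4 16 1 8 60   # prints 960
```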

GPU-enabled software often requires the CUDA Toolkit or the cuDNN library. These are available as modules and can be found by running:

module spider cuda
module spider cudnn

There are multiple versions available. To load the modules, for example, run:

module purge
module load gcc/11.3.0
module load cuda/11.6.2
module load cudnn/
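
After loading modules, one way to confirm that the expected tools are on your PATH is sketched below. The helper name check_tool is hypothetical; module list is the standard command for showing the currently loaded modules.

```shell
#!/bin/bash
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found; run 'module list' to check the loaded modules"
        return 1
    fi
}

# nvcc should be available once a cuda module is loaded.
check_tool nvcc || true
```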

In addition, the NVIDIA HPC SDK with associated compilers, libraries, and related tools is available as a module:

module purge
module load nvhpc/23.11

If you require a different version of one of these modules that is not currently installed on CARC systems, please submit a help ticket and we will install it for you.

0.0.3 Compiling CUDA programs

After a cuda or nvhpc module is loaded, you can then use the nvcc command to compile a CUDA C/C++ program:

nvcc program.cu -o program

Enter nvcc --help for more information on the available compiler options.
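
As an illustration, the following sketch writes a minimal CUDA C++ source file and compiles it with nvcc when the compiler is on your PATH. The file name hello.cu and the kernel itself are illustrative assumptions, not part of CARC's documentation.

```shell
#!/bin/bash
# Write a minimal CUDA C++ program (hypothetical example file).
cat > hello.cu <<'EOF'
#include <cstdio>

// Each GPU thread prints its own index within the block.
__global__ void hello() {
    printf("Hello from GPU thread %d\n", threadIdx.x);
}

int main() {
    hello<<<1, 4>>>();          // launch 1 block of 4 threads
    cudaDeviceSynchronize();    // wait for the kernel to finish
    return 0;
}
EOF

# Compile only if nvcc is available (that is, a cuda or nvhpc module is loaded).
if command -v nvcc >/dev/null 2>&1; then
    nvcc hello.cu -o hello
else
    echo "nvcc not found; load a cuda or nvhpc module first"
fi
```

The resulting ./hello executable needs to be run on a node with a GPU, for example inside a batch or interactive job.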

For the nvhpc module, in addition to nvcc, there are NVIDIA’s HPC compilers nvc, nvc++, and nvfortran. For example, to compile a CUDA Fortran program:

nvfortran program.cuf -o program

One advantage of these HPC compilers is that they provide GPU-acceleration of standard C++ and Fortran programs that are not explicitly written for GPUs.
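
As a sketch of what this can look like, the script below writes a small standard C++ program that uses a parallel algorithm and compiles it with nvc++. The -stdpar=gpu flag is the NVIDIA HPC compilers' option for offloading standard parallel algorithms to a GPU; the file name saxpy.cpp and the program are illustrative assumptions.

```shell
#!/bin/bash
# Write a standard C++ program with no CUDA-specific code (hypothetical example file).
cat > saxpy.cpp <<'EOF'
#include <algorithm>
#include <cstdio>
#include <execution>
#include <vector>

int main() {
    std::vector<float> x(1 << 20, 1.0f), y(1 << 20, 2.0f);
    // y = 2*x + y, written as a standard parallel algorithm
    std::transform(std::execution::par_unseq,
                   x.begin(), x.end(), y.begin(), y.begin(),
                   [](float xi, float yi) { return 2.0f * xi + yi; });
    std::printf("y[0] = %f\n", y[0]);
    return 0;
}
EOF

# Compile with GPU offload of the parallel algorithm if nvc++ is available.
if command -v nvc++ >/dev/null 2>&1; then
    nvc++ -stdpar=gpu saxpy.cpp -o saxpy
else
    echo "nvc++ not found; load the nvhpc module first"
fi
```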

0.0.4 Example Slurm job script

The following is an example Slurm job script for GPU jobs:


#!/bin/bash

#SBATCH --account=<project_id>
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --gpus-per-task=a40:1
#SBATCH --mem=16G
#SBATCH --time=1:00:00

module purge
module load nvhpc/23.11

./program


Each line is described below:

Command or Slurm argument   Meaning
#!/bin/bash                 Use Bash to execute this script
#SBATCH                     Syntax that allows Slurm to read your requests (ignored by Bash)
--account=<project_id>      Charge compute resources used to <project_id>; enter myaccount to view your available project IDs
--partition=gpu             Submit job to the gpu partition
--nodes=1                   Use 1 compute node
--ntasks=1                  Run 1 task (e.g., running a CUDA program)
--cpus-per-task=4           Reserve 4 CPUs for your exclusive use
--gpus-per-task=a40:1       Reserve 1 A40 GPU for your exclusive use
--mem=16G                   Reserve 16 GB of memory for your exclusive use
--time=1:00:00              Reserve resources described for 1 hour
module purge                Clear environment modules
module load nvhpc/23.11     Load the nvhpc compilers and libraries environment module
./program                   Run program

Make sure to adjust the resources requested based on your needs, but keep in mind that requesting fewer resources should lead to less queue time for your job.
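
Once the script is saved, it can be submitted and monitored with the standard Slurm commands sbatch and squeue, as in the sketch below. The file name gpu_job.slurm is a hypothetical placeholder for wherever you saved the example script.

```shell
#!/bin/bash
job_script="gpu_job.slurm"   # hypothetical file name for the job script

if command -v sbatch >/dev/null 2>&1 && [ -f "$job_script" ]; then
    sbatch "$job_script"     # submit the job to the queue
    squeue --me              # list your pending and running jobs
else
    echo "sbatch or $job_script not available; run this from a login node"
fi
```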

0.0.5 Additional resources

If you have questions or need help, please submit a help ticket and we will assist you.