Using GPUs

Some programs can take advantage of the unique hardware architecture in a graphics processing unit (GPU). GPUs can be used for specialized scientific computing work, including 3D modelling and machine learning. CARC's Discovery cluster offers a few different models of GPUs for use with your jobs. In addition, Condo Cluster Program users participating in the traditional purchase model have the option to include GPUs in their dedicated resources.

Requesting GPU resources

On Discovery, most GPU nodes are available on the gpu partition. Some GPU nodes are also available on the main and debug partitions. Enter the nodeinfo command for more information.

To request a GPU on the gpu partition, first add the following line to your Slurm job script:

#SBATCH --partition=gpu

You can specify the main or debug partition in the same way. Then add one of the following options to your Slurm job script to request the number and, optionally, the type of GPUs you would like to use:

#SBATCH --gres=gpu:<number>

or

#SBATCH --gres=gpu:<gpu_type>:<number>

where:

<number> is the number of GPUs requested per node, and
<gpu_type> is the GPU model.

For Discovery nodes, use the chart below to determine which GPU type to specify:

GPU type    GPU model            Max number of GPUs per node
a100        NVIDIA Tesla A100    2
a40         NVIDIA Tesla A40     2
v100        NVIDIA Tesla V100    2
p100        NVIDIA Tesla P100    2
k40         NVIDIA Tesla K40     2
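As a rough illustration of how <gpu_type> and <number> combine, the sketch below defines a hypothetical helper, gres_line (it is not a CARC or Slurm command), that checks a request against the Discovery chart above and prints the matching #SBATCH line. The list of types and the limit of 2 GPUs per node are taken from the chart; they would not apply to Endeavour condo nodes.

```shell
# gres_line GPU_TYPE NUMBER
# Hypothetical helper for illustration only. Prints an #SBATCH line for the
# request, or returns 1 if the request does not match the Discovery chart.
gres_line() {
    # Accept only the GPU types listed in the chart.
    case "$1" in
        a100|a40|v100|p100|k40) ;;
        *) echo "unknown GPU type: $1" >&2; return 1 ;;
    esac
    # Discovery nodes have at most 2 GPUs each.
    case "$2" in
        1|2) ;;
        *) echo "Discovery nodes have at most 2 GPUs per node: $2" >&2; return 1 ;;
    esac
    echo "#SBATCH --gres=gpu:$1:$2"
}

gres_line v100 2   # prints: #SBATCH --gres=gpu:v100:2
```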

On Endeavour, there may be different GPU types or more than 2 GPUs per node, depending on what the condo group has purchased.

For interactive jobs, use similar options with the salloc command:

salloc --partition=gpu --gres=gpu:<gpu_type>:<number>

To see a list of currently available GPUs, enter a command like the following:

nodeinfo -s idle,mix | grep gpu

Each GPU device is assigned a fixed number of CPUs that it can interact with, so the value of the --cpus-per-task option must stay within this limit. With 2 GPUs per node, this typically means that the maximum number of CPUs that can be used per GPU is half of the total number of CPUs on the node. For example, on a node with 2 GPUs and 20 CPUs, a job requesting 1 GPU can use at most 10 CPUs.
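The arithmetic behind this limit can be sketched with a small shell function. The name max_cpus_for_gpus is hypothetical, and the calculation assumes that a node's CPUs are split evenly among its GPUs, which the text above describes as the typical case rather than a guarantee.

```shell
# max_cpus_for_gpus TOTAL_CPUS GPUS_PER_NODE GPUS_REQUESTED
# Hypothetical helper: assumes the node's CPUs are divided evenly among its GPUs.
max_cpus_for_gpus() {
    # CPUs tied to one GPU, multiplied by the number of GPUs requested.
    echo $(( $1 / $2 * $3 ))
}

# A node with 20 CPUs and 2 GPUs, requesting 1 GPU:
max_cpus_for_gpus 20 2 1   # prints 10
```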

The maximum number of GPUs that can be used at one time per user, in one job or across multiple jobs, is 36.

System Unit (SU) charges

Each job will subtract from your project's allocated System Units (SUs) depending on the types of resources you request:

Resource reserved for 1 minute    SUs charged
1 CPU                             1
4 GB memory                       1
1 A100 or A40 GPU                 8
1 V100 or P100 GPU                4
1 K40 GPU                         2
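To get a feel for these rates, the sketch below estimates the charge for a single job. The function name estimate_sus is hypothetical, and the calculation rests on two assumptions that the table does not state: that the per-resource charges are added together, and that memory is charged in whole 4 GB blocks, rounded up. Treat the result as an estimate, not as the accounting CARC actually applies.

```shell
# estimate_sus CPUS MEM_GB GPUS GPU_RATE MINUTES
# Hypothetical helper. GPU_RATE is the per-GPU rate from the table
# (8 for A100/A40, 4 for V100/P100, 2 for K40).
# Assumes charges are additive and memory is billed per 4 GB block, rounded up.
estimate_sus() {
    cpu_sus=$(( $1 * 1 ))            # 1 SU per CPU per minute
    mem_sus=$(( ($2 + 3) / 4 ))      # 1 SU per 4 GB block per minute
    gpu_sus=$(( $3 * $4 ))           # GPU rate per GPU per minute
    echo $(( (cpu_sus + mem_sus + gpu_sus) * $5 ))
}

# 4 CPUs, 16 GB of memory, and 1 A40 GPU for 60 minutes:
estimate_sus 4 16 1 8 60   # prints 960
```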

Loading GPU-related modules

GPU-enabled software often requires the CUDA Toolkit or the cuDNN library. These are available as modules and can be found by running:

module spider cuda
module spider cudnn

Or to search for modules that contain 'cud' in the name, run:

module spider cud

There are multiple versions available. To load the modules, for example, run:

module load gcc/11.3.0
module load cuda/11.6.2
module load cudnn/

In addition, the NVIDIA HPC SDK with associated compilers, libraries, and related tools is available as a module:

module load nvhpc/22.11

If you require a different version of one of these modules that is not currently installed on CARC systems, please submit a help ticket and we will install it for you.

Compiling CUDA programs

After a cuda or nvhpc module is loaded, you can then use the nvcc command to compile a CUDA C/C++ program:

nvcc program.cu -o program

Enter nvcc --help for more information on the available compiler options.
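As a minimal sketch of the compile step, the commands below write a small CUDA C++ source file and compile it only if nvcc is on the PATH (that is, after a cuda or nvhpc module has been loaded). The file name hello.cu and the kernel are made up for illustration, and the program prints output only when it runs on a node with a GPU.

```shell
# Create a scratch directory and a tiny CUDA C++ program (hypothetical example).
workdir=$(mktemp -d)
cat > "$workdir/hello.cu" <<'EOF'
#include <cstdio>

// Kernel: each GPU thread prints its own index.
__global__ void hello() {
    printf("Hello from GPU thread %d\n", threadIdx.x);
}

int main() {
    hello<<<1, 4>>>();          // launch 1 block of 4 threads
    cudaDeviceSynchronize();    // wait for the kernel to finish
    return 0;
}
EOF

# Compile and run only when nvcc is available in the current environment.
if command -v nvcc >/dev/null 2>&1; then
    nvcc "$workdir/hello.cu" -o "$workdir/hello" && "$workdir/hello"
else
    echo "nvcc not found; load a cuda or nvhpc module first"
fi
```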

In addition to nvcc, the nvhpc module provides NVIDIA's HPC compilers nvc, nvc++, and nvfortran. For example, to compile a CUDA Fortran program:

nvfortran program.cuf -o program

One advantage of these HPC compilers is that they provide GPU-acceleration of standard C++ and Fortran programs that are not explicitly written for GPUs.

Example Slurm job script

The following is an example Slurm job script for GPU jobs:


#!/bin/bash

#SBATCH --account=<project_id>
#SBATCH --partition=gpu
#SBATCH --gres=gpu:a40:1
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16GB
#SBATCH --time=1:00:00

module purge
module load nvhpc/22.11

./program


Each line is described below:

Command or Slurm argument    Meaning
#!/bin/bash                  Use Bash to execute this script
#SBATCH                      Syntax that allows Slurm to read your requests (ignored by Bash)
--account=<project_id>       Charge compute resources used to <project_id>; enter myaccount to view your available project IDs
--partition=gpu              Submit job to the gpu partition
--gres=gpu:a40:1             Reserve 1 A40 GPU
--nodes=1                    Use 1 compute node
--ntasks=1                   Run 1 task (e.g., running a CUDA program)
--cpus-per-task=4            Reserve 4 CPUs for your exclusive use
--mem=16GB                   Reserve 16 GB of memory for your exclusive use
--time=1:00:00               Reserve resources described for 1 hour
module purge                 Clear environment modules
module load nvhpc/22.11      Load the nvhpc compilers and libraries environment module
./program                    Run program

Adjust the requested resources to match your needs, keeping in mind that requesting fewer resources generally leads to less queue time for your job.
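Before running your program, it can help to confirm inside the job script that a GPU was actually assigned. The sketch below assumes that Slurm exports the CUDA_VISIBLE_DEVICES environment variable for jobs that request GPUs with --gres, which is its usual behavior but depends on the cluster configuration. The function name report_gpus is hypothetical.

```shell
# report_gpus
# Hypothetical helper: reports the GPU indices exposed to this job, based on
# the CUDA_VISIBLE_DEVICES variable that Slurm is assumed to set.
report_gpus() {
    if [ -n "${CUDA_VISIBLE_DEVICES:-}" ]; then
        echo "GPUs visible to this job: ${CUDA_VISIBLE_DEVICES}"
    else
        echo "No GPUs visible to this job (CUDA_VISIBLE_DEVICES is unset or empty)"
    fi
}

report_gpus
```

Placing a call like this just before ./program records the assignment in the job's output file, which makes it easier to tell a missing --gres request apart from a problem in the program itself.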

Additional resources

CUDA Toolkit
