Some programs can take advantage of the unique hardware architecture in a graphics processing unit (GPU). GPUs can be used for specialized scientific computing work, including 3D modelling and machine learning. CARC's Discovery cluster offers a few different models of GPUs for use with your jobs. In addition, Condo Cluster Program users participating in the traditional purchase model have the option to include GPUs in their dedicated resources.
Requesting GPU resources
On Discovery, most GPU nodes are available on the gpu partition. Some GPU nodes are also available on the main and debug partitions. Enter the nodeinfo command for more information.
To request a GPU on the gpu partition, first add the following line to your Slurm job script:
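#SBATCH --partition=gpu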
Or similarly, use the main or debug partition. Also add one of the following options to your Slurm job script to request the type and number of GPUs you would like to use:
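#SBATCH --gres=gpu:<number>

or

#SBATCH --gres=gpu:<gpu_type>:<number>

where: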
<number> is the number of GPUs per node requested, and
<gpu_type> is a GPU model
For Discovery nodes, use the chart below to determine which GPU type to specify:
| GPU type | GPU model | Max number of GPUs per node |
|---|---|---|
| a100 | NVIDIA Tesla A100 | 2 |
| a40 | NVIDIA Tesla A40 | 2 |
| v100 | NVIDIA Tesla V100 | 2 |
| p100 | NVIDIA Tesla P100 | 2 |
| k40 | NVIDIA Tesla K40 | 2 |
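For example, to request two V100 GPUs for a batch job on the gpu partition, a job script could include:

#SBATCH --partition=gpu
#SBATCH --gres=gpu:v100:2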
On Endeavour, there may be different GPU types or more than 2 GPUs per node, depending on what the condo group has purchased.
For interactive jobs, use similar options with the salloc command:

salloc --partition=gpu --gres=gpu:<gpu_type>:<number>
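For example, a one-hour interactive session with one V100 GPU and four CPUs might be requested as follows (the specific values are only illustrative):

salloc --partition=gpu --gres=gpu:v100:1 --ntasks=1 --cpus-per-task=4 --mem=16GB --time=1:00:00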
To see a list of currently available GPUs, enter a command like the following:
nodeinfo -s idle,mix | grep gpu
Each GPU device is also assigned a certain number of CPUs it can interact with, so the
--cpus-per-task option must meet this constraint. With 2 GPUs per node, this typically means that the maximum number of CPUs that can be used per GPU is half of the total number of CPUs on a node. For example, on a node with 2 GPUs and 20 CPUs, when requesting 1 GPU the maximum number of CPUs that can be used is 10.
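For example, on such a node (2 GPUs and 20 CPUs), a request like the following stays within the limit:

#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=10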
The maximum number of GPUs that can be used at one time per user, in one job or across multiple jobs, is 36.
System Unit (SU) charges
Each job will subtract from your project's allocated System Units (SUs) depending on the types of resources you request:
| Resource reserved for 1 minute | SUs charged |
|---|---|
| 4 GB memory | 1 |
| 1 A100 or A40 GPU | 8 |
| 1 V100 or P100 GPU | 4 |
| 1 K40 GPU | 2 |
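For example, counting only the charges listed above, a job that reserves 1 A40 GPU and 16 GB of memory for 60 minutes would be charged (8 + 4) × 60 = 720 SUs; other reserved resources, such as CPUs, may add to this total.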
Loading GPU-related modules
GPU-enabled software often requires the CUDA Toolkit or the cuDNN library. These are available as modules and can be found by running:
module spider cuda
module spider cudnn
Or to search for modules that contain 'cud' in the name, run:
module spider cud
There are multiple versions available. To load the modules, for example, run:
module load gcc/11.3.0
module load cuda/11.6.2
module load cudnn/8.4.0.27-11.6
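To confirm what is loaded in your environment, you can then run:

module list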
In addition, the NVIDIA HPC SDK with associated compilers, libraries, and related tools is available as a module:
module load nvhpc/22.11
If you require a different version of one of these modules that is not currently installed on CARC systems, please submit a help ticket and we will install it for you.
Compiling CUDA programs
Once the nvhpc module is loaded, you can then use the nvcc command to compile a CUDA C/C++ program:

nvcc program.cu -o program

Enter nvcc --help for more information on the available compiler options.
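As a quick check of the toolchain, here is a minimal, self-contained CUDA C example; the file name program.cu is simply a placeholder matching the command above:

// program.cu: each GPU thread writes its global index into an array
#include <cstdio>
#include <cuda_runtime.h>

// GPU kernel: thread i stores the value i in out[i]
__global__ void fill_indices(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = i;
}

int main(void)
{
    const int n = 8;
    int host[8];
    int *dev = NULL;

    cudaMalloc((void **)&dev, n * sizeof(int));                       // allocate device memory
    fill_indices<<<1, n>>>(dev, n);                                   // launch n threads in one block
    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);   // copy results back to the host
    cudaFree(dev);

    for (int i = 0; i < n; i++)
        printf("out[%d] = %d\n", i, host[i]);
    return 0;
}

Note that the compiled binary must be run on a node with a GPU, for example within a GPU job.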
Within the nvhpc module, in addition to nvcc, there are NVIDIA's HPC compilers nvc, nvc++, and nvfortran. For example, to compile a CUDA Fortran program:

nvfortran program.cuf -o program
One advantage of these HPC compilers is that they provide GPU-acceleration of standard C++ and Fortran programs that are not explicitly written for GPUs.
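For example, recent NVIDIA HPC compilers support a -stdpar=gpu option that offloads C++17 standard parallel algorithms (and Fortran DO CONCURRENT loops) to the GPU; a compile line might look like:

nvc++ -stdpar=gpu program.cpp -o program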
Example Slurm job script
The following is an example Slurm job script for GPU jobs:
#!/bin/bash
#SBATCH --account=<project_id>
#SBATCH --partition=gpu
#SBATCH --gres=gpu:a40:1
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16GB
#SBATCH --time=1:00:00

module purge
module load nvhpc/22.11

./program
Each line is described below:
| Command or Slurm argument | Meaning |
|---|---|
| #!/bin/bash | Use Bash to execute this script |
| #SBATCH | Syntax that allows Slurm to read your requests (ignored by Bash) |
| --account=<project_id> | Charge compute resources used to <project_id>; replace with your project ID |
| --partition=gpu | Submit job to the gpu partition |
| --gres=gpu:a40:1 | Reserve 1 A40 GPU |
| --nodes=1 | Use 1 compute node |
| --ntasks=1 | Run 1 task (e.g., running a CUDA program) |
| --cpus-per-task=4 | Reserve 4 CPUs for your exclusive use |
| --mem=16GB | Reserve 16 GB of memory for your exclusive use |
| --time=1:00:00 | Reserve resources described for 1 hour |
| module purge | Clear environment modules |
| module load nvhpc/22.11 | Load the nvhpc environment module |
Make sure to adjust the resources requested based on your needs, but keep in mind that requesting fewer resources should lead to less queue time for your job.
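Save the script to a file (for example, gpu.job, a placeholder name) and submit it to the queue with sbatch:

sbatch gpu.job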