OpenMP (Open Multi-Processing) is a popular Application Programming Interface (API) for multi-threaded applications. It supports shared memory, multi-processing programming in C, C++, and Fortran on most platforms, instruction set architectures, and operating systems. The API includes compiler directives and constructs, runtime library routines, and environment variables for thread creation and management.
OpenMP is an explicit (i.e., not automatic) programming model offering the programmer full control over parallelization. Parallelization can be as simple as taking a serial program and inserting parallel compiler directives. OpenMP programs accomplish parallelism exclusively through the use of threads—the smallest unit of processing that can be scheduled by an operating system. Typically, the number of threads used matches the number of physical CPU cores, known as one-to-one mapping. However, the optimal number of threads depends on the specific application.
OpenMP uses the fork-join model of parallel execution:
- FORK: The master thread creates a team of parallel threads with access to shared memory. The statements in the program that are enclosed in the parallel region are then executed in parallel among the team of threads.
- JOIN: When the team of threads complete the statements in the parallel region, they synchronize and terminate, leaving only the master thread.
Compiling OpenMP programs
OpenMP programs are compatible with most compilers. The following table lists the compilers available on CARC HPC clusters and their corresponding compilation command and option for OpenMP programs:
| Compiler family | Module name | Language | Compilation command |
|---|---|---|---|
| GCC | gcc | C | gcc -fopenmp [...] |
| GCC | gcc | C++ | g++ -fopenmp [...] |
| GCC | gcc | Fortran | gfortran -fopenmp [...] |
| LLVM | llvm | C | clang -fopenmp [...] |
| LLVM | llvm | C++ | clang++ -fopenmp [...] |
| AOCC | aocc | C | clang -fopenmp [...] |
| AOCC | aocc | C++ | clang++ -fopenmp [...] |
| Intel | intel-oneapi | C | icx -qopenmp [...] |
| Intel | intel-oneapi | C++ | icpx -qopenmp [...] |
| Intel | intel-oneapi | Fortran | ifx -qopenmp [...] |
| NVHPC | nvhpc | C | nvc -mp [...] |
| NVHPC | nvhpc | C++ | nvc++ -mp [...] |
| NVHPC | nvhpc | Fortran | nvfortran -mp [...] |
For example, to use the gcc compiler to compile a C program using OpenMP, enter the following:

```
module purge
module load gcc/11.3.0
gcc -fopenmp omp_program.c -o omp_program
```
Offloading to GPUs
Parallel regions of programs can also be offloaded to GPUs using OpenMP via the target directive. However, to see a noticeable increase in program speed, the offloaded regions should contain substantial parallelism and be structured with little thread synchronization.
Offloading to GPUs also requires additional compiler flags. For example, using the nvc compiler for a C program using OpenMP:
```
module purge
module load nvhpc/22.11
nvc -mp=gpu -gpu=cc70 omp_program.c -o omp_program
```
In this example, the target is an NVIDIA V100 GPU, which has compute capability 7.0 (hence the cc70 option).
Consult specific compiler documentation for more information on offloading to GPUs.
Running OpenMP programs
Once the program has been compiled, the next step is to set the OMP_NUM_THREADS environment variable, which indicates the number of threads the program should use. Typically, this should match the number of CPU cores that you have requested for your Slurm job.
A Slurm job script is a special type of Bash shell script that the Slurm job scheduler recognizes as a job. For a job launching OpenMP parallel programs, a Slurm job script should look similar to the following:
```
#!/bin/bash
#SBATCH --account=<project_id>
#SBATCH --partition=main
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=32G
#SBATCH --time=1:00:00

module purge
module load gcc/11.3.0

ulimit -s unlimited
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

./omp_program
```
Each line is described below:
| Command or Slurm argument | Meaning |
|---|---|
| #!/bin/bash | Use Bash to execute this script |
| #SBATCH | Syntax that allows Slurm to read your requests (ignored by Bash) |
| --account=<project_id> | Charge compute resources to <project_id>; enter your project ID |
| --partition=main | Submit job to the main partition |
| --nodes=1 | Use 1 compute node |
| --ntasks=1 | Run 1 task (e.g., running an OpenMP program) |
| --cpus-per-task=16 | Reserve 16 CPUs for your exclusive use |
| --mem=32G | Reserve 32 GB of memory for your exclusive use |
| --time=1:00:00 | Reserve resources described for 1 hour |
| module purge | Clear environment modules |
| module load gcc/11.3.0 | Load the gcc environment module |
| ulimit -s unlimited | Set the limit of user stack size to unlimited |
| export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK | Set the number of threads to parallelize over, using the Slurm-provided environment variable SLURM_CPUS_PER_TASK |
| ./omp_program | Run your OpenMP program |
OpenMP includes thread affinity options that allow binding threads to specific places on a compute node. This may improve the performance of your program, though the optimal values to use depend on your specific application.
Use the environment variables OMP_PLACES and OMP_PROC_BIND to set thread affinity at runtime. The following values are a good starting point:

```
export OMP_PLACES=cores
export OMP_PROC_BIND=spread
```
We recommend experimenting and benchmarking to find the optimal binding strategy for your application. Consult the OpenMP documentation for more information on thread affinity and the available options.
If you have questions about or need help with OpenMP or parallel programming, please submit a help ticket and we will assist you.
For hybrid MPI/OpenMP programs, see our MPI guide.