Aurora Exascale Early Science Program

Last updated July 27, 2022
Enabling code with performance-portable GPU acceleration without interrupting scientific production.

1 Aurora Exascale Early Science Program Project

Collaborators: University of Southern California (Aiichiro Nakano and Ken-ichi Nomura), USC CARC (Marco Olguin), Argonne National Laboratory (Timothy Williams and Ye Luo), and Intel


The arrival of an Intel graphics processing unit (GPU)-based exascale high-performance computing (HPC) system at the Argonne Leadership Computing Facility (ALCF), a U.S. Department of Energy Office of Science User Facility at Argonne National Laboratory, requires research teams to adapt central processing unit (CPU)-optimized codes to the new architecture.

QXMD, a Fortran-based scalable quantum molecular dynamics code, was selected for the Aurora Early Science Program (ESP) via the “Metascalable Layered Materials Genome” project led by Aiichiro Nakano and Ken-ichi Nomura of the University of Southern California (USC). The project prepares large-scale materials science simulations to run efficiently on Aurora, one of the world’s first exascale supercomputers. Capable of over one quintillion calculations per second, Aurora delivers unprecedented computational power for simulations, AI, and data analysis. Built in partnership with Intel and Hewlett Packard Enterprise, Aurora accelerates discoveries in critical fields such as climate and materials science, energy storage, and fusion energy.

QXMD simulations explore nonadiabatic quantum molecular dynamics at the nexus of physics, chemistry, and materials science, an area that is extremely computationally demanding. With no prior GPU version of the code, the Aurora ESP project team implemented performance-portable GPU acceleration using OpenMP. Though long established for CPU parallelization, OpenMP introduced GPU offloading in version 4.0 of its API (2013) and has continued to expand these capabilities since.
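To illustrate the programming model (this is a generic sketch, not code from QXMD or LFD), the example below offloads a simple loop to a GPU with OpenMP's target construct; the map clauses describe data movement between host and device, and the loop body itself remains ordinary C++.

    #include <cstdio>
    #include <vector>

    // Minimal OpenMP GPU offload sketch (illustrative only, not a QXMD kernel).
    // The combined "target teams distribute parallel for" construct sends the
    // loop to an attached device; "map" clauses copy data to and from the GPU.
    int main() {
        const int n = 1 << 20;
        std::vector<double> x(n, 1.0), y(n, 2.0);
        const double a = 3.0;
        double* xp = x.data();
        double* yp = y.data();

        #pragma omp target teams distribute parallel for \
            map(to: xp[0:n]) map(tofrom: yp[0:n])
        for (int i = 0; i < n; ++i)
            yp[i] = a * xp[i] + yp[i];

        std::printf("y[0] = %.1f\n", yp[0]);  // expect 5.0
        return 0;
    }

If no device is present, or the compiler lacks offload support, the directive degrades gracefully and the loop runs on the host, which is part of what makes the approach performance portable.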

After assessing the application's computational profile, the development team enhanced code abstraction by reorganizing the most computationally intensive components into modular internal units for independent development and validation. As part of this effort, the team created a C++ mini-application, Local Field Dynamics (LFD), to compute many-electron dynamics using real-time time-dependent density functional theory, one of QXMD's most computationally expensive kernels. LFD, designed as a plugin, exploits the separability of the code, allowing independent configuration and testing and using synthetic test values to drive real computational routines. Once the correct mathematical environment was established within LFD, the team shifted focus to porting it to the GPU.
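The mini-app pattern can be sketched as follows; the kernel and variable names here are hypothetical stand-ins rather than LFD's actual routines. The idea is that synthetic inputs exercise the real computational routine on the device, while a simple host reference provides an independent check.

    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Hypothetical mini-app skeleton: synthetic inputs drive a "real" routine
    // on the GPU, and a host reference validates the result in isolation.
    void density_device(const double* psi, const double* v, double* rho, int n) {
        #pragma omp target teams distribute parallel for \
            map(to: psi[0:n], v[0:n]) map(from: rho[0:n])
        for (int i = 0; i < n; ++i)
            rho[i] = psi[i] * psi[i] * v[i];
    }

    void density_host(const double* psi, const double* v, double* rho, int n) {
        for (int i = 0; i < n; ++i)
            rho[i] = psi[i] * psi[i] * v[i];
    }

    int main() {
        const int n = 4096;
        std::vector<double> psi(n), v(n), rho_dev(n), rho_ref(n);
        for (int i = 0; i < n; ++i) {           // synthetic test values
            psi[i] = std::sin(0.001 * i);
            v[i]   = 1.0 + 0.5 * std::cos(0.002 * i);
        }

        density_device(psi.data(), v.data(), rho_dev.data(), n);
        density_host(psi.data(), v.data(), rho_ref.data(), n);

        double max_err = 0.0;                   // compare device vs. host reference
        for (int i = 0; i < n; ++i)
            max_err = std::max(max_err, std::fabs(rho_dev[i] - rho_ref[i]));
        std::printf("max abs error = %.3e\n", max_err);
        return max_err < 1e-12 ? 0 : 1;
    }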

Before an Intel compiler with GPU offload support was available, the team developed OpenMP GPU offload capability for LFD using the IBM XL and LLVM Clang compilers on NVIDIA GPUs, ensuring portability across platforms. With this setup, the team could use their code to evaluate Intel software, in terms of both capability and performance, on Intel integrated and discrete GPUs.
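One check that is useful in this kind of multi-compiler work (again a generic sketch, not project code) is confirming at runtime that offload regions actually execute on a device rather than silently falling back to the host:

    #include <cstdio>
    #include <omp.h>

    // Reports how many accelerators the OpenMP runtime can see, then checks
    // whether a target region actually runs on a device or falls back to host.
    int main() {
        std::printf("devices visible: %d\n", omp_get_num_devices());

        int on_host = 1;
        #pragma omp target map(tofrom: on_host)
        {
            on_host = omp_is_initial_device();
        }
        std::printf("target region ran on %s\n",
                    on_host ? "the host (fallback)" : "a GPU device");
        return 0;
    }

With Clang, for example, such a test is typically built with -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda for NVIDIA GPUs; the corresponding IBM XL and Intel options differ, which is precisely the kind of variation the team had to manage across toolchains.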

Once Intel compilers became available, the team began validating them with progressive complexity, keeping several versions of the code. Earlier versions are simpler, use fewer OpenMP GPU offload features, and demand less of the compiler; later versions integrate more offload regions and advanced OpenMP features, placing more challenging demands on the compiler. This approach lets the team solve smaller problems piecemeal while retaining the ability to stress-test the compiler. Meanwhile, the full QXMD program is used to validate the Intel Fortran compiler on CPU systems.
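As a rough illustration of what progressive complexity can look like in OpenMP offload code (hypothetical kernels, not QXMD's), an early-style version might use a single combined construct with simple mappings, while a later-style version layers on persistent device data, collapsed loops, and explicit thread controls:

    #include <cstdio>
    #include <vector>

    // Early-style version: one combined construct, straightforward mappings,
    // and a scalar reduction -- modest demands on the compiler.
    double dot_simple(const double* a, const double* b, int n) {
        double sum = 0.0;
        #pragma omp target teams distribute parallel for \
            map(to: a[0:n], b[0:n]) map(tofrom: sum) reduction(+: sum)
        for (int i = 0; i < n; ++i)
            sum += a[i] * b[i];
        return sum;
    }

    // Later-style version: persistent device data, a collapsed 2-D loop, and
    // an explicit thread_limit -- the kind of features that stress a compiler.
    double frobenius_sq(const double* m, int rows, int cols) {
        double sum = 0.0;
        #pragma omp target data map(to: m[0:rows*cols])
        {
            #pragma omp target teams distribute parallel for collapse(2) \
                map(tofrom: sum) reduction(+: sum) thread_limit(256)
            for (int i = 0; i < rows; ++i)
                for (int j = 0; j < cols; ++j)
                    sum += m[i * cols + j] * m[i * cols + j];
        }
        return sum;
    }

    int main() {
        const int n = 1000;
        std::vector<double> a(n, 1.0), b(n, 2.0), m(n, 0.5);
        std::printf("dot = %.1f, frob^2 = %.1f\n",
                    dot_simple(a.data(), b.data(), n),
                    frobenius_sq(m.data(), 10, 100));  // 2000.0 and 250.0
        return 0;
    }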

The collaboration between the Center for Advanced Research Computing (CARC) and Aiichiro Nakano and Ken-ichi Nomura of the Collaboratory for Advanced Computing and Simulations (CACS) at USC extended to the Aurora ESP project. Code development and testing for porting QXMD to GPUs with OpenMP offloading were performed on CARC HPC resources (Discovery). In early 2020, CARC underwent a full system upgrade that required a new software stack built with Spack. The new stack initially supported only the GCC compiler suite, but the Aurora ESP collaboration drove the addition of OpenMP GPU-offloading compilers, enabling development and performance testing on CARC's NVIDIA GPUs. The introduction of Argonne-Intel-USC co-design computing technologies to CARC resources benefits the USC research computing community at large.

Compiler support for OpenMP GPU offloading continues to improve, with each suite offering varying API coverage and optimizations. The ESP team tests major compilers, including the NVIDIA HPC SDK (nvfortran, nvc++), LLVM (flang, clang++), and GCC (gfortran, g++), alongside Intel oneAPI, the suite promoted for Aurora. This collaboration positioned CARC as an early adopter of these modern compilers through its software stack, expanding its compiler offerings beyond GCC to include Intel oneAPI, the NVIDIA HPC SDK, and LLVM, all of which have seen rapidly growing use in the HPC community over the past few years. At USC, the NVIDIA HPC SDK is heavily used for GPU applications, and researchers have transitioned from Intel's "classic" compilers to Intel oneAPI and LLVM with CARC's support. As a result, CARC has developed expertise in OpenMP GPU offloading, achieving GPU performance comparable to CUDA.