Parallel HDF5 for Python

Hierarchical Data Format version 5 (HDF5) is an open-source file format that supports large, complex, and heterogeneous datasets. Reading and writing large files to and from storage systems can be very time-consuming, so HDF5 is designed for efficient I/O and uses a directory-like hierarchy of groups and datasets for flexible data organization. More information can be found on the HDF5 support pages.
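
As a rough illustration of the directory-like structure, the sketch below uses the h5py package (introduced further down this page) to create a group, a dataset, and an attribute, and then reads them back by path. The file name, group names, and values are made up for the example, and it assumes h5py is already installed.

```python
import os
import tempfile

import h5py  # assumes h5py (and its NumPy dependency) is installed

# Write a small file in a temporary directory; all names below are illustrative.
path = os.path.join(tempfile.mkdtemp(), "example.h5")

with h5py.File(path, "w") as f:
    # Groups behave like directories; intermediate groups are created as needed.
    run = f.create_group("simulation/run_001")
    # Datasets behave like files holding typed, array-shaped data.
    run.create_dataset("temperature", data=[280.5, 281.0, 279.8])
    # Attributes attach small pieces of metadata to a group or dataset.
    run.attrs["units"] = "K"

with h5py.File(path, "r") as f:
    # Objects are addressed with POSIX-style paths.
    temperature = f["/simulation/run_001/temperature"][:].tolist()
    units = f["simulation/run_001"].attrs["units"]
    members = sorted(f["simulation"].keys())

print(members, temperature, units)
```

This example is serial and does not need MPI; it only shows how groups, datasets, and attributes map onto the directory analogy.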

The HDF5 library is available on CARC clusters after loading the gcc and MPI modules:

module load gcc/11.3.0 mvapich2/2.3.7 hdf5/1.12.2

The h5py package provides the Python interface for reading and writing binary HDF5 files. More information about this interface can be found on the h5py website. To install this interface on Discovery or Endeavour, follow the steps below.

Note: mpi4py and h5py must be built with the same set of compilers in order to be compatible.

  1. Load the modules
module purge
module load gcc/11.3.0 mvapich2/2.3.7 hdf5/1.12.2 cmake/3.23.2 conda
  2. Create a Conda environment
conda create -n phdf5 -y
conda activate phdf5
  3. Specify the compilers
export CC=$(which mpicc)
export CXX=$(which mpicxx)
export FC=$(which mpifort)
  4. Install Cython
pip install cython
  5. Specify the HDF5 installation directory, MPI paths, and HDF5_MPI flag
export MPI_INCLUDE_PATH=${MPI_ROOT}/include
export MPI_LIB_PATH=${MPI_ROOT}/lib
export HDF5_DIR=$HDF5_ROOT
export HDF5_MPI="ON"
  6. Install mpi4py
pip install mpi4py --no-binary=mpi4py --no-build-isolation --no-deps --verbose --user --install-option="--hdf5=$HDF5_DIR" --install-option="-I$MPI_INCLUDE_PATH" --install-option="-L$MPI_LIB_PATH"
  7. Install h5py
pip install h5py --no-binary=h5py --no-binary=mpi4py --no-build-isolation --no-deps --verbose --user --install-option="--hdf5=$HDF5_DIR" --install-option="--mpi-include=$MPI_INCLUDE_PATH" --install-option="--mpi-libraries=$MPI_LIB_PATH"
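
Once the steps above have finished, one possible way to check the result is a short Python script like the sketch below. It reports whether mpi4py and h5py can be imported and whether h5py was built with MPI support, and, if so, has each MPI rank write its own block of one shared file. The function names, the file name parallel_test.h5, and the dataset name are hypothetical choices for this example, not part of the CARC instructions.

```python
def parallel_hdf5_status():
    """Report whether mpi4py and an MPI-enabled h5py can be imported."""
    status = {"mpi4py": False, "h5py": False, "h5py_mpi": False}
    try:
        import mpi4py  # noqa: F401
        status["mpi4py"] = True
    except ImportError:
        pass
    try:
        import h5py
        status["h5py"] = True
        # get_config().mpi is True only when h5py was built against parallel HDF5.
        status["h5py_mpi"] = bool(h5py.get_config().mpi)
    except ImportError:
        pass
    return status


def write_by_rank(path, rows_per_rank=4):
    """Each MPI rank writes its own block of rows into one shared file."""
    from mpi4py import MPI
    import h5py

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    # The "mpio" driver opens the file collectively on the given communicator.
    with h5py.File(path, "w", driver="mpio", comm=comm) as f:
        # Dataset creation is collective: every rank must make the same call.
        dset = f.create_dataset("ranks", (size * rows_per_rank,), dtype="i8")
        start = rank * rows_per_rank
        # Each rank fills only its own slice with its rank number.
        dset[start:start + rows_per_rank] = rank


if __name__ == "__main__":
    status = parallel_hdf5_status()
    print(status)
    if status["h5py_mpi"]:
        write_by_rank("parallel_test.h5")
```

If the script is saved under a name such as check_phdf5.py (a hypothetical name), it would typically be started through the cluster's MPI launcher, for example srun -n 4 python check_phdf5.py inside a job. If "h5py_mpi" is reported as False, h5py was most likely built without the HDF5_MPI flag or against a serial HDF5 library.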

If you encounter any issues installing h5py and mpi4py, please submit a ticket on the User Portal or send an email to
