File input/output (I/O) refers to reading and writing data. The following guide offers advice on managing file I/O for your compute jobs.
Some I/O best practices:
- Try to avoid disk I/O, especially for workflows that create a large number of files. Process data in memory when possible instead of writing to and reading from the disk. This will provide the best performance, though the size of the data and subsequent memory requirements may place limits on this strategy.
- Use the local /tmp directory (the default location) on compute nodes for small-scale I/O. On compute nodes, /tmp is a RAM-based file system (tmpfs), so files are saved in memory, allowing for better performance than saving to the disk.
Note: the /tmp directory is limited to 1 GB of space and is shared among jobs running on the same node. The files are removed when the job ends. The size of the files and job memory requirements may place limits on this strategy.
- Use the local /dev/shm directory on compute nodes for large-scale I/O, which is also a RAM-based file system (tmpfs). The space is limited by the memory requested for your job and includes a hard limit of half the total memory available on a node. Like /tmp, files are saved in memory for better performance than saving to the disk.
Note: files are removed when the job ends. The size of the files and job memory requirements may also place limits on this strategy.
- Use your /scratch1 directory, located on a high-performance, parallel file system, for disk I/O when needed.
- Use high-level I/O libraries and file formats like HDF5 or NetCDF. These libraries enable fast I/O through parallel operations and store data in a single file format that is also portable across computing systems.
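The first recommendation, avoiding intermediate files on disk, can often be met by streaming data through a pipeline so that it is processed entirely in memory. A minimal sketch, in which the file names and the sort step are placeholder examples:

```shell
# Create sample compressed input, then process it through a pipeline:
# no intermediate uncompressed files are ever written to the disk
printf '3\n1\n2\n' | gzip > input.dat.gz
gunzip -c input.dat.gz | sort | gzip > output.dat.gz
```

Each stage reads from the previous stage's output in memory, so the only disk I/O is the initial read and the final write.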
Redirecting temporary files
The default value of the environment variable TMPDIR for compute jobs will look like /tmp/SLURM_<job_id>. To automatically redirect temporary files from this /tmp location to another location, change the TMPDIR variable. For example, create a tmp subdirectory in your /scratch1 directory and then enter the following:
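A sketch of the redirect, using the ttrojan username that appears in the examples below (substitute your own):

```shell
# Point TMPDIR at a tmp subdirectory of your /scratch1 directory
# (replace ttrojan with your own username)
export TMPDIR=/scratch1/ttrojan/tmp
```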
Include the export command in job scripts to set TMPDIR for batch jobs.
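In a batch job, this could look like the following sketch, where the #SBATCH options and the username are placeholder examples:

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Redirect temporary files before running any job commands
# (replace ttrojan with your own username)
export TMPDIR=/scratch1/ttrojan/tmp

# ...your actual job commands follow here...
```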
Some jobs may require staging data in and out of temporary directories, such as when using the tmpfs file systems /tmp or /dev/shm.
Beginning of job
You may need to stage data at the beginning of a job to a temporary directory, like extracting a large number of input files. When using /dev/shm, for example, enter a sequence of commands like the following:
mkdir /dev/shm/$SLURM_JOB_ID
tar -C /dev/shm/$SLURM_JOB_ID -xf /scratch1/ttrojan/input.tar.gz
This example assumes that the input files have been previously bundled in a tar archive file.
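A sketch of how such an archive might be created beforehand; the inputs directory and file name here are hypothetical examples:

```shell
# Bundle a directory of input files into a compressed tar archive
# (inputs/ and file1.txt are placeholder examples)
mkdir -p inputs
printf 'sample data\n' > inputs/file1.txt
tar -czf input.tar.gz -C inputs .
```

The -C option makes tar change into the inputs directory first, so the archive stores relative paths rather than the full directory prefix.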
End of job
If you want to keep temporary output files from a job, you may need to copy them to persistent storage. When using /dev/shm, for example, enter a command like the following:
tar -czf /scratch1/ttrojan/output.tar.gz /dev/shm/$SLURM_JOB_ID
This example bundles the temporary files in a tar archive file and saves it to the /scratch1 file system. Note that tar removes the leading / from member names, so extracting the archive later will recreate the files under a relative dev/shm/<job_id> path.