HPC with R

Last updated April 29, 2026

Table of Contents

1 Using R on CARC systems
2 Running R in interactive mode
3 Running R in batch mode
4 Installing R packages
- 4.1 Installing packages from Bioconductor
5 Parallel programming with R
6 Additional resources

R is an open-source programming environment and language designed for statistical computing and graphics.

1 Using R on CARC systems

Begin by logging in. Instructions for this are in the Getting Started with Discovery or Getting Started with Endeavour user guides.

You can use R in either interactive or batch mode. In either mode, first load the corresponding software module:

module purge
module load r/4.5.3

These R modules use a software container that includes the corresponding version of R as well as other dependencies needed for installing and using R packages.

To see all available versions of R, enter:

module spider r

1.1 Installing a different version of R

If you require a different version of R that is not currently installed, please submit a help ticket and we will install it for you. Alternatively, you could:

- Build a custom Apptainer container with R installed.
- Install R with Conda.

1.2 Startup files

You can customize the R startup process by creating ~/.Rprofile and ~/.Renviron files. See this guide.
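As a minimal sketch of what a ~/.Rprofile might contain, the following sets a default CRAN mirror so that install.packages() does not prompt for one. The mirror URL and the startup message are illustrative assumptions, not CARC defaults.

```r
# Example ~/.Rprofile contents (illustrative; adjust to your needs).
# R evaluates this file at startup unless started with --vanilla or
# --no-init-file.

# Set a default CRAN mirror so install.packages() never prompts for one.
# The URL is an assumption; any CRAN mirror works.
local({
  repos <- getOption("repos")
  repos["CRAN"] <- "https://cloud.r-project.org"
  options(repos = repos)
})

# Print a short message only in interactive sessions, to keep batch logs clean.
if (interactive()) {
  message("Loaded ~/.Rprofile")
}
```

Wrapping the assignment in local() keeps the temporary repos variable out of your workspace.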

1.3 RStudio

RStudio, as well as JupyterLab, VSCode, and other integrated development environments (IDEs), is available to use on compute nodes via our OnDemand service.

2 Running R in interactive mode

Using R on a login node should be reserved for installing packages. A common mistake for new users of HPC clusters is to run heavy workloads directly on a login node (e.g., discovery.usc.edu or endeavour.usc.edu). Unless you are only running a small test, please make sure to run your program as a job interactively on a compute node. Processes left running on login nodes may be terminated without warning. For more information on jobs, see our Running Jobs user guide.

To run R interactively on a compute node, follow these two steps:

1. Reserve job resources on a node using Slurm’s salloc command.
2. Once resources are allocated, load the required modules and enter R.

[user@discovery1 ~]$ salloc --time=1:00:00 --cpus-per-task=8 --mem=16G --account=<project_id>
salloc: Pending job allocation 24316
salloc: job 24316 queued and waiting for resources
salloc: job 24316 has been allocated resources
salloc: Granted job allocation 24316
salloc: Waiting for resource configuration
salloc: Nodes d05-08 are ready for job

Adjust the resource requests (the --time=1:00:00 --cpus-per-task=8 --mem=16G part of your salloc command) as needed, for example to reflect the number of cores and amount of memory required. Also replace <project_id> with your project ID, which has the form <PI_username>_<id>; enter myaccount to view your available project IDs.

Once you are granted the resources and logged in to a compute node, load the required modules and then enter R:

[user@d05-08 ~]$ module load r/4.5.3
[user@d05-08 ~]$ R

R version 4.5.3 (2026-03-11) -- "Reassured Reassurer"
Copyright (C) 2026 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>

The shell prompt changes from user@discovery1 to user@<nodename> to indicate that you are now on a compute node (e.g., d05-08).

To run R scripts from within R, use the source() function. Alternatively, to run R scripts from the shell, use the Rscript command.
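The difference between the two approaches can be sketched as follows. The script contents and the temporary file are hypothetical and serve only to make the example self-contained; in practice the script would already exist on disk.

```r
# Write a tiny script to a temporary file so this example is self-contained.
# In practice the script would already exist, e.g., "script.R".
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "x <- c(2, 4, 6, 8)",
  "x_mean <- mean(x)"
), script_path)

# source() evaluates the file in the current session, so the objects it
# creates (x, x_mean) remain available afterwards.
source(script_path)
print(x_mean)

# The shell equivalent is `Rscript script.R`, which starts a fresh R
# process and exits when the script finishes, so no objects are kept.
```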

To exit the node and relinquish the job resources, enter q() to exit R and then enter exit in the shell. This will return you to the login node:

> q()
[user@d05-08 ~]$ exit
exit
salloc: Relinquishing job allocation 24316
[user@discovery1 ~]$

3 Running R in batch mode

To submit jobs to the Slurm job scheduler, use R in batch mode:

1. Create an R script.
2. Create a Slurm job script that runs the R script.
3. Submit the job script to the job scheduler using sbatch.

Your R script should consist of the sequence of R commands needed for your analysis. The Rscript command, which becomes available once the R module is loaded, runs R scripts from the shell. You can use it during an interactive job as well as in Slurm job scripts.
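For illustration, a small script.R might look like the sketch below. The dataset, model, and output file name are hypothetical choices made so that the script runs without any input files; your own script will differ.

```r
# script.R (hypothetical example): fit a simple model and save the results.

# Use a built-in dataset so the script runs without any input files.
data(mtcars)

# Fit a linear model of fuel economy on weight and horsepower.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Anything printed goes to the job's output log.
print(summary(fit))

# Save results explicitly; objects in memory are lost when the script exits.
out_file <- "fit_results.rds"  # hypothetical output file name
saveRDS(fit, out_file)
message("Saved model to ", normalizePath(out_file))
```

Because a batch job has no interactive session to inspect afterwards, writing results to a file with saveRDS() (or a similar function) is usually the simplest way to keep them.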

A Slurm job script is a special type of Bash shell script that the Slurm job scheduler recognizes as a job. For a job running R, a Slurm job script should look something like the following:

#!/bin/bash

#SBATCH --account=<project_id>
#SBATCH --partition=main
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --time=1:00:00

module purge
module load r/4.5.3

Rscript script.R

Each line is described below:

Command or Slurm argument	Meaning
`#!/bin/bash`	Use Bash to execute this script
`#SBATCH`	Syntax that allows Slurm to read your requests (ignored by Bash)
`--account=<project_id>`	Charge compute resources used to <project_id>; enter `myaccount` to view your available project IDs
`--partition=main`	Submit job to the main partition
`--nodes=1`	Use 1 compute node
`--ntasks=1`	Run 1 task (e.g., running an R script)
`--cpus-per-task=8`	Reserve 8 CPUs for your exclusive use
`--mem=16G`	Reserve 16 GB of memory for your exclusive use
`--time=1:00:00`	Reserve resources described for 1 hour
`module purge`	Clear environment modules
`module load r/4.5.3`	Load the `r` environment module
`Rscript script.R`	Use `Rscript` to run `script.R`

Adjust the requested resources to fit your needs, but keep in mind that requesting fewer resources generally means less time waiting in the queue. R will not use multiple CPUs automatically, so to make full use of the CPUs you reserve you may need to change your R code explicitly (see the section on parallel programming below).
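One way for an R script to find out how many CPUs the job actually reserved is to read the SLURM_CPUS_PER_TASK environment variable, which Slurm sets when --cpus-per-task is specified. The sketch below assumes that variable and falls back to 1 core when it is absent, for example outside a job.

```r
library(parallel)

# Slurm sets SLURM_CPUS_PER_TASK inside a job when --cpus-per-task is given.
# Fall back to 1 core when the variable is unset or empty.
n_cores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))
if (is.na(n_cores)) n_cores <- 1L

# detectCores() reports the cores on the whole node, which is often more
# than the job was allocated, so it should not be used to size worker pools.
node_cores <- detectCores()

cat("Cores allocated to this job:", n_cores, "\n")
cat("Cores present on the node:  ", node_cores, "\n")
```

Using n_cores rather than detectCores() when starting parallel workers helps keep the job within its allocation.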

You can develop and edit R scripts and job scripts to run on CARC clusters in any of the following ways:

- On your local computer, transferring the files afterwards to one of your directories on CARC file systems.
- With the Files app available on our OnDemand service.
- With one of the available text editor modules (nano, micro, vim, or emacs).

Save the job script as R.job, for example, and then submit it to the job scheduler with Slurm’s sbatch command:

[user@discovery1 ~]$ sbatch R.job
Submitted batch job 170554

To check the status of your job, enter jobqueue --me. If the job is no longer listed, it has completed.

The output of the job is logged and, by default, saved to a plain-text file named slurm-<jobid>.out in the directory from which the job was submitted. To view the contents of this file, enter less slurm-<jobid>.out, and then enter q to exit the viewer.

For more information on running and monitoring jobs, see the Running Jobs guide.

4 Installing R packages

To install R packages, open an interactive session of R. First create a personal library if you have not already done so:

dir.create(Sys.getenv("R_LIBS_USER"), recursive = TRUE)
.libPaths(Sys.getenv("R_LIBS_USER"))

By default, this will create a personal library in your home directory (for example, ~/R/x86_64-pc-linux-gnu-library/4.5).

Then use the install.packages() function to install packages registered on CRAN. For example, to install the skimr package, enter:

install.packages("skimr")

To load an R package, use the library() function. For example:

library(skimr)
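In a batch job, a missing package only surfaces when library() fails partway through the script. A small check at the top of the script can report the problem earlier. The helper below is a hypothetical sketch, not part of R or of CARC tooling; it relies only on base R's requireNamespace().

```r
# Return the subset of `pkgs` that cannot be found in any library path.
# (Hypothetical helper; the name is chosen for this example only.)
missing_packages <- function(pkgs) {
  found <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!found]
}

# "stats" ships with R; "skimr" is present only if you have installed it.
not_found <- missing_packages(c("stats", "skimr"))
if (length(not_found) > 0) {
  message("Not installed: ", paste(not_found, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

In a real job script you might replace message() with stop() so that the job ends immediately when a package is missing.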

You can also install packages to a different location. A library in your project directory is useful when packages are shared among your research team. The best way to install packages to, and load them from, another location is to set the environment variable R_LIBS_USER. You can create this variable in the shell and set it to the path of the library location. For example:

export R_LIBS_USER=/project/ttrojan_123/R/pkgs/4.5

R will then use that path as your default library instead of the one in your home directory. You can use the install.packages() and library() functions as usual, but packages will be installed to and loaded from the R_LIBS_USER location. To set R_LIBS_USER automatically every time you log in, add the export line to your ~/.bashrc; alternatively, set the variable in an ~/.Renviron file. When using a different location, make sure to keep separate package libraries for different versions of R. To check your library locations within an R session, use the .libPaths() function.
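One way to keep libraries for different R versions apart is to include the R major.minor version in the library path. The sketch below derives that version string from the running R session; the base directory is the example project path from this guide and is only a placeholder.

```r
# Build a library path that includes the R major.minor version, so that
# packages built for different R versions are kept apart.
# The base directory is a placeholder; substitute your own project path.
base_dir <- "/project/ttrojan_123/R/pkgs"

# R.version$minor has the form "<minor>.<patch>"; keep only the minor part.
r_series <- paste(R.version$major, sub("\\..*$", "", R.version$minor), sep = ".")
lib_dir <- file.path(base_dir, r_series)
print(lib_dir)

# Show the libraries R is currently searching, in order of priority.
print(.libPaths())
```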

For project libraries, also consider using the renv package to create reproducible, project-specific R environments. See more information here.

To install unregistered or development versions of packages, such as from GitHub repos, use the remotes package and its functions. For example:

remotes::install_github("USCbiostats/slurmR")

4.1 Installing packages from Bioconductor

Install packages from Bioconductor using the BiocManager package and the BiocManager::install() function. For example, to install the GenomicFeatures package, enter:

install.packages("BiocManager")
BiocManager::install("GenomicFeatures")

See more information about BiocManager here.

5 Parallel programming with R

R is a serial (i.e., single-core/single-threaded) programming language by default, but with additional libraries and packages it also supports parallel programming to enable full use of multi-core processors and compute nodes. This includes the use of shared memory on a single node or distributed memory on multiple nodes. On CARC systems, 1 thread = 1 core = 1 logical CPU (requested with Slurm’s --cpus-per-task option).

Parallelizing your code to use multiple cores or nodes can reduce the runtime of your R jobs, but the speedup does not necessarily scale in proportion to the number of cores. It depends on the scale and types of computations involved, and sometimes using a single core is optimal. Parallel computation has costs (e.g., modifying code, communication overhead), and those costs may outweigh the speedup, if any, of the parallelized code. Some experimentation will be needed to optimize your code and resource requests (the number of cores and amount of memory). Also keep in mind that your project account is charged for the cores reserved for a job, even if some of those cores are not actually used during the job.
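A rough way to see why speedup is not proportional is Amdahl's law, a standard idealized model (not specific to R or CARC) in which a fraction p of the runtime can be parallelized and the rest stays serial. The value p = 0.9 below is an assumption chosen for illustration, and the model ignores communication overhead, so real speedups will typically be lower.

```r
# Amdahl's law: ideal speedup when a fraction `p` of the runtime can be
# parallelized across `n` cores and the remainder stays serial.
amdahl_speedup <- function(p, n) {
  1 / ((1 - p) + p / n)
}

# Assume 90% of the runtime is parallelizable (illustrative value).
cores <- c(1, 2, 4, 8, 16, 32)
speedup <- amdahl_speedup(p = 0.9, n = cores)
print(data.frame(cores = cores, speedup = round(speedup, 2)))
```

Under these assumptions, going from 8 to 32 cores raises the ideal speedup only from about 4.7 to about 7.8, while the number of cores charged to the project quadruples.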

The main R packages for parallelism are summarized as follows:

Package	Purpose
parallel	Primarily for iteration
parallelly	Enhanced version of parallel package
data.table	For speed and memory efficiency with data frames
foreach	For iteration, with parallel backend (e.g., `doParallel`)
pbdMPI	For general multi-node computing
BiocParallel	For parallel computing with Bioconductor objects
future	For asynchronous evaluations (futures)
targets	For defining and running workflows
rslurm	For submitting Slurm jobs from within R (e.g., for iteration)
slurmR	For submitting Slurm jobs from within R (e.g., for iteration)
clustermq	For submitting Slurm jobs from within R (e.g., for iteration)

Please review the linked documentation above for examples and more information about how to use these packages and their functions.
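As a minimal sketch of single-node, shared-memory parallelism, the example below uses mclapply() from the parallel package, which ships with R. The slow_square() function and the choice of two workers are hypothetical stand-ins; in a job, the worker count would normally match the cores requested with --cpus-per-task.

```r
library(parallel)

# A stand-in for an expensive, independent computation (hypothetical).
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

inputs <- 1:8

# Apply the function over the inputs using two forked worker processes.
# Forking is available on Linux and macOS but not on Windows.
parallel_result <- mclapply(inputs, slow_square, mc.cores = 2)

# The serial version gives the same answer, one element at a time.
serial_result <- lapply(inputs, slow_square)
stopifnot(identical(parallel_result, serial_result))
```

This pattern suits iterations that do not depend on each other; the other packages in the table build on or extend the same idea.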

For more information about high-performance computing with R, see our workshop materials for HPC with R as well as the resources linked below.

6 Additional resources

If you have questions about or need help with R, please submit a help ticket and we will assist you.

Tutorials:

Web books:

CARC R workshop materials:

HPC with R