Conda is a package and environment manager primarily used for open-source data science packages for the Python and R programming languages. It also supports other programming languages like C, C++, FORTRAN, Java, Scala, Ruby, and Lua.
Using Conda on CARC systems
Begin by logging in. You can find instructions for this in the Getting Started with Discovery or Getting Started with Endeavour user guides.
To use Conda, first load the corresponding module:
```
module purge
module load conda
```
This module is based on the minimal Miniconda installer which includes the package and environment manager Conda that installs and updates packages and their dependencies. This module also provides Mamba, which is a drop-in replacement for most
conda commands that enables faster package solving, downloading, and installing.
The next step is to initialize your shell to use Conda and Mamba:
```
mamba init bash
source ~/.bashrc
```
This modifies your
~/.bashrc file so that Conda and Mamba are ready to use every time you log in (without needing to load the module).
If you want a newer version of Conda or Mamba than what is available in the module, you can also install them into one of your directories. We recommend installing either Miniconda or Mambaforge.
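As a rough sketch, a user-level Miniconda install could look like the following (the installer URL and flags are the standard ones from the Miniconda distribution; the install location ~/miniconda3 is just an example):

```
[user@discovery1 ~]$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
[user@discovery1 ~]$ sh Miniconda3-latest-Linux-x86_64.sh -b -p ~/miniconda3
[user@discovery1 ~]$ ~/miniconda3/bin/conda init bash
[user@discovery1 ~]$ source ~/.bashrc
```

The -b option runs the installer non-interactively and -p sets the install prefix; after initializing your shell, your own conda takes precedence over the module version.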
Conda can also be configured with various options. Read more about configuration in the official Conda documentation.
Integrated development environments
JupyterLab, VS Code, RStudio, and other integrated development environments (IDEs) can be used on compute nodes via our OnDemand service. To install Jupyter kernels, see our Jupyter kernels user guide.
Installing Conda environments and packages
You can create new Conda environments in one of your available directories. Conda environments are isolated project environments designed to manage distinct package requirements and dependencies for different projects. We recommend using the
mamba command for faster package solving, downloading, and installing, but you can also use the equivalent conda commands.
The process for creating and using environments has a few basic steps:
- Create an environment with mamba create
- Activate the environment with mamba activate
- Install packages into the environment with mamba install
To create a new Conda environment in your home directory, enter:
mamba create --name <env_name>
where <env_name> is the name you want for your environment. Then activate the environment:
mamba activate <env_name>
Once activated, you can then install packages into that environment:
mamba install <pkg>
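For example, to create an environment with a specific version of Python and then install a couple of common packages into it (the environment name and packages here are just illustrative), you could enter:

```
[user@discovery1 ~]$ mamba create --name myenv python=3.10
[user@discovery1 ~]$ mamba activate myenv
(myenv) [user@discovery1 ~]$ mamba install numpy pandas
```

Pinning the Python version at creation time ensures all subsequently installed packages are solved against that version.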
Please note that a version of the main application you are using (e.g., Python or R) is installed in the Conda environment, so the module versions of these should not be loaded when the Conda environment is activated. Enter
module purge to unload all loaded modules.
To deactivate an environment, enter:
mamba deactivate
You can also create a new environment in your project directory instead by using the
--prefix option. For example:
mamba create --prefix /project/ttrojan_123/<env_name>
Then activate the environment:
mamba activate /project/ttrojan_123/<env_name>
To view a list of all your Conda environments, enter:
mamba env list
To remove a Conda environment, enter:
mamba env remove --name <env_name>
To document and reproduce a Conda environment, enter:
mamba env export > env.yml
This lists all installed packages in the environment and their versions in YAML format and saves the output to a file env.yml. This file can then be used to reproduce the environment if needed, using the following command:
mamba env create -f env.yml
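As a rough illustration, an exported env.yml file looks something like the following (the exact names, channels, and versions depend on your environment):

```
name: myenv
channels:
  - conda-forge
dependencies:
  - python=3.10
  - numpy=1.24.3
  - pandas=2.0.1
```

Keeping this file under version control alongside your project makes it easy to rebuild the same environment later or on another system.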
Note that Conda stores downloaded package cache files in your home directory, which can accumulate over time. Enter
mamba clean --all to clear the cache and free up storage space.
Running Conda in interactive mode
A common mistake for new users of HPC clusters is to run heavy workloads directly on a login node (e.g.,
endeavour.usc.edu). Unless you are only running a small test, please make sure to run your program as a job interactively on a compute node. Processes left running on login nodes may be terminated without warning. For more information on jobs, see our Running Jobs user guide.
To use your Conda environment interactively on a compute node, follow these two steps:
- Reserve job resources on a node using salloc
- Once resources are allocated, activate your Conda environment and run the application
```
[user@discovery1 ~]$ salloc --time=1:00:00 --cpus-per-task=8 --mem=16G --account=<project_id>
salloc: Pending job allocation 22658
salloc: job 22658 queued and waiting for resources
salloc: job 22658 has been allocated resources
salloc: Granted job allocation 22658
salloc: Waiting for resource configuration
salloc: Nodes d11-35 are ready for job
```
Make sure to change the resource requests (the
--time=1:00:00 --cpus-per-task=8 --mem=16G --account=<project_id> part after your
salloc command) as needed, such as the number of cores and memory required. Also make sure to substitute your project ID; enter
myaccount to view your available project IDs.
Once you are granted the resources and logged in to a compute node, activate your environment and then enter the relevant command (e.g., python):
```
[user@d11-35 ~]$ module purge
[user@d11-35 ~]$ mamba activate myenv
(myenv) [user@d11-35 ~]$ python
Python 3.10.8 (main, Nov  4 2022, 13:48:29) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
```
Notice that the shell prompt changes to
user@<nodename> to indicate that you are now on a compute node (e.g., user@d11-35).
To exit the compute node and relinquish the job resources, enter
exit() to exit Python and then enter
exit in the shell. This will return you to the login node:
```
>>> exit()
(myenv) [user@d11-35 ~]$ exit
exit
salloc: Relinquishing job allocation 22658
[user@discovery1 ~]$
```
Running Conda in batch mode
To submit jobs to the Slurm job scheduler, you will need to run the main application you are using with your Conda environment in batch mode. There are a few steps to follow:
- Create an application script
- Create a Slurm job script that runs the application script
- Submit the job script to the job scheduler using sbatch
Your application script should consist of the sequence of commands needed for your analysis.
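For example, a minimal application script could be a short Python file; the following is a hypothetical sketch that creates a placeholder script.py (replace its contents with the actual commands for your analysis):

```shell
# Create a minimal, hypothetical application script (script.py);
# replace its contents with the actual commands for your analysis.
cat > script.py <<'EOF'
import sys

# Report which Python interpreter the Conda environment provides
print(f"Running analysis with Python {sys.version.split()[0]}")
EOF
```
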
A Slurm job script is a special type of Bash shell script that the Slurm job scheduler recognizes as a job. For a job using Conda, a Slurm job script should look something like the following:
```
#!/bin/bash

#SBATCH --account=<project_id>
#SBATCH --partition=main
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --time=1:00:00

module purge
eval "$(conda shell.bash hook)"
conda activate myenv

python script.py
```
Each line is described below:
|Command or Slurm argument|Meaning|
|---|---|
|#!/bin/bash|Use Bash to execute this script|
|#SBATCH|Syntax that allows Slurm to read your requests (ignored by Bash)|
|--account=<project_id>|Charge compute resources used to <project_id>; enter myaccount to view your available project IDs|
|--partition=main|Submit job to the main partition|
|--nodes=1|Use 1 compute node|
|--ntasks=1|Run 1 task (e.g., running a Python script)|
|--cpus-per-task=8|Reserve 8 CPUs for your exclusive use|
|--mem=16G|Reserve 16 GB of memory for your exclusive use|
|--time=1:00:00|Reserve resources described for 1 hour|
|module purge|Clear environment modules|
|eval "$(conda shell.bash hook)"|Initialize the shell to use Conda|
|conda activate myenv|Activate your Conda environment|
Make sure to adjust the resources requested based on your needs, but remember that requesting fewer resources typically leads to less queue time for your job. Note that to fully utilize the resources, especially the number of CPUs, you may need to explicitly modify your application code.
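For example, many threaded libraries decide how many threads to use based on an environment variable. One common pattern (shown here as a sketch; whether it applies depends on your application) is to pass Slurm's CPU allocation through OMP_NUM_THREADS in your job script:

```shell
# Inside a job, Slurm sets SLURM_CPUS_PER_TASK to the value requested
# with --cpus-per-task; default to 1 when it is not set (e.g., outside a job)
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
echo "Using ${OMP_NUM_THREADS} thread(s)"
```

This keeps the thread count in sync with the CPUs you actually reserved, rather than oversubscribing or leaving cores idle.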
You can develop and edit application scripts and job scripts to run on CARC clusters in a few ways: on your local computer and then transfer the files to one of your directories on CARC file systems, with the Files app available on our OnDemand service, or with one of the available text editor modules (nano, micro, vim, or emacs).
Save the job script as
conda.job, for example, and then submit it to the job scheduler with Slurm's sbatch command:
```
[user@discovery1 ~]$ sbatch conda.job
Submitted batch job 10002
```
To check the status of your job, enter
myqueue. If there is no job status listed, then this means the job has completed.
The results of the job will be logged and, by default, saved to a plain-text file of the form
slurm-<jobid>.out in the same directory where the job script was submitted from. To view the contents of this file, enter
less slurm-<jobid>.out, and then enter
q to exit the viewer.
For more information on running and monitoring jobs, see the Running Jobs guide.
If you have questions about or need help with Conda, please submit a help ticket and we will assist you.
Python user guide
R user guide