Frequently Asked Questions

Last updated March 11, 2024

Find answers to the most common questions and concerns here.

0.1 Accounts

0.1.1 How can I apply for a CARC account?

You can log in to CARC systems using your USC NetID and password, so there is no additional requirement for CARC-specific account creation. However, in order to access CARC systems, you must either be the Principal Investigator (PI) of a research project or an authorized member of a PI’s research project. For more information on CARC accounts, please see the Project and Allocation Management pages.

0.1.2 I forgot my password. How can I reset it?

Because your CARC account is accessed using your USC NetID, CARC does not have access to passwords and cannot reset them. You can reset your USC NetID password here or contact the ITS Customer Support Center for assistance with resetting your account password.

0.1.3 What is a quota?

A quota can refer to any one of the following:

  • Disk quota: The maximum allowed disk space available to you in your home, project, or scratch directories.
  • File quota: The maximum number of files you can store in your home, project, or scratch directories.
  • Compute time quota: The maximum number of system units available to you for running jobs.

Every project is configured with default quotas. Disk and file quotas for user home directories are permanent. Project Principal Investigators (PIs) may submit a request to increase their project disk, file, and compute time quotas through the CARC user portal.

You can check the quota on your directories by running the myquota command while logged in to a login node.

0.1.4 How do I request more compute time and/or disk space for my project?

If you run out of compute time or project disk space, a PI can request an increase in the CARC user portal.

If you’re requesting a new storage allocation and require more than 10 TB, you can request your allocation in the user portal and indicate the amount of storage you need (in 5 TB increments). See the Request a New Allocation user guide for more information.

If you have an existing storage allocation but would like to increase the amount of storage, please submit a help ticket under the “Accounts/Access” category. Please include your project ID, desired allocation size, and reason for this increase. The CARC team will consult with you to determine your needs and the total cost.

0.1.5 How do I add someone to my project?

A PI can add users in the CARC user portal. More information can be found in the Managing Users on Projects user guide.

0.1.6 How long will I be able to use my CARC account? Will my account access need to be renewed?

If you are a member of a CARC project, the PI for that project can remove you at any time.

Before the end of each fiscal year, a PI must review their project to keep it active and renew their allocations associated with it. If a project is not reviewed, allocations will expire mid-July and login access will expire around August. All members of non-renewed projects will be removed in approximately mid-October. For more information, see the Annual Project Review user guide.

0.1.7 Will my account remain active if I leave USC?

If you are no longer working on a CARC project, your account will be closed near the beginning of the semester following your departure from the university. If you are a member of a project that uses sensitive/secure data, your account will be closed once you are no longer active with the university. Your data will not be deleted until your project’s PI requests that it be deleted.

If you wish to continue working with someone on a project at USC after you leave, it is possible to keep your account active. You will need to register for a USC guest account through iVIP with the support of the PI.

0.1.8 My project collaborator is not at USC. Can they apply for an account?

A PI may request a USC guest account for a collaborator outside of the university through the iVIP system. Once an iVIP account has been created, the PI may then add the collaborator’s iVIP account to their CARC project in the same way they would normally add a user in the CARC user portal.

0.2 Cluster Resources: General Questions

0.2.1 How do I log in to the Discovery cluster?

To log in to the Linux-based cluster, first you will need to connect to the USC secure network. If you are on campus, you will need to connect to an ethernet port or the USC Secure Wireless network. If you are off campus, you will need to connect to the USC VPN. See instructions here.

Once connected to the USC secure network, you can access Discovery either via CARC OnDemand or via a terminal app.

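Using a terminal app, you will need to use ssh to access one of Discovery’s login nodes, where <username> is your USC NetID and <hostname> is the address of the Discovery login node (see the Getting Started with Discovery user guide):

ssh <username>@<hostname>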

The login nodes should only be used for non-intensive work like editing and compiling programs; any computing should be done on the compute nodes. Computing jobs run on the login nodes may be terminated before they complete. To submit jobs to the compute nodes, use the Slurm resource manager.

For more information on logging in to the cluster, see the Getting Started with Discovery user guide or Getting Started with Endeavour user guide.

0.2.2 How do I avoid getting logged out of CARC systems due to a bad Wi-Fi connection?

CARC systems will log you out of a login node after 20 minutes of inactivity, but sometimes you can be logged out due to an unstable Wi-Fi connection. You can modify your SSH configuration to minimize disconnect and login issues for CARC systems. Adding the following lines to the ~/.ssh/config file on your local computer may help:

Host *
  ServerAliveInterval 30
  ServerAliveCountMax 4
  ControlMaster auto
  ControlPath ~/.ssh/%r@%h:%p
  ControlPersist 300s

You will need to create this file if one does not already exist.
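If the file does not exist yet, the following sketch creates it on your local computer. It assumes an OpenSSH client on Linux or macOS; the restrictive permissions matter because OpenSSH may refuse to read a configuration file that other users can write to.

```shell
# Create the SSH directory and config file if they are missing
mkdir -p ~/.ssh
touch ~/.ssh/config

# Restrict access to your own user
chmod 700 ~/.ssh
chmod 600 ~/.ssh/config
```

You can then open ~/.ssh/config in a text editor and paste in the lines shown above.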

0.2.3 Why can’t I log in to the Discovery or Endeavour clusters using VSCode?

When using VSCode, accessing the cluster via the Remote SSH extension may be blocked. This extension spawns too many processes on the login nodes, exceeding the process limit. Additionally, the processes started by Remote SSH are not properly killed after the user logs out of the application, which may lead to an account hold preventing the user from accessing the cluster, even from the terminal. It is recommended to use the SSH-FS extension in VSCode instead. These measures are set in place to prevent the login nodes, as shared resources, from becoming saturated and sluggish.

To read more about process limits on login nodes, see the Running Jobs on CARC Systems page.

0.2.4 What shell am I using? Can I use a different shell?

The default shell for new accounts is Bash. You can check what your current shell is by entering echo $0 when logged in:

[ttrojan@discovery1 ~]$ echo $0

If you would like to change the shell you are using, you can enter bash or csh to temporarily use a new shell. If you would like to permanently change your default shell, add lines like the following to your ~/.bash_profile. For example, to change to the C shell:

export SHELL=/usr/bin/csh
exec /usr/bin/csh -l

0.3 Cluster Resources: Running Jobs

0.3.1 How do I run jobs on the cluster?

Jobs can be run on the cluster in batch mode or in interactive mode. Batch mode processing is performed remotely and without manual intervention. Interactive mode enables you to test your program and environment setup interactively using the salloc command.

Once your job is running interactively as expected, you should then submit it for batch processing. This is done by creating a simple text file, called a Slurm job script, that specifies the cluster resources you need and the commands necessary to run your program.

For details and examples on how to run jobs, see the Running Jobs user guide.

0.3.2 How can I tell when my job will run?

After submitting a job to the queue, you can use the command:

squeue -j <job_id> --start

where <job_id> is the reference number the Slurm job scheduler uses to keep track of your job. The squeue command will give you an estimate based on historical usage and availability of resources. Please note that there is no way to know in advance what the exact wait time will be, and the expected start time may change over time.

0.3.3 How can I tell if my job is running?

You can check the status of your job using the myqueue command. If your job is running but you are still unsure if your program is working, you can ssh into your compute nodes and use the command top to see what is running.

In general, it is recommended that users first request an interactive session to test out their jobs. This will give you immediate feedback if there are errors in your program or syntax. Once you are confident that your job can complete without your intervention, you are ready to submit a batch job using a Slurm script.

0.3.4 Why are my jobs in queue when there are idle compute nodes available?

Your job is in queue because there are other jobs in queue with a higher priority or the available nodes do not have the resources your job is requesting.

Slurm provides reason codes to explain why a job is pending. If the reason code Slurm provides is “Resources”, it means that a job will start as soon as the requested resources become available. These jobs typically have the highest priority.

If the reason code is “Priority”, it means that there are other jobs in the queue with higher priority.

Note: Sometimes you’ll see QOSMaxJobsPerUserLimit or QOSMaxCPUsPerUserLimit, meaning that you have reached a per-user limit on jobs or CPUs set by CARC policy.

  1. Priority: Priority is determined by two main factors. The first is age, or how long the job has been waiting in queue. The second is fairshare. This is a measure of how much you or your project members have used the cluster lately. Everyone starts out with a certain amount of fairshare “points” and the more resources you consume, the lower your fairshare score goes. Your fairshare score will replenish over time.

You can examine the priority of your job with the command sprio -j <job_id>.

  2. Resources: Even if there are jobs pending, you may notice that some nodes are idle (i.e. they have no jobs running on them). This can happen for two reasons.

The first is the idle compute nodes are not able to satisfy the resource requests for any of the currently pending jobs. For example, if everyone is requesting 128GB of memory, a 64GB memory compute node will not be able to accept those jobs.

The second is when a job with high priority requests multiple compute nodes. Usually this job will show as pending due to “Resources”. For example, if a user with high priority requests two compute nodes with 80GB memory and one is currently idle, Slurm will hold that node until another node with 80GB becomes available. Since Slurm knows when jobs are expected to end, it will only run a job on that idle node if the job can complete before the second one becomes available.
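To see which reason codes apply to your own pending jobs, something like the following sketch may help. It uses standard squeue options (-h for no header, -t PENDING to filter by state, and the %r format field for the reason); the helper name count_pending_reasons is hypothetical.

```shell
# Count how often each line (reason code) appears on standard input
count_pending_reasons() {
  awk 'NF { count[$0]++ } END { for (r in count) print count[r], r }' | sort -rn
}

# Only query the scheduler where the squeue command is available
if command -v squeue >/dev/null 2>&1; then
  squeue -h -u "${USER:-$(id -un)}" -t PENDING -o "%r" | count_pending_reasons
fi
```

Each output line shows a count followed by a reason code, with the most common reason first.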

0.3.5 How do I get my job to run faster?

There are a couple of options for getting your jobs in and out of the queue faster.

  1. Determine the minimum resources required for your job. It’s easier for Slurm to schedule a job that uses 1 CPU and a few GB of memory than one that requests multiple compute nodes. Requesting unneeded resources will also drain your fairshare score (see above), which will lower your priority in the queue.

  2. Request a short run time. Slurm will allow you to “cut” in line if your job can finish on resources reserved for a job that will start soon.
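One way to estimate the minimum resources and run time a job needs is to look at what a similar, finished job actually used. The sketch below wraps the Slurm sacct accounting command in a helper; the function name job_usage is hypothetical, and it assumes job accounting data is available for your jobs.

```shell
# Show elapsed time versus time limit, and peak memory versus requested memory
job_usage() {
  sacct -j "$1" --format=JobID,JobName,Elapsed,Timelimit,AllocCPUS,ReqMem,MaxRSS,State
}

# Usage (replace <job_id> with a real job ID):
#   job_usage <job_id>
```

If Elapsed is far below Timelimit, or MaxRSS is far below ReqMem, the next submission can probably request less.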

0.3.6 How do I tell if my job is running on multiple cores?

You can check the resources your program is consuming using the top process manager:

  1. Request an interactive compute node using the salloc command:
[ttrojan@discovery1 ~]$ salloc --ntasks=8
salloc: Pending job allocation 24210
salloc: job 24210 queued and waiting for resources
salloc: job 24210 has been allocated resources
salloc: Granted job allocation 24210
salloc: Waiting for resource configuration
salloc: Nodes d11-30 are ready for job
  2. Run your job:
[ttrojan@d11-30 ~]$ mpirun find
  3. Open a second terminal window, ssh to your compute node, and run the top command:
[ttrojan@discovery1 ~]$ ssh d11-30
[ttrojan@d11-30 ~]$ top
  4. This will display the processes running on that node:
top - 15:37:36 up 21:50,  1 user,  load average: 0.00, 0.01, 0.05
Tasks: 285 total,   1 running, 284 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 65766384 total, 64225800 free,   970788 used,   569796 buff/cache
KiB Swap:  8388604 total,  8388604 free,        0 used. 64535076 avail Mem
15191 ttrojan     20   0    139m   5684   1500 R   2.7  0.0   0:00.04 mpirun
15195 ttrojan     20   0   15344    996    768 S   1.3  0.0   0:00.08 find
15196 ttrojan     20   0   15344    996    768 S   1.3  0.0   0:00.08 find
15199 ttrojan     20   0   15344    996    768 S   1.3  0.0   0:00.08 find
15203 ttrojan     20   0   15344    996    768 S   1.3  0.0   0:00.08 find
15204 ttrojan     20   0   15344    996    768 S   1.3  0.0   0:00.08 find
15205 ttrojan     20   0   15344    996    768 S   1.3  0.0   0:00.08 find
15206 ttrojan     20   0   15344    996    768 S   1.3  0.0   0:00.08 find
15207 ttrojan     20   0   15344    996    768 S   1.3  0.0   0:00.08 find
  5. Enter u and then enter your username to see only your processes.

If you see only one process, then your job is likely running on one core. However, multi-threaded applications may be presented as one process with the %CPU column greater than 100%, indicating that multiple cores are being used.
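As a complement to top, you can print what Slurm allocated from inside the job itself. This is a minimal sketch: the SLURM_* variables are standard Slurm environment variables, but they are typically set only inside a job and only when the corresponding option was requested; the function name show_allocation is hypothetical.

```shell
# Print the allocation as seen from the current shell
show_allocation() {
  echo "Job ID:        ${SLURM_JOB_ID:-not set}"
  echo "Tasks:         ${SLURM_NTASKS:-not set}"
  echo "CPUs per task: ${SLURM_CPUS_PER_TASK:-not set}"
  # nproc reports the number of CPUs this process is allowed to use
  echo "Usable CPUs:   $(nproc 2>/dev/null || echo unknown)"
}

show_allocation
```

Run outside of a job, the Slurm values print as "not set".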

0.3.7 How do I create a Slurm file?

A Slurm file, or job script, is a text file that contains your cluster resource requests and the commands necessary to run your program. See the Running Jobs user guide for instructions on creating Slurm job scripts.
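As a rough illustration only, the sketch below writes a minimal job script to a file from the command line. The file name example_job.slurm and all resource values are placeholders rather than CARC recommendations; the #SBATCH options shown are standard Slurm options.

```shell
# Write a minimal job script; the quoted 'EOF' keeps variables unexpanded
cat > example_job.slurm <<'EOF'
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=2G
#SBATCH --time=00:10:00

# Commands to run on the compute node go below the #SBATCH lines
echo "Job $SLURM_JOB_ID running on $(hostname)"
EOF

# Submit it from a login node with:
#   sbatch example_job.slurm
```

Replace the echo line with the commands that load your software and run your program.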

0.3.8 How do I specify the project account to use for a job?

Enter the command myaccount to see your available accounts.

If you belong to only a single project, there is no need to specify which account to use.

If you are part of multiple projects, then the Slurm job scheduler will consume the compute time allocation of your default project account unless you specify a different one. To avoid confusion, it is best to specify which project account’s allocation to use in your Slurm script:

#SBATCH --account=<project_id>

where <project_id> is your project ID of the form <PI_username>_<id>.

For the sbatch, salloc, or srun commands, you can override the default account by using the --account option at the command line. For example:

salloc --account=<project_id>

0.3.9 How do I report a problem with a job submission?

If a job submission results in an error and you need help resolving the issue, please submit a help ticket. Make sure to include the job ID, error message, job script, and any additional information you can provide.

0.4 Cluster Resources: Files and Disk Space

0.4.1 How do I create or edit a text file?

Text files are created and edited using text editors that are designed for writing code. These text editors differ from common word processors, like Microsoft Word, in that they only work with plain text files.

You can use the nano, micro, vim, or emacs text editors on CARC systems. We recommend the Micro editor for users new to the command line; Vim and Emacs both have steeper learning curves, but you may eventually find them more useful and productive.

To use one of these editors, load the corresponding software module (e.g., module load micro).

To create a new file, simply enter the editor name as the command (e.g., micro). You can specify the filename when saving the file.

To edit an existing file, enter the editor name as the command and then the path to the file as the argument (e.g., micro script.R).

0.4.2 I accidentally deleted a file. Is it possible to recover it?

We keep two weeks of snapshots for files in your home and project directories. You can think of these snapshots as semi-backups. If you accidentally delete some data, then we will be able to recover it if it was captured by a snapshot in the past two weeks. If data was created and deleted within a one-day period, between snapshots, then we will not be able to recover it. For this reason, you should always keep extra backups of your important data and other files.

If you need to recover a deleted file, please submit a help ticket and we will determine if a snapshot of the file exists.

0.4.3 Which file system should I store my data in?

CARC has several different file systems, as summarized in the table below:

File system | Disk space | File recovery (snapshots) | Purpose
/home1 | 100 GB per user | Yes | Personal files, configuration files, software
/project | Default of 5 TB per project (can be increased in 5 TB increments), shared among group members | Yes | Shared files, data files, software
/scratch1 | 10 TB per user | No | Temporary files and high-performance I/O

0.4.4 Can I use the local storage on a compute node?

The /tmp directory on compute nodes (implemented as a RAM-based file system, tmpfs) can be used for small-scale I/O, but it is limited to 1 GB and is often shared with other jobs. For large-scale I/O, the /dev/shm directory on compute nodes (also tmpfs) can be used, but its size is limited by the amount of memory requested for the job. Alternatively, your /scratch1 directory can be used, which is located on a high-performance, parallel file system.

To automatically redirect your temporary files to another location, set the TMPDIR environment variable. For example:

export TMPDIR=/scratch1/<username>

Include this line in job scripts to set the TMPDIR for batch jobs.
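If several jobs run at once, a separate temporary directory per job can keep their files apart. The following is a sketch under assumptions: it takes the scratch location to be /scratch1/<username> as in the example above, falls back to the system default when that directory does not exist, and uses the hypothetical prefix job_tmp.

```shell
# Prefer the scratch directory; fall back so the snippet also runs elsewhere
scratch_base="/scratch1/${USER:-$(id -un)}"
[ -d "$scratch_base" ] || scratch_base="${TMPDIR:-/tmp}"

# Create a unique directory for this job and point TMPDIR at it
TMPDIR="$(mktemp -d "$scratch_base/job_tmp.XXXXXX")"
export TMPDIR

# Remove the directory when the script exits
trap 'rm -rf "$TMPDIR"' EXIT
```

Programs that honor TMPDIR will then write their temporary files to the per-job directory, which is removed when the script ends.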

0.4.5 How do I share my project data with another user?

The best way to share data with other users is via a shared project directory. By default, your home and scratch directories are set so that only you can read and write to them. This is to protect your personal data from other users. In contrast, your project directories are set, by default, so that all project group members can read and write to them.

You can find more information about managing and sharing your files in our Managing Files Using the Command Line user guide.
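Files copied into a project directory sometimes keep restrictive permissions from their original location. As a hedged sketch, the helper below (the name grant_group_read is hypothetical) uses standard chmod options to give group members read access to a directory tree; whether this is appropriate depends on your project's data policies.

```shell
# Give the group read access to files, plus search access to directories
grant_group_read() {
  chmod -R g+rX -- "$1"
  # Show the resulting permissions on the top-level directory
  ls -ld -- "$1"
}

# Usage (the path is a placeholder):
#   grant_group_read /project/<project_id>/shared_data
```

The capital X adds execute permission only to directories and to files that are already executable, so regular data files are not made executable.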

0.4.6 How do I share my project data with external collaborators?

The best way to share data with external collaborators is to set up a Globus shared guest collection. You can find more information about this in our Globus user guide.

0.4.7 Do CARC systems support the use or storage of sensitive (e.g., HIPAA-, FERPA-, or CUI-regulated) data?

Currently, CARC systems do not support the use or storage of sensitive data. If your research work includes sensitive data, including but not limited to HIPAA-, FERPA-, or CUI-regulated data, see our Secure Computing Compliance Overview or submit a help ticket before using our systems.

0.4.8 How do I check if I have enough disk space?

Before you submit a large job or install new software, you should check that you have sufficient disk space.

To check your quota, use the myquota command. Under size, compare the results of used and hard. If the value of used is close to the value of hard, you will need to move, compress, or delete files or, for project directories, request an increase in disk space from the CARC user portal.

The chunk files section indicates the way your files and directories are divided up by the parallel file system, not the absolute number of files.

[ttrojan@discovery1 ~]$ myquota

POSIX User  ttrojan       1.03G   100G    37.5K     1.91M


      user/group     ||           size          ||    chunk files
     name     |  id  ||    used    |    hard    ||  used   |  hard
       ttrojan|375879||   13.20 GiB|   10.00 TiB||   162363| 20000000


      user/group     ||           size          ||    chunk files
     name     |  id  ||    used    |    hard    ||  used   |  hard
   ttrojan_120| 32855||   16.92 GiB|    5.00 TiB||     1134| 30000000

0.4.9 I’m running out of space. How do I check the size of my files and directories?

To check the disk usage of files and directories, use the du -h command:

du -h /path/to/file

To list the files or subdirectories in the current directory and sort by size, enter the command cdiskusage. This is a convenience script that uses the du command. Please note that it may take a long time to run for large directories (e.g., the root of a project directory).
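If you prefer to call du directly, a sketch along the following lines lists the largest entries in a directory. It assumes the GNU versions of du and sort (for the -h options); the function name largest_entries is hypothetical, and hidden files are not included because of how the * pattern expands.

```shell
# List up to ten entries in a directory, largest first
largest_entries() {
  dir="${1:-.}"
  du -sh -- "$dir"/* 2>/dev/null | sort -rh | head -n 10
}

# Example: inspect the current directory
largest_entries .
```

As with cdiskusage, this may take a long time on directories containing many files.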

0.5 Cluster Resources: Endeavour/Condo Cluster Program

0.5.1 What is the Condo Cluster Program?

CARC’s Condo Cluster Program (CCP) is a service available to USC researchers who require dedicated computing resources for their work. The CCP gives researchers the convenience of having their own dedicated compute nodes, without the responsibility of purchasing and maintaining the nodes themselves. The CCP operates on two models: an annual subscription model, in which research groups subscribe to their selected number of compute nodes on a yearly basis, and a traditional purchase model, in which research groups purchase compute nodes from CARC for a fixed term (e.g., five years). All hardware is maintained by CARC throughout the course of the subscription or lease term.

For more information on CCP policies, see our main Condo Cluster Program pages.

0.5.2 What is the difference between the subscription model and the traditional purchase model?

Our subscription-based model allows researchers to subscribe to compute nodes on a yearly basis. The subscription model is ideal for researchers who anticipate changes to their resource requirements or for researchers who only require CCP resources for a shorter period of time (minimum of one year). CARC purchases and maintains the resources and all related hardware throughout the course of the subscription term.

Our traditional purchase model is the classic pricing model that we’ve used for condo purchases in previous years. Researchers choose the resources they need and CARC purchases and maintains them for the lease term (e.g., five years). The leasing researchers effectively own their resources, and at the end of their lease period, the resources are retired. CARC maintains the resources and all related hardware throughout the course of the lease term.

For pricing and policies, see the Condo Cluster Program pages.

0.5.3 What is the difference between the Endeavour cluster and the Discovery cluster?

The Discovery cluster is a “public” cluster in the sense that it is open to all CARC users to run their jobs and store their data. The Endeavour cluster comprises the condo resources that CCP users lease or subscribe to, but each research group’s own resources are for their dedicated use only.

0.6 Software

0.6.1 What software is available on CARC systems?

Many traditional utilities can be found in /usr/bin. The majority of software, however, is installed and managed with Spack, a software package manager, and installed somewhere in the /spack directory. Most software managed by CARC staff is accessed using the Lmod module system. See our Software Module System user guide for in-depth information.

In short, we use the Lmod module system to manage your shell environment. To use a certain software, you must “load” its module, which will then dynamically change your environment settings.

To check if a certain software or library is available, use the module spider command. The following example checks for samtools:

[ttrojan@discovery1 ~]$ module spider samtools

  For detailed information about a specific "samtools" package (including how to load the modules)
  use the module's full name. Note that names that have a trailing (E) are extensions provided by
  other modules.
  For example:

     $ module spider samtools/18.0.4

0.6.2 How do I run software on CARC systems?

See our user guides for Software for detailed instructions on how to use specific software.

See Advanced HPC Programming for information about programming languages, compilers, and OpenMP/MPI.

Go to Research Applications for a breakdown of all CARC-supported applications used for research projects.

If you do not see a page for the software or application you want to use, please submit a help ticket.

0.6.3 Why am I getting a “command not found” error when I try to run a CARC application?

The shell gives this error when it is unable to find the requested application in your search path ($PATH). You will either need to give the full path to the program you want to run or add the program to your PATH environment variable.

For example, if your program is saved in the /project/ttrojan_120/software/bin directory, you can add this parent directory to your PATH:

export PATH=/project/ttrojan_120/software/bin:$PATH

Make sure you include the ending :$PATH so that the old PATH will be included in the new one.
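If the export line is placed in a startup file, it can be evaluated more than once and add the same directory repeatedly. A small sketch that avoids duplicates follows; the function name add_to_path is hypothetical, and the directory is the example path from above.

```shell
# Prepend a directory to PATH unless it is already listed
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;               # already present, nothing to do
    *) PATH="$1:$PATH" ;;
  esac
  export PATH
}

add_to_path /project/ttrojan_120/software/bin
```

Afterwards, command -v <program> prints the full path the shell will use, which is a quick way to confirm the program is found.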

0.6.4 What compilers are available on CARC systems?

CARC makes multiple compilers and multiple versions of each compiler available, including GCC, LLVM, Intel, AOCC, and NVHPC. You can search for available compilers using the module spider command.

Some compilers have an associated software stack built with that compiler which is “unlocked” by loading the appropriate compiler module. This ensures compatibility between applications. GCC is the main compiler used by CARC staff to build software that is made available via the module system.

0.7 CARC Private Cloud Platform

0.7.1 What is the purpose of the CARC private cloud platform?

The CARC private cloud platform complements existing CARC systems and services (Discovery and Endeavour clusters, file systems, etc.) by offering researchers access to virtual machines (VMs) on which to run alternative operating system environments and deploy resources.

0.7.2 What is meant by “private cloud”?

The virtualized platform implemented by CARC forms a private, on-premises cloud system available only to USC researchers.

For more information on different types of cloud services, see our Cloud Computing Overview user guide.

0.7.3 Do I need access to the Discovery or Endeavour clusters to use the CARC private cloud?

No. You can access the CARC private cloud platform using your USC NetID and password.

0.7.4 How do I get access to the CARC private cloud?

You need to submit an allocation request through the user portal before gaining access to Artemis cloud services. Instructions for requesting an allocation can be found in our Request a New Allocation user guide.

0.7.5 What is the cost of using the CARC private cloud?

Currently, the CARC private cloud is offered at no cost to users. CARC staff will notify users well in advance of when Artemis services will begin accruing fees.

0.7.6 What virtual machine options are available on the CARC private cloud?

CARC staff are continually evaluating and deploying new virtual machine (VM) templates to the private cloud. The current VM templates configured via KVM are:

  • Ubuntu 20.04
  • Ubuntu 22.04
  • CentOS 7
  • Windows Server 2019
  • Amazon Linux 2

Artemis also offers lightweight micro-VMs, based on Firecracker, for hosting databases (MongoDB and SQL) and websites (NGINX), as well as for building Singularity images.

0.7.7 Can I access my CARC storage directories on the CARC private cloud?

Linux virtual machines deployed in the private cloud have access to the mounted /home1, /scratch1, and /project file systems. Windows VMs do not have access to mounted file systems—users will need to transfer their data just as they would to and from their local computer. Further details on transferring data while using a virtual machine can be found in our Storage Management on Artemis user guide. Submit a help ticket if you need assistance accessing your CARC storage directories or transferring data.

0.7.8 Should I power off or terminate my virtual machine at the end of the day or when I won’t be using it for a while?

Usually, there’s no need to terminate your VM if you won’t be using it for a day or two, but we recommend powering off your VM at the end of each session.

If you’re not going to use your VM for an extended period of time (e.g., a couple of weeks or over a semester break), you should first back up your active datasets and code to your local computer, a Git repository, or CARC /project storage and then terminate your VM. When you return, simply create a new VM, upload your work, and continue. Your new VM will be up to date with enhancements and running on the optimal hardware selected automatically by the cloud platform.