Storage File Systems

All CARC users are assigned four directories on four file systems where they can store files and run programs:

  • /home1
  • /project
  • /scratch1
  • /scratch2

These are global file systems in that you can access them from any Discovery, Endeavour, or transfer node. You can list the directories available to you and storage usage for each by entering the command myquota.

The following table provides an overview of the file systems:

File system | Disk space | File recovery (snapshots) | Purpose
------------|------------|---------------------------|--------
/home1 | 100 GB per user | Yes | Personal files, configuration files, software
/project | Default of 5 TB per project (can be increased in 5 TB increments), shared among group members | Yes | Shared files, data files, software
/scratch1 | 10 TB per user | No | Temporary files and high-performance I/O
/scratch2 | 10 TB per user | No | Temporary files

Sensitive data

Currently, CARC systems do not support the use or storage of sensitive data. If your research work includes sensitive data, including but not limited to HIPAA-, FERPA-, or CUI-regulated data, see our Secure Computing user guide or contact us at carc-support@usc.edu before using our systems.

Home file system (/home1)

The /home1 file system has a total capacity of 136 TB, running NFS/ZFS on dedicated storage machines. It consists of personal directories for CARC users. Your home directory has a quota of 100 GB of disk space and 1.91 million files. It is intended for storing personal files, configuration files, and software. I/O-intensive jobs should not be run directly from your home directory.

When you log in, you will always start in your home directory, which is located at:

/home1/<username>

Use the cd command to quickly change to your home directory from another directory.

We keep two weeks of snapshots for files in your home directory. You can think of these snapshots as semi-backups: if you accidentally delete data, we can recover it as long as it was captured by a snapshot in the past two weeks. Data that was created and deleted within a one-day period, between snapshots, cannot be recovered. For this reason, you should always keep extra backups of your important data and other files.

If you need to recover a deleted file, please contact the CARC team by submitting a ticket and we will determine if a snapshot of the file exists.

Project file system (/project)

The /project file system has a total capacity of 8.4 PB and consists of directories for different research project groups. It offers high-performance, parallel I/O, running ZFS/BeeGFS on dedicated storage machines. The default quota for each project directory is 5 TB of disk space and 30 million files.

A project's PI must request a project storage allocation via the CARC User Portal. Each PI can request up to 10 TB of storage across their project(s) at no cost. If more than 10 TB is needed, a PI can request additional storage space in 5 TB increments at a cost of $40/TB/year. For more information on storage quotas and pricing, see the Accounts and Allocations page.

Each project member has access to their group's project directory, where they can store data, scripts, and related files and install software. The project directory should be used for most of your CARC work, and it's also where you can collaborate with your research project group. Users affiliated with multiple CARC projects will have access to multiple project directories so they can easily share their files with the appropriate groups.

Project directories are located at:

/project/<PI_username>_<id>

where <PI_username> is the username of the project owner and <id> is a two- or three-digit project ID number (e.g., ttrojan_123).

You can list your project directories and storage usage by entering the command myquota. You can also find the project ID and directory path on the project page in the User Portal.

Tip: You can create an alias command to quickly change to your project directory. For example, for the user ttrojan, adding the line alias cdp="cd /project/ttrojan_123" to their ~/.bashrc file will create the alias command cdp every time they log in, which can be used as a shortcut for quickly switching to their project directory.
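As a minimal sketch of the tip above, the following helper appends a line to a file only if that exact line is not already there, so running it repeatedly does not duplicate the alias. The function name add_line_once is hypothetical, and ttrojan_123 is a placeholder project directory.

```shell
# Append a line to a file only if that exact line is not already present.
# add_line_once is a hypothetical helper name used for illustration.
add_line_once() {
    local line="$1" file="$2"
    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (ttrojan_123 is a placeholder; substitute your own project directory):
# add_line_once 'alias cdp="cd /project/ttrojan_123"' ~/.bashrc
# source ~/.bashrc   # load the alias into the current shell
```

After the alias is loaded, entering cdp changes to the project directory.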

To create your own subdirectory within your project's directory, enter a command like the following:

mkdir /project/<PI_username>_<id>/<username>

If needed, you can change the permissions of this subdirectory using a chmod command.
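For illustration, one possible permission setup gives your project group read access to the subdirectory and removes all access for other users. This is a sketch of one choice, not a required configuration; the function name share_with_group and the path in the example are placeholders.

```shell
# Let the group read files and enter directories; remove all access for others.
# g+r : group may read
# g+X : group may enter directories (and run files that are already executable)
# o-rwx : users outside the group get no access
share_with_group() {
    chmod -R g+rX,o-rwx "$1"
}

# Example (placeholder path):
# share_with_group /project/ttrojan_123/ttrojan
```

Use g+rwX instead of g+rX if group members should also be able to write to the subdirectory.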

We keep two weeks of snapshots for files in your project directories. You can think of these snapshots as semi-backups: if you accidentally delete data, we can recover it as long as it was captured by a snapshot in the past two weeks. Data that was created and deleted within a one-day period, between snapshots, cannot be recovered. For this reason, you should always keep extra backups of your important data and other files.

If you need to recover a deleted file, please contact the CARC team by submitting a ticket and we will determine if a snapshot of the file exists.

Scratch file systems (/scratch1 and /scratch2)

The /scratch1 and /scratch2 file systems offer high-performance, parallel I/O, running ZFS/BeeGFS on dedicated storage machines. /scratch1 has a total capacity of 1.6 PB, and /scratch2 has a total capacity of 709 TB. Each CARC user gets a personal directory in /scratch1 and /scratch2. The quota for each scratch directory is 10 TB of disk space and 20 million files. /scratch1 should be preferred because it is newer and has better read/write speeds than /scratch2.

The scratch file systems are intended for temporary and intermediate files, so they are not backed up. Files on these file systems may be purged periodically (with advance warning). If you need to keep files stored here, back them up periodically to decrease the risk of data loss.

Your /scratch1 directory is located at:

/scratch1/<username>

Use the cds command to quickly change to your /scratch1 directory from another directory.

Your /scratch2 directory is located at:

/scratch2/<username>

Use the cds2 command to quickly change to your /scratch2 directory from another directory.

/tmp space

For temporary files, each compute node has a local /tmp directory, implemented as a RAM-based file system (tmpfs). However, it is restricted to 1 GB of space, which is shared among jobs running on the same node. If you need more space, you can instead use the local /dev/shm directory on each compute node, which is also a RAM-based file system (tmpfs) but is limited by the amount of memory you request for your job. You can also use your scratch directories for temporary files, but read/write speeds may be slower because files are saved to disk.

In your scripts and programs, you can explicitly define temporary directories. Most applications also write temporary files to the directory named by the TMPDIR environment variable, which by default is set to a unique /tmp directory for jobs. To automatically redirect your temporary files to another location, set TMPDIR to that location. For example:

export TMPDIR=/scratch1/<username>

Include this line in job scripts to set the TMPDIR for batch jobs.
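If several jobs share the same scratch directory, their temporary files can collide. A possible refinement, sketched below, creates a unique subdirectory with mktemp and points TMPDIR at it. The function name use_private_tmpdir is hypothetical, and using $USER for <username> is an assumption.

```shell
# Create a unique temporary directory under a base location and export
# TMPDIR so that applications honoring TMPDIR write there.
# use_private_tmpdir is a hypothetical helper name used for illustration.
use_private_tmpdir() {
    local base="${1:-/tmp}"
    TMPDIR="$(mktemp -d "$base/tmp.XXXXXX")" || return 1
    export TMPDIR
}

# Example in a job script (assumes $USER matches your CARC username):
# use_private_tmpdir "/scratch1/$USER"
# trap 'rm -rf "$TMPDIR"' EXIT   # remove the directory when the script exits
```

The trap line is optional; it removes the temporary directory when the script ends so that leftover files do not count against your scratch quota.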

Limits on disk space and number of files

CARC clusters are shared resources. As a result, there are quotas on usage to help ensure fair access to all USC researchers as well as to maintain the performance of the file systems. There are quotas on both the amount of disk space used and the number of files stored.

To check your quota, enter the myquota command. Under size, compare the value of used with the value of quota or hard. If the used value is close to the limit, you will need to delete, compress, consolidate, and/or archive files.

For project directories, PIs can also request an increase in disk space from the User Portal. For more information on storage quotas and pricing, see the Accounts and Allocations page.

For scratch directories, quotas can be temporarily increased by request. To request an increase, submit a ticket to the CARC team.

Please note that the quota for your home directory is fixed and unchangeable.
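When usage approaches a quota, it helps to see which subdirectories hold the most data or the most files before deciding what to delete or archive. The helpers below are a rough sketch using standard find and du commands; the function names and the example path are hypothetical, and the du and sort options assume GNU versions of those tools.

```shell
# Count regular files under a directory, as a rough guide to file-count usage.
count_files() {
    find "$1" -type f | wc -l
}

# Print the disk usage of each immediate subdirectory, smallest first.
# --max-depth and sort -h are GNU options (an assumption about the system).
usage_by_subdir() {
    du -h --max-depth=1 "$1" | sort -h
}

# Example (placeholder path):
# count_files /project/ttrojan_123/ttrojan
# usage_by_subdir /project/ttrojan_123/ttrojan
```

These counts will not necessarily match the numbers reported by myquota, which remains the authoritative check.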

Note: The chunk files section of the myquota output indicates the way your files and directories are divided up by the parallel file system, not necessarily the absolute number of files. Nonetheless, if you exceed the limit, you will need to reduce the number of files or request more space.

[ttrojan@discovery1 ~]$ myquota
/home1/ttrojan

TYPE        NAME           USED  QUOTA  OBJUSED  OBJQUOTA
POSIX User  ttrojan       4.39G   100G    39.9K     1.91M


/scratch1/ttrojan

      user/group     ||           size          ||    chunk files    
     name     |  id  ||    used    |    hard    ||  used   |  hard   
--------------|------||------------|------------||---------|---------
       ttrojan|555555||  446.78 MiB|   10.00 TiB||     5797| 20000000


/scratch2/ttrojan

      user/group     ||           size          ||    chunk files    
     name     |  id  ||    used    |    hard    ||  used   |  hard   
--------------|------||------------|------------||---------|---------
       ttrojan|555555||  200.34 MiB|   10.00 TiB||     4002| 20000000


/project/ttrojan_120

      user/group     ||           size          ||    chunk files
     name     |  id  ||    used    |    hard    ||  used   |  hard
--------------|------||------------|------------||---------|---------
   ttrojan_120| 32853||   16.92 GiB|    5.00 TiB||     1134| 30000000

If you exceed the limits, you will receive a "disk quota exceeded" or similar error.

How to fix "disk quota exceeded" error

There are two main reasons you may get a "disk quota exceeded" error on CARC systems.

Hard quota limit

First, enter the command myquota and check your storage usage. You may have simply hit a hard quota limit, either total storage space or total number of files. In this case, try to delete, compress, consolidate, and/or archive files to free up space. Alternatively, for /project directories, you can request more space via a support ticket.
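One way to consolidate and compress at the same time is to pack a directory of many files into a single compressed tar archive, which reduces both the file count and, often, the disk space used. The sketch below assumes the standard tar command; the function name archive_dir and the example paths are hypothetical.

```shell
# Pack a directory into one gzip-compressed tar archive, then list the
# archive to confirm it is readable. Returns non-zero if either step fails.
archive_dir() {
    local src="$1" out="$2"
    tar -czf "$out" -C "$(dirname "$src")" "$(basename "$src")" \
        && tar -tzf "$out" > /dev/null
}

# Example (placeholder paths):
# archive_dir /scratch1/ttrojan/run1 /scratch1/ttrojan/run1.tar.gz
# rm -r /scratch1/ttrojan/run1   # only after confirming the archive is good
```

The original directory is left in place; remove it yourself once you have verified the archive, since the space is only freed at that point.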

Incorrect group ownership

Second, for /project directories specifically, if your storage usage is not actually near the quota limits, then the likely cause is that the group ownership of some of your project files does not match the project group ID. For example, files in the project directory /project/ttrojan_123 should have group ownership by ttrojan_123:

[ttrojan@discovery1 ~]$ ls -ld /project/ttrojan_123/ttrojan/file.txt
-rw-rw---- 1 ttrojan ttrojan_123 293 Dec 10 15:10 /project/ttrojan_123/ttrojan/file.txt

In this example, ttrojan is the user owner ID and ttrojan_123 is the group owner ID.

The group ID is used to enforce the storage quota limits for project directories. By default, new files and directories should have the correct group ID, but it is possible to override this. Typically, when this error occurs, some of your files have your personal group (same as your username, e.g., ttrojan) as their group ID. The personal group has a small quota, so writing new files with the personal group ID produces the "disk quota exceeded" error.

To check if you have files with the incorrect group ID, enter a command like the following substituting your username:

beegfs-ctl --getquota --mount=/project --gid ttrojan

To find files with the incorrect group ID, enter a command like the following substituting your project directory path and project group ID:

find /project/ttrojan_123/ttrojan \! -group ttrojan_123

The likely reason for files having the wrong group ID is using a mv, cp -a, scp -r, rsync -p, or rsync -a command that preserves file permissions from source files when moving or copying them into the project directory. Alternatively, some subdirectories within your project directory may not have the correct setgid bit that determines the default group ID for new files and directories.

To fix this issue, enter a sequence of commands like the following, substituting your project directory path and project ID:

chgrp -R ttrojan_123 /project/ttrojan_123/ttrojan
find /project/ttrojan_123/ttrojan -type d -exec chmod g+s {} \;

These commands will recursively change and set the default group ownership of files and subdirectories to match the project group. You will get an “operation not permitted” message for files you do not own, but this can be ignored. These commands will only change the files that you own.
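As a variation on the commands above, the following sketch limits both steps to files and directories that you own and that actually need changing, which avoids the "operation not permitted" messages and skips items that are already correct. The function name fix_project_group is hypothetical, and the paths and group in the example are placeholders.

```shell
# Set the group of your own files to the project group and set the setgid
# bit on your own directories, touching only items that need the change.
# fix_project_group is a hypothetical helper name used for illustration.
fix_project_group() {
    local dir="$1" group="$2"
    # Files and directories you own whose group is not the project group.
    # chgrp -h changes a symlink itself rather than the file it points to.
    find "$dir" -user "$(id -u)" ! -group "$group" -exec chgrp -h "$group" {} +
    # Directories you own that lack the setgid bit (mode bit 2000).
    find "$dir" -user "$(id -u)" -type d ! -perm -2000 -exec chmod g+s {} +
}

# Example (placeholder path and group):
# fix_project_group /project/ttrojan_123/ttrojan ttrojan_123
```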

It is best to run these commands only for specific subdirectories where you know you have files, because they may take a while to run, especially for large directories.

You may also need to run these commands for each project directory you have access to.

Please note that it may take ~15 minutes for the quota to update, after which you should be able to save new files again.

The best method for moving or copying files into a project directory is rsync -rlt. If needed, you can delete the source files after a successful copy or add the --remove-source-files option.
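A minimal sketch of such a copy is shown below. The -r, -l, and -t options copy recursively, keep symbolic links as links, and preserve modification times, without carrying over permissions or ownership from the source. The function name copy_into_project and the example paths are hypothetical.

```shell
# Copy the contents of a source directory into a destination directory
# with rsync -rlt, creating the destination if it does not exist.
# The trailing slashes make rsync copy the contents of src, not src itself.
copy_into_project() {
    local src="$1" dest="$2"
    mkdir -p "$dest" && rsync -rlt "${src%/}/" "${dest%/}/"
}

# Example (placeholder paths):
# copy_into_project ~/results /project/ttrojan_123/ttrojan/results
```

To preview what would be transferred without copying anything, you can run rsync yourself with the --dry-run option added.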
