Managing Files Using the Command Line
The following sections describe how to use command-line tools to manage files on CARC systems. To manage files with a graphical user interface, you can use the features available with CARC OnDemand or an SFTP GUI app.
Currently, CARC systems do not support the use or storage of sensitive data. If your research work includes sensitive data, including but not limited to HIPAA-, FERPA-, or CUI-regulated data, see our Secure Computing user guides or contact us at email@example.com before using our systems.
Organize project files within a directory structure so that they stay documented and findable. For example, you might keep separate directories for raw data, processed data, and code.
To list files and directories, use the ls command. For example, to list files in long format for the current directory, use:
ls -l
For other directories, add the directory path to the command. Enter man ls or ls --help for more information and to view all available options.
To create a directory, use the mkdir command. Enter man mkdir or mkdir --help for more information and to view all available options.
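As a minimal sketch of the kind of layout suggested earlier (separate directories for raw data, processed data, and code), the following uses the -p option of mkdir, which creates any missing parent directories. The directory names and the temporary location are hypothetical; on CARC systems you would substitute your own project path.

```shell
# Hypothetical stand-in for a project directory; replace with your own path.
project_dir="$(mktemp -d)"

# -p creates missing parent directories and does not fail if they already exist.
mkdir -p "$project_dir/data/raw" "$project_dir/data/processed" "$project_dir/code"

# List the new layout in long format.
ls -l "$project_dir" "$project_dir/data"
```

Because -p is safe to repeat, the same command can be rerun in a setup script without errors.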
To copy files or directories, use the cp command:
cp /source/path /destination/path
For example, to copy a directory on /scratch1 to /project, use:
cp -r /scratch1/ttrojan/dir /project/ttrojan_123/
The -r option, recursive mode, is needed when copying directories. To print a log of the copying, add the -v option, which enables verbose mode. To copy multiple files or directories to the same destination, include additional source paths in the command. Enter man cp or cp --help for more information and to view all available options.
Note: Do not use cp options that preserve ownership, such as -p or -a, if you are copying into a CARC project directory, because this will likely result in incorrect group ownership of files that will produce a "disk quota exceeded" error.
To move files or directories (i.e., copy and also remove the files from the source), use the mv command instead:
mv /source/path /destination/path
To rename files, you can also use the mv command:
mv /source/filename.txt /source/newfilename.txt
Note: Do not use mv if you are moving files from a home or scratch directory into a CARC project directory. This results in incorrect group ownership of files that will produce a "disk quota exceeded" error. Use cp -r instead to copy the files and then use rm to remove the source files.
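The copy-then-remove approach described in the note can be sketched as follows. The temporary directories are hypothetical stand-ins for a scratch directory and a project directory. The && operator is one way to make sure the source is removed only if the copy succeeded.

```shell
# Hypothetical stand-ins for a scratch directory and a project directory.
src="$(mktemp -d)"
dest="$(mktemp -d)"
mkdir "$src/dir"
echo "example" > "$src/dir/file.txt"

# Copy recursively, then remove the source only if the copy succeeded.
cp -r "$src/dir" "$dest/" && rm -r "$src/dir"

# Confirm that the files now exist at the destination.
ls -l "$dest/dir"
```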
If you are backing up and syncing a directory, use an rsync command. For example:
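rsync -rt /source/dir/ /destination/dir/
The -r option, recursive mode, is needed to sync directories, and the -t option preserves modification times so that later runs can skip unchanged files. Rsync will copy only files that are new or have changed in the source directory. Enter man rsync or rsync --help for more information and to view all available options.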
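Note: Do not use rsync options that preserve group ownership, such as -a or -g, if you are copying into a CARC project directory. This generally results in incorrect group ownership of files that will produce a "disk quota exceeded" error. Use options that do not preserve ownership, such as -r and -t, instead.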
To delete files or directories, use the rm command. For example, to delete a directory, use:
rm -r /scratch1/ttrojan/dir
The -r option, recursive mode, is needed to remove directories. To remove multiple files or directories, add additional paths to the command. Enter man rm or rm --help for more information and to view all available options.
Checking file disk usage
To check the disk usage of files and directories, use the du -h command:
du -h /path/to/file
Please note that all file systems run ZFS, which compresses files, so the file size on disk may be smaller than the actual file size (on your local computer, for example). The du --apparent-size -h command gives the uncompressed file size. Alternatively, the ls -lh command should give the same result. Enter man du or du --help for more information and to view all available options.
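To illustrate the difference between on-disk and apparent size, the sketch below creates a highly compressible test file and queries it both ways. It assumes GNU du; the file name and temporary directory are hypothetical. On a file system without compression the two values may be the same.

```shell
workdir="$(mktemp -d)"

# Create a 1 MiB file of zeros, which compresses very well.
head -c 1048576 /dev/zero > "$workdir/zeros.bin"

# Space actually used on disk (may be smaller on a compressing file system).
du -h "$workdir/zeros.bin"

# Apparent (uncompressed) size.
du --apparent-size -h "$workdir/zeros.bin"

# Apparent size in bytes, convenient for use in scripts.
apparent_bytes="$(du --apparent-size --block-size=1 "$workdir/zeros.bin" | cut -f1)"
echo "$apparent_bytes"
```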
To list the files or subdirectories in the current directory and sort by size, enter the command cdiskusage. This is a convenience script that uses the du command. Please note that it may take a long time to run for large directories (e.g., the root of a project directory).
The /project directories are the best place to share files. By default, the members of a project group will have full read, write, and execute permissions for all files in a project directory (i.e., permissions set to 770 = drwxrwx---).
You can check the current permissions for a file or directory with the command ls -l /path/to/file.
When sharing your files, please keep the following in mind:
- Never set the permissions of your directories to 777 (drwxrwxrwx), which means that any other user on CARC systems can access, modify, and delete your files.
- Do not share or change the permissions of your /home1 directory and its subdirectories. If something goes wrong, you may be blocked from logging in because SSH requires strict permissions for logging in.
- Granting other users read permission for your files (r--) and read and execute permissions (r-x) for your directories is typically sufficient for sharing. Granting write permission can result in modified or deleted files, so only provide write permission when actually needed.
You can change file and directory permissions using a chmod command. For example, to provide read and execute permissions but not write permission (r-x) to a project subdirectory for your project group, use:
chmod 750 /project/ttrojan_123/dir
If the subdirectory is located within another subdirectory, note that the group also needs read and execute permission for the full hierarchy of subdirectories above it. Granting write permission to a directory allows users to create, modify, or delete files in that directory, also depending on individual file permissions. Enter man chmod or chmod --help for more information and to view all available options.
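As a small sketch of the 750 example, the commands below set the permissions in numeric form, repeat them in the equivalent symbolic form, and then inspect the result. The temporary directory is a hypothetical stand-in for a project subdirectory, and the stat -c option assumes GNU coreutils.

```shell
workdir="$(mktemp -d)"
mkdir "$workdir/dir"

# Numeric form: owner rwx (7), group r-x (5), others --- (0).
chmod 750 "$workdir/dir"

# Equivalent symbolic form.
chmod u=rwx,g=rx,o= "$workdir/dir"

# -d shows the directory itself rather than its contents.
ls -ld "$workdir/dir"

# Print the numeric permissions (GNU stat).
perms="$(stat -c '%a' "$workdir/dir")"
echo "$perms"
```

The symbolic form can be easier to read when changing only one class of users, for example chmod g-w to remove group write permission.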
Backing up files
Although the /home1 and /project file systems have some file recovery capabilities, we encourage you to also back up your files elsewhere. There are a few different backup locations to consider:
- Local storage (e.g., external drive)
- Cloud storage
- Research data repositories
To transfer files to local or cloud storage, see our guide for Transferring Files Using the Command Line. Rsync is useful for syncing to a backup directory on local storage, and Rclone works similarly for cloud storage. For large transfers to local or cloud storage, Globus can sync two directories in a similar manner. However, these tools do not necessarily version control backups by default. Tools with more features designed for backups, such as deduplication and compression, include rdiff-backup, Borg, Kopia, and Restic.
Research data repositories, such as OSF, Zenodo, Harvard Dataverse, and Dryad, are a special type of cloud storage intended for sharing research data with the wider research community. These services typically have an API that can be used at the command line to upload files directly from CARC systems.
For long-term archival storage, also consider using a research data repository. For private archival storage, you can also consult the USC Digital Repository.
Alias commands or backup scripts can help semi-automate backups.
A good plan for backups is the 3-2-1 strategy:
- 3 copies of data
- 2 different media (e.g., devices or file systems)
- 1 copy off-site (e.g., cloud storage)
Also make sure to test periodically that backups are accessible and functional. A good rule of thumb is to test every three months.
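As one possible way to semi-automate backups, the sketch below defines a shell function that archives a directory into a dated .tar.gz file and then lists the archive to confirm that it is readable. The function name backup_dir, the archive naming scheme, and the temporary directories are all hypothetical; adapt them to your own backup location.

```shell
# Hypothetical helper: archive a directory into a dated .tar.gz file.
# Usage: backup_dir <source_dir> <backup_location>
backup_dir() {
    src="$1"
    dest="$2"
    archive="$dest/$(basename "$src")-$(date +%Y-%m-%d).tar.gz"
    # -C changes to the parent directory so the archive stores relative paths.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # List the archive contents to confirm that it can be read back.
    tar -tzf "$archive" > /dev/null || return 1
    echo "$archive"
}

# Example run with temporary stand-in directories.
work="$(mktemp -d)"
mkdir "$work/project" "$work/backups"
echo "results" > "$work/project/results.txt"
backup_file="$(backup_dir "$work/project" "$work/backups")"
echo "created $backup_file"
```

A function like this could be placed in a shell startup file or a standalone script. It covers only one copy of the data, so it complements rather than replaces the 3-2-1 strategy.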
Archiving and compressing files
Archiving and compressing files can help simplify file organization and save storage space, such as after a project is completed and the associated files are not needed in the immediate future. This is also useful for packaging project files in order to distribute them to other researchers. You can use a combination of the programs tar for archiving files and gzip or xz for compressing files.
Archiving with tar
To create an archive file from a directory of files, use the tar command. For example:
tar -cvf <filename>.tar <dir>
To add multiple directories and files, add the paths to these directories and files in the command. To check the integrity of the files, add the -W option, which verifies the archive after writing it.
To extract the archive, use the -x option instead of the -c option. For example:
tar -xvf <filename>.tar
Note that the .tar file will be larger in size than the sum of all the files being archived, primarily because of the added file headers in the archive file. Enter man tar or tar --help for more information and to view all available options.
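The following sketch walks through a full round trip: creating an archive, listing its contents, extracting it, and comparing the result with the original. The -t (list) and -C (change directory) options are additional tar options used here for illustration, and the file and directory names are hypothetical.

```shell
workdir="$(mktemp -d)"
cd "$workdir"
mkdir dir
echo "alpha" > dir/a.txt
echo "beta" > dir/b.txt

# Create the archive (-c) and print each file as it is added (-v).
tar -cvf archive.tar dir

# List the archive contents without extracting (-t).
tar -tvf archive.tar

# Extract (-x) into a separate directory (-C) to avoid overwriting the original.
mkdir restored
tar -xvf archive.tar -C restored

# Compare the extracted files with the originals.
diff -r dir restored/dir && echo "contents match"
```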
Compressing with gzip
To compress a file, use the gzip command:
gzip -v <filename>
This will create a .gz file. Including the -v option, verbose mode, will print the compression ratio. There are 9 levels of compression, with 9 being the highest/slowest level and 6 being the default. The default is typically the best value to use with respect to the compression/time tradeoff. To maximize compression, at the expense of compression time, add the -9 option.
To uncompress a .gz file, add the -d option:
gzip -dv <filename>.gz
Enter gzip --help to view all available options. In addition, the pigz module is a parallel implementation of gzip that provides faster compression and uncompression times. To load it, enter module load pigz. It can be used as a drop-in replacement for gzip.
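As a brief sketch of a compress and uncompress cycle, the example below compresses a sample file at the highest level, tests the integrity of the compressed file with the -t option, and then restores it. The sample file and its contents are hypothetical. Note that gzip replaces the input file with the .gz file rather than keeping both.

```shell
workdir="$(mktemp -d)"

# Create a repetitive sample file, which compresses well.
awk 'BEGIN { for (i = 0; i < 10000; i++) print "sample line of text" }' > "$workdir/sample.txt"
original_sum="$(cksum < "$workdir/sample.txt")"

# Compress at the highest level (-9) and print the compression ratio (-v).
gzip -9 -v "$workdir/sample.txt"

# Test the integrity of the compressed file (-t).
gzip -t "$workdir/sample.txt.gz" && echo "compressed file OK"

# Uncompress (-d); the .gz file is replaced by the original file.
gzip -dv "$workdir/sample.txt.gz"
restored_sum="$(cksum < "$workdir/sample.txt")"
```

Comparing the checksums before and after is a simple way to confirm that the round trip preserved the file contents.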
Compressing with xz
For better compression ratios or for maximum compression, use xz instead of gzip. With xz you can also use multiple cores to speed up the compression time. For example, to compress using 4 cores, add the -T4 option:
xz -v -T4 <filename>
This will create a .xz file. Including the -v option, verbose mode, will print compression progress and related information. Compression levels range from 0 to 9, with 9 being the highest/slowest level and 6 being the default. The default is typically the best value to use with respect to the compression/time tradeoff. To maximize compression, at the expense of compression time and memory required, add the -9 option.
To uncompress an .xz file, add the -d option:
xz -dv -T4 <filename>.xz
Enter man xz or xz -H for more information and to view all available options.
Archiving and compressing with tar
You can also archive and compress with one command using tar with the -z option, which uses gzip compression. For example:
tar -czvf <filename>.tar.gz <dir>
Alternatively, to use xz to compress, use the -J option instead. In contrast to using gzip or xz directly, tar does not delete the source files by default. Add the --remove-files option to do so.
To uncompress and unarchive in one command, use the -x option:
tar -xvf <filename>.tar.gz
This will extract the contents of the archive into the current directory.
tar will automatically detect which uncompression program to use. Note that it does not delete the compressed archive file after extracting the files.
Software for Linux is typically distributed as a .tar.gz file, so a command like the above will extract the source code or binary files into the current directory.
Archiving and compressing before transferring files
Creating and compressing a single archive file can be useful before transferring files to or from CARC systems, especially for directories with a large number of files (e.g., > 1000, regardless of the total size of those files). Each file has associated metadata, which slows down the transfer. Compressing files will reduce the amount of data that needs to be transferred. However, it takes time to compress and uncompress files, so the total transfer time may not necessarily decrease depending on factors like network speeds. With fast network speeds, relative to total transfer size, it is typically not worth compressing files (e.g., when on campus).
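To help decide whether a directory is worth bundling before a transfer, one option is to count its files first and then create a single compressed archive. The sketch below uses a small hypothetical directory of 50 files; the 1000-file guideline above is only a rough threshold.

```shell
workdir="$(mktemp -d)"
mkdir "$workdir/dir"

# Create 50 small sample files.
i=1
while [ "$i" -le 50 ]; do
    echo "file $i" > "$workdir/dir/file_$i.txt"
    i=$((i + 1))
done

# Count the regular files in the directory tree.
file_count="$(find "$workdir/dir" -type f | wc -l)"
echo "files: $file_count"

# Bundle and compress the directory into a single file for transfer.
tar -czf "$workdir/dir.tar.gz" -C "$workdir" dir

# Compare the disk usage of the directory and the archive.
du -sh "$workdir/dir" "$workdir/dir.tar.gz"
```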