Stata

Last updated March 31, 2025

Table of Contents

1 Using Stata on CARC systems
- 1.1 Stata GUI
2 Running Stata in interactive mode
3 Running Stata in batch mode
4 Installing Stata packages
5 Storing temporary files
6 Parallel programming with Stata
7 Additional resources

Stata is a proprietary software package for statistics and data science.

1 Using Stata on CARC systems

Begin by logging in. You can find instructions for this in the Getting Started with Discovery or Getting Started with Endeavour user guides.

You can use Stata in either interactive or batch modes. For either mode, first load the corresponding software module:

module load stata/18

This module loads Stata 18. If needed, you can make Stata behave like an earlier release by entering the command version <#> within Stata (e.g., version 13). To check which version is currently in effect, enter version within Stata. For reproducibility, it is also good practice to include a version statement in your Stata scripts (do-files), matching the version used to develop the script.
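As a minimal sketch of the version statement described above, the following shell snippet writes a small do-file that pins the Stata version on its first command line. The file name script.do and the example analysis (using the auto dataset shipped with Stata) are assumptions chosen for illustration only.

```shell
# Write a small example do-file (script.do is a hypothetical name).
cat > script.do <<'EOF'
* Pin interpreter behavior to the release used to develop this script
version 18

* Illustrative analysis using a dataset shipped with Stata
sysuse auto, clear
regress price mpg weight
EOF
```

Placing the version statement first means every later command in the do-file is interpreted under that release.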

Use the stata-mp executable when using Stata on CARC systems. The MP version of Stata enables use of large datasets as well as multiple cores for parallel computation. The current Stata license allows up to 8 cores.

1.1 Stata GUI

The Stata GUI is available to use on compute nodes via our OnDemand service.

2 Running Stata in interactive mode

Use Stata on a login node only for installing packages. A common mistake for new users of HPC clusters is to run heavy workloads directly on a login node (e.g., discovery.usc.edu or endeavour.usc.edu). Unless you are only running a small test, please run your program as an interactive job on a compute node. Processes left running on login nodes may be terminated without warning. For more information on jobs, see our Running Jobs user guide.

To run Stata interactively on a compute node:

1. Reserve job resources on a node using salloc
2. Once resources are allocated, load the required modules and enter stata-mp

[user@discovery1 ~]$ salloc --time=1:00:00 --cpus-per-task=8 --mem=16G --account=<project_id>
salloc: Pending job allocation 24316
salloc: job 24316 queued and waiting for resources
salloc: job 24316 has been allocated resources
salloc: Granted job allocation 24316
salloc: Waiting for resource configuration
salloc: Nodes d05-08 are ready for job

Adjust the resource requests (the --time=1:00:00 --cpus-per-task=8 --mem=16G --account=<project_id> options following salloc) to reflect the time, number of cores, and memory you need. Also substitute your project ID; enter myaccount to view your available project IDs.
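Because the license allows at most 8 cores, it can help to cap the core count before building a resource request. The helper below is a hypothetical sketch, not a CARC-provided tool; the function name cap_stata_cores and the variable STATA_MAX_CORES are assumptions made for this example.

```shell
# License limit stated in this guide (8 cores).
STATA_MAX_CORES=8

# Hypothetical helper: print the requested core count, capped at the limit.
cap_stata_cores() {
    if [ "$1" -gt "$STATA_MAX_CORES" ]; then
        echo "$STATA_MAX_CORES"
    else
        echo "$1"
    fi
}

# Build (but do not run) an salloc command with the capped core count.
cores=$(cap_stata_cores 16)
echo "salloc --time=1:00:00 --cpus-per-task=${cores} --mem=16G --account=<project_id>"
```

The snippet only prints the command, so you can inspect it before running it yourself.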

Once you are granted the resources and logged in to a compute node, load the required modules and then enter stata-mp:

[user@d05-08 ~]$ module load stata/18
[user@d05-08 ~]$ stata-mp

  ___  ____  ____  ____  ____ ®
 /__    /   ____/   /   ____/      StataNow 18.5
___/   /   /___/   /   /___/       MP—Parallel Edition

 Statistics and Data Science       Copyright 1985-2023 StataCorp LLC
                                   StataCorp
                                   4905 Lakeway Drive
                                   College Station, Texas 77845 USA
                                   800-782-8272        https://www.stata.com
                                   979-696-4600        service@stata.com

Stata license: 20-user 8-core network, expiring 30 Jun 2025
Serial number: 501809303106
  Licensed to: Center for Advanced Research Computing
               University of Southern California

Notes:
      1. Unicode is supported; see help unicode_advice.
      2. More than 2 billion observations are allowed; see help obs_advice.
      3. Maximum number of variables is set to 5,000 but can be increased; see help set_maxvar.

.

The shell prompt changes from user@discovery1 to user@<nodename> to indicate that you are now on a compute node (e.g., d05-08).

To run Stata scripts (do-files) from within Stata, use the do command (e.g., do script.do). Alternatively, to run Stata scripts from the shell, use the stata-mp -b do <script> command (e.g., stata-mp -b do script.do).

To exit the node and relinquish the job resources, enter exit to exit Stata and then enter exit again in the shell. This will return you to the login node:

. exit
[user@d05-08 ~]$ exit
exit
salloc: Relinquishing job allocation 24316
[user@discovery1 ~]$

3 Running Stata in batch mode

To submit jobs to the Slurm job scheduler, use Stata in batch mode:

1. Create a Stata script (do-file)
2. Create a Slurm job script that runs the Stata script
3. Submit the job script to the job scheduler using sbatch

Your Stata script should consist of the sequence of Stata commands needed for your analysis.

A Slurm job script is a special type of Bash shell script that the Slurm job scheduler recognizes as a job. For a job running Stata, a Slurm job script should look something like the following:

#!/bin/bash

#SBATCH --account=<project_id>
#SBATCH --partition=main
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --time=1:00:00

module purge
module load stata/18

stata-mp -b do script.do

Each line is described below:

| Command or Slurm argument | Meaning |
|---|---|
| `#!/bin/bash` | Use Bash to execute this script |
| `#SBATCH` | Syntax that allows Slurm to read your requests (ignored by Bash) |
| `--account=<project_id>` | Charge compute resources used to <project_id>; enter `myaccount` to view your available project IDs |
| `--partition=main` | Submit job to the main partition |
| `--nodes=1` | Use 1 compute node |
| `--ntasks=1` | Run 1 task (e.g., running a Stata script) |
| `--cpus-per-task=8` | Reserve 8 CPUs for your exclusive use |
| `--mem=16G` | Reserve 16 GB of memory for your exclusive use |
| `--time=1:00:00` | Reserve resources described for 1 hour |
| `module purge` | Clear environment modules |
| `module load stata/18` | Load the `stata` environment module |
| `stata-mp -b do script.do` | Use `stata-mp` to run `script.do` |

Adjust the resources requested based on your needs, but remember that requesting fewer resources generally leads to shorter queue times for your job. The current Stata license limits you to a maximum of 8 CPUs.

You can develop and edit Stata scripts and job scripts to run on CARC clusters in any of the following ways:

- On your local computer, then transfer the files to one of your directories on CARC file systems.
- With the Files app available on our OnDemand service.
- With one of the available text editor modules (nano, micro, vim, or emacs).

Save the job script as stata.job, for example, and then submit it to the job scheduler with Slurm’s sbatch command:

[user@discovery1 ~]$ sbatch stata.job
Submitted batch job 170554
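If you want to reuse the job ID in later commands, one option is to extract it from the message that sbatch prints. The function name job_id_from_sbatch below is a hypothetical helper for illustration; it assumes the message has the form "Submitted batch job <id>" as in the sample output above.

```shell
# Extract the numeric job ID from a "Submitted batch job <id>" message on stdin.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Demonstration using the sample message rather than a live submission.
job_id=$(echo "Submitted batch job 170554" | job_id_from_sbatch)
echo "Job ID: ${job_id}"
```

On a cluster, the same function could be used as sbatch stata.job | job_id_from_sbatch. Slurm's sbatch also offers a --parsable option that prints the job ID without the surrounding text.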

To check the status of your job, enter myqueue. If the job is no longer listed, it has completed.

The output of your Stata script will be saved to a log file, not the Slurm output file. In batch mode, Stata will automatically create a plain-text log file in the current working directory (e.g., script.log). As a result, you do not need to include log commands in your scripts. To view the contents of the log file, enter less <script>.log, and then enter q to exit the viewer.
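Because results and error messages go to the log file, a job script may want to inspect that log after Stata finishes. The sketch below assumes that a do-file stopped by an error leaves a return code line such as r(111); in the log, and that the shell exit status of a batch run may not reflect such errors. The function name stata_log_has_error is hypothetical.

```shell
# Succeed (exit status 0) if the given Stata log contains an error
# return code line such as "r(111);" at the start of a line.
stata_log_has_error() {
    grep -Eq '^r\([0-9]+\);' "$1"
}
```

In a job script, this could follow the stata-mp -b do script.do line: call stata_log_has_error script.log and, if it succeeds, print a message and exit with a nonzero status so that Slurm records the job as failed.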

For more information on running and monitoring jobs, see the Running Jobs guide.

4 Installing Stata packages

User-developed Stata packages can be installed from a login node using one of the Stata commands net install <package> or ssc install <package>, depending on the source of the package. These packages will be installed in your home directory by default.

5 Storing temporary files

Loading the Stata module automatically sets the STATATMP environment variable to /scratch1/<username>/stata, the directory Stata then uses for storing temporary files. To use a different directory, set STATATMP in your job script after loading the module:

export STATATMP=<dir>

where <dir> is the directory of your choice. You will get the best performance by using a directory in /dev/shm, which is local to a compute node but limited in size based on the job’s memory request, or alternatively, in your /scratch1 directory.
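One way to apply this is to create a private, per-job temporary directory and point STATATMP at it. The following is a sketch under stated assumptions: the function name use_stata_tmpdir and the directory naming pattern are hypothetical, and SLURM_JOB_ID is the job ID variable that Slurm sets inside a job.

```shell
# Create a private temporary directory and export STATATMP to point at it.
# The base directory defaults to /dev/shm; pass another path to override.
use_stata_tmpdir() {
    base="${1:-/dev/shm}"
    # "nojob" is used in the name when run outside a Slurm job.
    STATATMP=$(mktemp -d "${base}/stata_${SLURM_JOB_ID:-nojob}_XXXXXX") || return 1
    export STATATMP
}
```

In a job script, call use_stata_tmpdir after module load stata/18 and before stata-mp. Since files in /dev/shm count against the job's memory request, consider removing the directory when the job ends, for example with trap 'rm -rf "$STATATMP"' EXIT.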

6 Parallel programming with Stata

When you use the stata-mp executable, Stata automatically uses the number of cores requested with Slurm’s --cpus-per-task option. This implicit parallelism does not require any changes to your code. The current Stata license allows up to 8 cores. For more information about stata-mp, see Stata’s performance report.

There are also user-developed packages for Stata that provide additional capabilities. For example, the parallel package implements parallel for loops: https://github.com/gvegayon/parallel. In addition, the gtools package provides faster alternatives to some Stata commands when working with big data: https://github.com/mcaceresb/stata-gtools

On Linux systems, including CARC systems, it is also good practice to set the maximum memory use in your Stata scripts. For example:

set max_memory 16g

The value should be equal to or less than the total memory requested with Slurm’s --mem or --mem-per-cpu option.
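To keep the Stata setting in step with the Slurm request, the line can be generated from the job's memory allocation. This sketch assumes the job was submitted with --mem, in which case Slurm sets SLURM_MEM_PER_NODE to the request in megabytes. The function name max_memory_line and the 10% headroom are arbitrary choices for illustration.

```shell
# Print a Stata "set max_memory" line for a memory size given in megabytes,
# keeping 10% headroom below the requested amount (integer arithmetic).
max_memory_line() {
    echo "set max_memory $(( $1 * 9 / 10 ))m"
}

# Inside a job submitted with --mem, SLURM_MEM_PER_NODE holds the request in MB;
# fall back to 16384 MB (16G) when run outside a job.
max_memory_line "${SLURM_MEM_PER_NODE:-16384}"
```

The printed line can be copied into the top of a do-file. Jobs submitted with --mem-per-cpu would need a different calculation, since the total then depends on the number of CPUs.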

7 Additional resources

If you have questions about or need help with Stata, please submit a help ticket and we will assist you.