/project File System Issue
0.0.0.1 Technical information on the recent /project file system issue.
0.0.0.2 Current status
The data recovery company has completed their analysis of the failed /project hard drives. We will be reaching out to PIs individually to discuss the next steps.
0.0.0.3 Updates
Update (6/9/25): We have identified complete lists of affected files, with false positives filtered out. Each project directory contains a text file listing its affected files, named in the format INCOMPLETE_20250606__project_ttrojan_120.txt. Only users assigned to that project can view the list. The previous lists (which contained false positives) have not been removed from the project directories; they have only been hidden.
Update (5/15/25): Please be aware that snapshots were also affected by the /project disk failure. At this time, snapshots are not a viable method of data recovery for users whose data was affected.
Additionally, copying data that was stored on the affected disks to either /scratch1 or /project2 will not solve the issue. We are seeing runtime issues on /scratch1 from attempts to copy over lost or corrupted data from the failed /project disks. We know this is very stressful for many of you, but we ask that you please be patient while the data recovery company works to resolve this issue.
0.0.0.4 Timeline
April 27, 2025 @ 10:31 AM PDT: A routine drive failure occurred in one of the storage pools.

The /project file system is configured to safeguard against disk failures. Each file on /project is broken up into chunks and stored on 4 of 80 total storage pools. Each storage pool is made up of an array of 12 hard drives and can tolerate the loss of 2 of them while still maintaining the data in the pool. Hard drive failures are not uncommon, and we routinely replace failed drives. Once the data from a failed drive is copied over to the new drive, the pool is fully functioning again and back to full fault tolerance.
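As a rough illustration of the figures above, the following Python sketch models when a single pool can still maintain its data. The constants come from the description above; the function names are hypothetical, and the last function assumes that files are spread evenly across pools, which the text does not state. Having a chunk on a pool does not by itself mean a file was damaged, so that share should be read as a loose estimate only.

```python
from fractions import Fraction

# Figures stated in the description above.
TOTAL_POOLS = 80          # storage pools in /project
POOLS_PER_FILE = 4        # pools that each file's chunks are stored on
DRIVES_PER_POOL = 12      # hard drives in one pool
TOLERATED_FAILURES = 2    # drive losses a pool can absorb


def pool_keeps_data(failed_drives: int) -> bool:
    """Return True while a pool can still maintain all of its data."""
    if not 0 <= failed_drives <= DRIVES_PER_POOL:
        raise ValueError("failed_drives must be between 0 and 12")
    return failed_drives <= TOLERATED_FAILURES


def share_of_files_on_one_pool() -> Fraction:
    """Share of files that have a chunk on one given pool.

    Assumption (not from the text): files are spread evenly across pools.
    """
    return Fraction(POOLS_PER_FILE, TOTAL_POOLS)


if __name__ == "__main__":
    for failed in range(4):
        print(failed, "failed drive(s): data intact =", pool_keeps_data(failed))
    print("Share of files touching one pool:", share_of_files_on_one_pool())
```

Under this simplified model, one or two failed drives leave the pool intact, while a third failure in the same pool before a rebuild completes exceeds its tolerance.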
April 28, 2025 @ 8:05 AM PDT: We started a routine rebuild operation for the failed drive.
April 30, 2025 @ 9:39 AM PDT: While the first failed drive was being rebuilt, two more drives in the same pool failed, leaving three drives failed at the same time. Because a pool can tolerate the loss of only two drives, the pool could no longer maintain its data. This is a very rare occurrence.

At this point, many users started receiving remote I/O errors.
May 7, 2025: After a series of troubleshooting measures and an emergency maintenance, the failed drives were removed from the file system and sent to a data recovery company.

At this point, many users started receiving incomplete file errors.
0.0.0.5 Check your data status
The previous methods for identifying affected files produced false positives. Not every file that was assigned to storage target 55 was affected.
The most accurate way to determine whether your files have been affected is to check the list of files that has been generated for each /project allocation. To find the list, navigate to your project directory. The path will look something like /project/ttrojan_120/INCOMPLETE_20250606__project_ttrojan_120.txt. Each list is specific to the project directory it is in and can only be accessed by users assigned to that project.
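For projects with many files, checking the list by hand can be tedious. The following is a minimal Python sketch of one way to locate a list and look up a path in it. It assumes the list contains one file path per line, which is not confirmed here, so inspect the file first and adjust as needed. The function names are hypothetical, and /project/ttrojan_120 is the example project path used above.

```python
from pathlib import Path


def find_incomplete_lists(project_dir: str) -> list[Path]:
    """Return the INCOMPLETE_*.txt lists at the top of a project directory."""
    return sorted(Path(project_dir).glob("INCOMPLETE_*.txt"))


def load_affected_paths(list_file: Path) -> set[str]:
    """Read a list file, assuming one file path per line (unverified format)."""
    text = list_file.read_text(encoding="utf-8", errors="replace")
    return {line.strip() for line in text.splitlines() if line.strip()}


def is_affected(path: str, affected: set[str]) -> bool:
    """Exact string match of a path against the loaded list."""
    return path.strip() in affected


if __name__ == "__main__":
    # Example project directory from the text; replace with your own.
    project_dir = "/project/ttrojan_120"
    for list_file in find_incomplete_lists(project_dir):
        affected = load_affected_paths(list_file)
        print(f"{list_file}: {len(affected)} affected file(s) listed")
```

The lookup is an exact string comparison, so a path must be written the same way it appears in the list (for example, absolute rather than relative) to match.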
If you discover that your data has been affected by the /project disk failure, please email us at carc-support@usc.edu or submit a help ticket.
0.0.0.6 Cold Storage System and data migration
While we do offer snapshots for /project and /project2, they are not the most reliable way to back up your data. CARC offers the Cold Storage System as a solution for long-term data archiving for large data sets. PIs can request a cold storage allocation via the CARC user portal.
For more information on pricing and how to use this system, see our CARC Cold Storage System guide.
Additionally, we are planning to migrate all projects to the new /project2 file system. The /project2 storage system is built entirely from solid state drives (SSDs), offering significantly better performance and dependability. We are aiming to start this process in late July or early August 2025 and will keep you updated as it gets closer.
For more information and pricing, please see our Project file system guide.