Process Limits on Login Nodes
CARC has implemented limits on processes run on the login nodes
By Dani Cannella
Login nodes serve as the main user interface for the CARC clusters and are shared among all users across the university. These nodes are intended only for basic tasks, such as managing files, editing scripts, and managing jobs. It takes only a few users performing computationally intensive tasks on the login nodes to degrade performance for everyone on the CARC clusters.
To ensure smooth operation for everyone, the CARC team has implemented a limit on the total number of processes an individual user can spawn on the login nodes in an effort to prevent these shared resources from becoming saturated and sluggish.
We have implemented limits of 64 processes, 4 CPU cores, and 32 GB of memory per user on each login node.
If a user exceeds these limits, they may be unable to access the cluster, or their applications may be terminated. Here are a few examples of process utilization:
- Each terminal connection to a login node spawns two processes, plus one ssh-agent process per user.
- Accessing the cluster via the Remote SSH extension of VS Code may be blocked, since this extension spawns enough processes on the login nodes to exceed the limit. Additionally, the processes started by Remote SSH are not properly terminated after the user logs out of the application. This can cause an account lockout that prevents the user from accessing the cluster, even from the terminal. We recommend using the SSH-FS extension in VS Code instead.
- When a user launches Python and imports a package that relies on OpenBLAS (e.g., NumPy), the library auto-detects the number of CPUs on the node and creates a thread pool of that size. This can exceed the process limit imposed on the login nodes and cause Python to crash. If it is absolutely necessary to run this kind of script on a login node, limit the number of threads by setting these environment variables:
$ export MKL_NUM_THREADS=1
$ export NUMEXPR_NUM_THREADS=1
$ export OMP_NUM_THREADS=1
$ export OPENBLAS_NUM_THREADS=1
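If you prefer not to change your whole shell session, the same limits can be applied to a single command by prefixing it with the variable assignments; `my_script.py` below is a hypothetical script name standing in for your own:

```shell
# Apply the thread limits only to this one invocation; the variables
# do not persist in your shell after the command finishes.
# "my_script.py" is a placeholder for your own script.
MKL_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 OMP_NUM_THREADS=1 python my_script.py
```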
The best approach to installing a package or debugging code on the cluster is to request an interactive session on the debug partition—or any other compute node—to complete your tasks. To do this, use the following command:
$ salloc -p debug --time=01:00:00 --ntasks=8
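Before launching anything computationally heavy, it can help to confirm which kind of node your shell is on. The hostname pattern below is an assumption, not a documented CARC convention, so check it against your cluster's actual naming scheme:

```shell
# Print a reminder based on the hostname; "*login*" is an assumed
# pattern for login-node hostnames.
case "$(hostname)" in
  *login*) echo "On a login node: keep tasks light." ;;
  *)       echo "Likely on a compute node." ;;
esac
```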
We appreciate your understanding during the implementation of these limits. If you need assistance, please submit a help ticket.