Update 202405
Revision as of 14:10, 28 May 2024
Recent node purchases and the discontinuation of the CentOS operating system (CentOS 7 reaches end of life (EOL) on June 30, 2024) have prompted the need for major updates to the Magnolia HPC cluster.
The operating system of the Magnolia HPC cluster will be switched from CentOS 7 to Rocky Linux 9. This change will require some custom packages to be recompiled, since the libraries of Rocky Linux 9 are much more recent than those of CentOS 7. Custom packages include those listed in Modules, as well as user-installed packages included with miniconda.
Roadmap
Many changes are required for the addition of new nodes, along with the change in operating system; a broad list of changes is given here:
Topic | Completed? | Description |
---|---|---|
Install new compute node hardware - Part I | Yes | Vendor installed |
Install new compute node hardware - Part II | No | To be shipped |
Install operating system on new compute nodes | Yes | Using Rocky Linux 9 because CentOS 7 reaches end of life (EOL) on June 30, 2024. Integrating the compute nodes running the new Rocky Linux operating system with the nodes running the much older CentOS operating system took a lot of time, but this is now done. |
Update backup and cloning system | Yes | The backup and cloning software that has been used on the cluster does not work with Rocky Linux, so a package that works with both Rocky Linux and CentOS had to be designed. |
Update Slurm | Yes | Updated to a version that works on both Rocky Linux 9 and CentOS 7. This required a very short downtime. |
Add new compute nodes to Slurm partitions | Yes | The two new partitions, kisame and suliaoma (see Slurm Partitions), are currently only available to those who purchased the nodes. All partitions will be available to all users when preemption (see below) is enabled. |
Update module setup | No | WIP. See the New Modules Layout section for details on how to preview the new setup. |
Update modules for Rocky Linux 9 | No | WIP |
Install CUDA drivers for new GPU compute nodes | Yes | |
Preemption partitions | No | TODO |
Update CentOS 7 compute nodes to Rocky Linux 9 | No | TODO |
Update login nodes to Rocky Linux 9 | No | TODO |
Efforts are being made to make these changes as transparent to the general user as possible; however, there will come a point where this is no longer possible. Plenty of notice will be given when it comes time for general users to change how they use the Magnolia HPC cluster.
Contacts
Upgrades may inadvertently cause some software packages to fail in ways that the administrators' checks miss; if you suspect this has happened, please contact Brian Olson with details.
Slurm Partitions
As mentioned in the roadmap, two new partitions have been created. The standard Slurm commands can be used to show the current usage, or an overview of the cores and RAM available on each partition can be shown with the coresavail command:
user $ coresavail
Number of nodes in partition with N available cores and RAM.
Nodes | Partition | Available Cores | Available RAM (MiB) |
---|---|---|---|
6 | kisame | 64 | 515134 |
3 | suliaoma | 32 | 257094 |
4 | himem | 32 | 2063430 |
24 | node | 20 | 128000 |
4 | himem | 20 | 512000 |
2 | gpu | 20 | 128000 |
17 | lomem | 16 | 64000 |
This example output shows that the himem partition has 4 compute nodes, each with 32 cores available and just under 2 TiB of RAM free. Another 4 himem compute nodes are free, but these each have 20 cores and about 0.5 TiB of RAM free.
The kisame and suliaoma partitions are currently only accessible to those who purchased the hardware (see roadmap for details).
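To get per-partition totals from a listing like the one above, the rows can be summed with a small shell helper. This is only a sketch: `summarize_partitions` is a hypothetical function name, and it assumes input rows of the form `NODES PARTITION CORES RAM_MIB`, matching the columns of the example output rather than any documented output format of coresavail. It prints each partition with its total available cores and total available RAM converted from MiB to TiB.

```shell
# Sum available cores and RAM per partition.
# Assumed input: one row per line, "NODES PARTITION CORES RAM_MIB"
# (the column layout of the example output; not a documented format).
summarize_partitions() {
  awk '
    # Only use rows with four fields that start with a node count.
    NF == 4 && $1 ~ /^[0-9]+$/ {
      cores[$2] += $1 * $3      # nodes * cores per node
      ram[$2]   += $1 * $4      # nodes * RAM per node (MiB)
    }
    END {
      for (p in cores)
        # 1 TiB = 1024 * 1024 MiB
        printf "%s %d %.2f\n", p, cores[p], ram[p] / (1024 * 1024)
    }
  ' | sort
}

# Rows copied from the example output above.
summarize_partitions <<'EOF'
6 kisame 64 515134
3 suliaoma 32 257094
4 himem 32 2063430
24 node 20 128000
4 himem 20 512000
2 gpu 20 128000
17 lomem 16 64000
EOF
```

The two himem rows are combined into a single line, so the output shows the total free capacity of that partition rather than one line per node type.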
Modules
The modules are being moved to a new layout to allow for easier upgrades in the future.
New Modules Layout
user $ module load newsetup
user $ module avail
------------------------ /usr/share/Modules/modulefiles ------------------------
dot  module-git  module-info  modules  null  use.own
--------------------------- /modules/node/common/MPI ---------------------------
impi/2017.4.196
------------------------ /modules/node/common/Programs -------------------------
bamtools/2.5.1          ffmpeg/4.4
bedtools/2.31.0         gaussian/16B.01-avx2
bowtie/1.2.2            gaussian/16C.01-avx2
bowtie2/2.3.4.3         gaussian/16C.01-LINDA-avx2
bowtie2/2.5.1           nco/4.7.6
cmake/3.15.3            nco/4.9.3
cmake/3.24.2(default)   openssl/1.0.2k
cmake/3.9.1             openssl/3.0.9
fastx-toolkit/0.0.14    salmon/0.12.0
ffmpeg/3.3.3            salmon/1.1.0
------------------------ /modules/node/common/Libraries ------------------------
isl/0.22                mpfr/4.0.1      trimmomatic/0.39
lapack/3.7.1            ncurses/5.9     x264/20171213
libjpeg-turbo/2.1.5.1   newsetup/0.1    zstd/1.5.5
libpng/1.5.30           openssl/1.0.2k
mkl/2017.0.3            openssl/3.0.9
------------------------ /modules/node/common/Languages ------------------------
cuda-toolkit/10.1.243(default)  gcc/6.4.0
cuda-toolkit/11.6.2             gcc/7.3.0
cuda-toolkit/8.0.61             gcc/8.3.0
gcc/11.4.0                      gcc/9.2.0
gcc/12.3.0                      intel/2017.4.196
gcc/13.2.0
----------------------------- /modules/node/Types ------------------------------
centos-7-63/0.1  centos-7-79/0.1  rocky-9.3-143/0.1  thisnode/0.1
The modules under /modules/node/common are common modules that work on both Rocky Linux 9 and CentOS 7.
Rocky Linux Modules
user $ module load rocky-9.3-143
user $ module avail
------------------------ /usr/share/Modules/modulefiles ------------------------
dot  module-git  module-info  modules  null  use.own
--------------------------- /modules/node/common/MPI ---------------------------
impi/2017.4.196
------------------------ /modules/node/common/Programs -------------------------
bamtools/2.5.1          ffmpeg/4.4
bedtools/2.31.0         gaussian/16B.01-avx2
bowtie/1.2.2            gaussian/16C.01-avx2
bowtie2/2.3.4.3         gaussian/16C.01-LINDA-avx2
bowtie2/2.5.1           nco/4.7.6
cmake/3.15.3            nco/4.9.3
cmake/3.24.2(default)   openssl/1.0.2k
cmake/3.9.1             openssl/3.0.9
fastx-toolkit/0.0.14    salmon/0.12.0
ffmpeg/3.3.3            salmon/1.1.0
------------------------ /modules/node/common/Libraries ------------------------
isl/0.22                mpfr/4.0.1      trimmomatic/0.39
lapack/3.7.1            ncurses/5.9     x264/20171213
libjpeg-turbo/2.1.5.1   newsetup/0.1    zstd/1.5.5
libpng/1.5.30           openssl/1.0.2k
mkl/2017.0.3            openssl/3.0.9
------------------------ /modules/node/common/Languages ------------------------
cuda-toolkit/10.1.243(default)  gcc/6.4.0
cuda-toolkit/11.6.2             gcc/7.3.0
cuda-toolkit/8.0.61             gcc/8.3.0
gcc/11.4.0                      gcc/9.2.0
gcc/12.3.0                      intel/2017.4.196
gcc/13.2.0
----------------------------- /modules/node/Types ------------------------------
centos-7-63/0.1  centos-7-79/0.1  rocky-9.3-143/0.1  thisnode/0.1
----------------------- /modules/node/rocky-9.3-143/MPI ------------------------
openmpi-gcc/4.1.5
--------------------- /modules/node/rocky-9.3-143/Programs ---------------------
lammps/20220623
-------------------- /modules/node/rocky-9.3-143/Libraries ---------------------
hdf5/1.12.1  hdf5/1.8.19  netcdf/4.4.1.1  netcdf/4.8.1
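Since the second listing repeats everything from the common layout, it can be easier to look only at what loading rocky-9.3-143 adds. The sketch below compares two saved listings; `added_modules` is a hypothetical helper, and the usage in the comments assumes that the module command on the cluster supports terse output (`module -t avail`, one entry per line), which has not been confirmed for this installation.

```shell
# Print the entries that appear in the second listing but not in the first.
# Both arguments are files containing one module (or directory header) per line.
added_modules() {
  am_before=$(mktemp)
  am_after=$(mktemp)
  sort -u "$1" > "$am_before"
  sort -u "$2" > "$am_after"
  # comm -13 keeps only the lines unique to the second sorted file.
  comm -13 "$am_before" "$am_after"
  rm -f "$am_before" "$am_after"
}

# Possible usage on the cluster (assumption: terse listing is available;
# 2>&1 is used because some module versions print listings to stderr):
#   module load newsetup
#   module -t avail > before.txt 2>&1
#   module load rocky-9.3-143
#   module -t avail > after.txt 2>&1
#   added_modules before.txt after.txt
```

With the listings shown above, the difference would be expected to consist of the entries under /modules/node/rocky-9.3-143, along with their directory header lines if the terse output includes them.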
Most Used Modules
For installing modules, I am prioritizing those used 10 or more times over the last 90 days. If you require a module not on this list, please make a request to those listed in the Contacts section.
All nodes over last 90 days:
Module | Count |
---|---|
hdf5 | 2089 |
netcdf | 2079 |
intel | 1381 |
mkl | 1317 |
lammps | 1294 |
mpfr | 1225 |
gcc | 1222 |
libpng | 1109 |
isl | 1057 |
x264 | 804 |
ffmpeg | 792 |
esmf | 749 |
MCT | 744 |
scrip-coawst | 742 |
fftw | 737 |
nco | 683 |
ncview | 676 |
gaussian | 645 |
python | 433 |
cuda-toolkit | 317 |
bedtools | 258 |
samtools | 258 |
openmpi | 221 |
bowtie | 203 |
R | 165 |
orca | 150 |
fastx-toolkit | 146 |
digimat | 109 |
icu | 96 |
readline | 93 |
ncurses | 88 |
matlab | 69 |
impi | 56 |
cmake | 54 |
openmpi-gcc | 52 |
lapack | 45 |
salmon | 34 |
git | 27 |
trimmomatic | 20 |
newsetup | 18 |
gmake | 14 |
bowtie2 | 13 |
lammps-tools | 13 |
mopac | 13 |
gromacs | 10 |
STAR | 10 |