Update 202405

From HPC Wiki
Revision as of 15:33, 27 May 2024 by Bgo
Warning
This page is a work in progress by bgo. Treat its contents with caution.

This update moves the cluster to Rocky Linux 9. CentOS has been discontinued, and CentOS 7 reaches end of life (EOL) on June 30, 2024.


Roadmap

Topic                                           Completed?  Description
Install new compute node hardware - Part I      Yes         Installed by the vendor.
Install new compute node hardware - Part II     No          To be shipped.
Install operating system on new compute nodes   Yes         Using Rocky Linux 9 because CentOS 7 reaches end of life (EOL) on June 30, 2024. Integrating the new Rocky Linux compute nodes with the much older CentOS nodes took a long time, but it is now done.
Update backup and cloning system                Yes         The backup and cloning software previously used on the cluster does not work with Rocky Linux, so a package that works with both Rocky Linux and CentOS had to be designed.
Update Slurm                                    Yes         Updated to a version that works on both Rocky Linux 9 and CentOS 7. Required a very short downtime.
Add new compute nodes to Slurm partitions       Yes         The two new partitions, kisame and suliaoma, are currently available only to those who purchased the nodes. When the preemption partitions (see below) are added, they will be available to all users.
Update module setup                             No          WIP
Update modules for Rocky Linux 9                No          WIP
Install CUDA drivers for new GPU compute nodes  Yes
Preemption partitions                           No          TODO
Update CentOS 7 compute nodes to Rocky Linux 9  No          TODO
Update login nodes to Rocky Linux 9             No          TODO

Slurm Partitions

user $ coresavail
Number of nodes in partition with N available cores and RAM.

Nodes  Partition  Available Cores  Available RAM (MiB)
6      kisame     64               515134
3      suliaoma   32               257094
4      himem      32               2063430
62     node       20               128000
4      himem      20               512000
2      gpu        20               128000
17     lomem      16               64000

The himem partition has 4 compute nodes with 32 cores and just under 2 TiB of RAM available. Another 4 himem compute nodes are free, but these have 20 cores and about 0.5 TiB (500 GiB) of RAM available.
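
The RAM column in the table is reported in MiB, so comparing it with TiB figures means dividing by 1024 twice. The following is a minimal sketch of that arithmetic, assuming awk is available; mib_to_units is a hypothetical helper name chosen for illustration, not a command installed on the cluster.

```shell
#!/usr/bin/env bash
# Convert a memory size in MiB (as listed in the partition table) to GiB and TiB.
# mib_to_units is a hypothetical helper name, not a cluster command.
mib_to_units() {
    awk -v mib="$1" 'BEGIN {
        printf "%d MiB = %.1f GiB = %.2f TiB\n", mib, mib / 1024, mib / (1024 * 1024)
    }'
}

mib_to_units 2063430   # 32-core himem nodes → 2063430 MiB = 2015.1 GiB = 1.97 TiB
mib_to_units 512000    # 20-core himem nodes → 512000 MiB = 500.0 GiB = 0.49 TiB
```

The same helper can be applied to any other row of the table, for example `mib_to_units 515134` for the kisame nodes.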

Modules

The modules are being moved to a new layout to make future upgrades easier.

New Layout

user $ module load newsetup
user $ module avail
------------------------ /usr/share/Modules/modulefiles ------------------------
dot         module-git  module-info modules     null        use.own

--------------------------- /modules/node/common/MPI ---------------------------
impi/2017.4.196

------------------------ /modules/node/common/Programs -------------------------
bamtools/2.5.1             ffmpeg/4.4
bedtools/2.31.0            gaussian/16B.01-avx2
bowtie/1.2.2               gaussian/16C.01-avx2
bowtie2/2.3.4.3            gaussian/16C.01-LINDA-avx2
bowtie2/2.5.1              nco/4.7.6
cmake/3.15.3               nco/4.9.3
cmake/3.24.2(default)      openssl/1.0.2k
cmake/3.9.1                openssl/3.0.9
fastx-toolkit/0.0.14       salmon/0.12.0
ffmpeg/3.3.3               salmon/1.1.0

------------------------ /modules/node/common/Libraries ------------------------
isl/0.22              mpfr/4.0.1            trimmomatic/0.39
lapack/3.7.1          ncurses/5.9           x264/20171213
libjpeg-turbo/2.1.5.1 newsetup/0.1          zstd/1.5.5
libpng/1.5.30         openssl/1.0.2k
mkl/2017.0.3          openssl/3.0.9

------------------------ /modules/node/common/Languages ------------------------
cuda-toolkit/10.1.243(default) gcc/6.4.0
cuda-toolkit/11.6.2            gcc/7.3.0
cuda-toolkit/8.0.61            gcc/8.3.0
gcc/11.4.0                     gcc/9.2.0
gcc/12.3.0                     intel/2017.4.196
gcc/13.2.0

----------------------------- /modules/node/Types ------------------------------
centos-7-63/0.1   centos-7-79/0.1   rocky-9.3-143/0.1 thisnode/0.1

The modules listed under /modules/node/common work on both Rocky Linux 9 and CentOS 7.
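
As an illustration of using the new layout, the sketch below switches to it and then loads one of the common modules from the listing above. The function name load_common_gcc is hypothetical, and gcc/12.3.0 is just one version picked from the listing; the check for the module command is there only so the snippet fails cleanly when run somewhere other than a cluster node.

```shell
#!/usr/bin/env bash
# Sketch: enable the new module layout, then load a common compiler module.
# load_common_gcc is a hypothetical helper name used for illustration.
load_common_gcc() {
    # The module command is normally a shell function set up at login.
    if ! type module >/dev/null 2>&1; then
        echo "module command not found; run this on a cluster node" >&2
        return 1
    fi
    module load newsetup &&      # switch to the new layout first
    module load gcc/12.3.0 &&    # a common module, usable on both operating systems
    module list                  # show what is currently loaded
}

if load_common_gcc; then
    echo "common gcc module loaded"
else
    echo "skipped: modules are not available in this shell"
fi
```

Loading newsetup first matters in this sketch because the listing above was produced after `module load newsetup`.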

Rocky Linux Modules

user $ module load rocky-9.3-143
user $ module avail
------------------------ /usr/share/Modules/modulefiles ------------------------
dot         module-git  module-info modules     null        use.own

--------------------------- /modules/node/common/MPI ---------------------------
impi/2017.4.196

------------------------ /modules/node/common/Programs -------------------------
bamtools/2.5.1             ffmpeg/4.4
bedtools/2.31.0            gaussian/16B.01-avx2
bowtie/1.2.2               gaussian/16C.01-avx2
bowtie2/2.3.4.3            gaussian/16C.01-LINDA-avx2
bowtie2/2.5.1              nco/4.7.6
cmake/3.15.3               nco/4.9.3
cmake/3.24.2(default)      openssl/1.0.2k
cmake/3.9.1                openssl/3.0.9
fastx-toolkit/0.0.14       salmon/0.12.0
ffmpeg/3.3.3               salmon/1.1.0

------------------------ /modules/node/common/Libraries ------------------------
isl/0.22              mpfr/4.0.1            trimmomatic/0.39
lapack/3.7.1          ncurses/5.9           x264/20171213
libjpeg-turbo/2.1.5.1 newsetup/0.1          zstd/1.5.5
libpng/1.5.30         openssl/1.0.2k
mkl/2017.0.3          openssl/3.0.9

------------------------ /modules/node/common/Languages ------------------------
cuda-toolkit/10.1.243(default) gcc/6.4.0
cuda-toolkit/11.6.2            gcc/7.3.0
cuda-toolkit/8.0.61            gcc/8.3.0
gcc/11.4.0                     gcc/9.2.0
gcc/12.3.0                     intel/2017.4.196
gcc/13.2.0

----------------------------- /modules/node/Types ------------------------------
centos-7-63/0.1   centos-7-79/0.1   rocky-9.3-143/0.1 thisnode/0.1

----------------------- /modules/node/rocky-9.3-143/MPI ------------------------
openmpi-gcc/4.1.5

--------------------- /modules/node/rocky-9.3-143/Programs ---------------------
lammps/20220623

-------------------- /modules/node/rocky-9.3-143/Libraries ---------------------
hdf5/1.12.1    hdf5/1.8.19    netcdf/4.4.1.1 netcdf/4.8.1
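
To show how the Rocky Linux modules might be used in a batch job, the sketch below writes a small Slurm job script to a file. The job name, task count, time limit, and file name rocky_job.sh are assumptions made for illustration. The kisame partition is used only as an example of a Rocky Linux 9 partition and is currently restricted to those who purchased the nodes. Whether newsetup must be loaded inside the job, as assumed here, depends on how the cluster initialises modules.

```shell
#!/usr/bin/env bash
# Sketch: generate a Slurm job script that loads Rocky Linux 9 modules.
# print_job_script and rocky_job.sh are hypothetical names for illustration.
print_job_script() {
    cat <<'EOF'
#!/bin/bash
#SBATCH --job-name=rocky-test
#SBATCH --partition=kisame
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=00:10:00

# New layout first, then the node type, then the Rocky-only modules.
module load newsetup
module load rocky-9.3-143
module load openmpi-gcc/4.1.5
module load hdf5/1.12.1
module list

# Simple sanity check: print the host name once per task.
srun hostname
EOF
}

print_job_script > rocky_job.sh
echo "wrote rocky_job.sh"
```

On the cluster, the generated file would be submitted with `sbatch rocky_job.sh`.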