Update 202405

From HPC Wiki

Revision as of 13:54, 28 May 2024

Warning
This page is a work in progress by bgo. Treat its contents with caution.

Recent node purchases and the discontinuation of the CentOS operating system (CentOS 7 will reach end of life (EOL) on June 30, 2024) have prompted the need for major updates to the Magnolia HPC cluster.

The operating system of the Magnolia HPC cluster will be switched from CentOS 7 to Rocky Linux 9. This change will require some custom packages to be recompiled, since the libraries of Rocky 9 are much more recent than those of CentOS 7. Custom packages include those listed in Modules, as well as packages that users have installed themselves with miniconda.

Roadmap

Many changes are required for the addition of new nodes and the change in operating system; a broad list of these changes is given here:

Topic                                           Completed?  Description
Install new compute node hardware - Part I      Yes         Vendor installed
  • 6 Compute nodes, each with:
    • 2 Intel(R) Xeon(R) Gold 6448H Processors
    • 512 GiB RAM
  • 4 High memory compute nodes, each with:
    • 2 Intel(R) Xeon(R) Gold 6426Y Processors
    • 2 TiB RAM
  • 3 GPU compute nodes, each with:
    • 2 Intel(R) Xeon(R) Gold 6426Y Processors
    • 256 GiB RAM
    • 2 NVIDIA A100 80GB PCIe GPUs
Install new compute node hardware - Part II     No          To be shipped
Install operating system on new compute nodes   Yes         Using Rocky Linux 9 since CentOS 7 reaches end of life (EOL) on June 30, 2024. Integrating the new Rocky Linux compute nodes with the much older CentOS nodes took a lot of time, but is now done.
Update backup and cloning system                Yes         The backup and cloning software that has been used on the cluster does not work with Rocky Linux, so a package that works with both Rocky Linux and CentOS had to be designed.
Update Slurm                                    Yes         Updated to a version which works on both Rocky Linux 9 and CentOS 7. Required very short downtime.
Add new compute nodes to Slurm partitions       Yes         The first two partitions are currently only available to those who purchased the nodes:
  • kisame
  • suliaoma
  • himem

All partitions will be available to all users once preemption (see below) is enabled.

Update module setup                             No          WIP
Update modules for Rocky Linux 9                No          WIP
Install CUDA drivers for new GPU compute nodes  Yes
Preemption partitions                           No          TODO
Update CentOS 7 compute nodes to Rocky Linux 9  No          TODO
Update login nodes to Rocky Linux 9             No          TODO

Efforts are being made to make these changes as transparent to the general user as possible; however, there will come a point where this is no longer possible. Plenty of notice will be given when it comes time for general users to change how they use the Magnolia HPC cluster.

Contacts

Upgrades may inadvertently cause some software packages to fail in ways that the administrators' checks miss; if you suspect this has happened, please contact Brian Olson with details.

Slurm Partitions

user $coresavail
Number of nodes in partition with N available cores and RAM.

Nodes  Partition  Available Cores  Available RAM (MiB)
6      kisame     64               515134
3      suliaoma   32               257094
4      himem      32               2063430
62     node       20               128000
4      himem      20               512000
2      gpu        20               128000
17     lomem      16               64000

The himem partition has 4 compute nodes with 32 cores available and just under 2 TiB of RAM free each. It has another 4 compute nodes, but these have 20 cores and about 0.5 TiB (512000 MiB) of RAM each.
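
Because himem now mixes two kinds of node, it can help to check which rows of the table can hold a given job. The following is a minimal Python sketch, not a site tool: the rows are copied by hand from the coresavail output above, the function names are hypothetical, and it assumes the core and RAM figures are per node.

```python
# Rows copied from the coresavail output above (a snapshot, not live data):
# (nodes, partition, available cores per node, available RAM per node in MiB)
PARTITION_ROWS = [
    (6, "kisame", 64, 515134),
    (3, "suliaoma", 32, 257094),
    (4, "himem", 32, 2063430),
    (62, "node", 20, 128000),
    (4, "himem", 20, 512000),
    (2, "gpu", 20, 128000),
    (17, "lomem", 16, 64000),
]


def fitting_rows(min_cores, min_ram_mib, rows=PARTITION_ROWS):
    """Return the table rows whose nodes offer at least the requested
    cores and RAM (both assumed to be per-node figures)."""
    return [r for r in rows if r[2] >= min_cores and r[3] >= min_ram_mib]


def mib_to_tib(mib):
    """Convert MiB to TiB (1 TiB = 1024 * 1024 MiB)."""
    return mib / (1024 * 1024)


if __name__ == "__main__":
    # Which node types can hold a single-node job needing 32 cores and ~1 TB?
    for nodes, partition, cores, ram in fitting_rows(32, 1_000_000):
        print(f"{partition}: {nodes} nodes, {cores} cores, {mib_to_tib(ram):.2f} TiB")
```

A row appearing in the result only means the hardware is large enough; access restrictions on the purchased partitions still apply.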

Modules

The modules are being moved to a new layout to allow for easier upgrades in the future.

New Layout

user $module load newsetup
user $module avail
------------------------ /usr/share/Modules/modulefiles ------------------------
dot         module-git  module-info modules     null        use.own

--------------------------- /modules/node/common/MPI ---------------------------
impi/2017.4.196

------------------------ /modules/node/common/Programs -------------------------
bamtools/2.5.1             ffmpeg/4.4
bedtools/2.31.0            gaussian/16B.01-avx2
bowtie/1.2.2               gaussian/16C.01-avx2
bowtie2/2.3.4.3            gaussian/16C.01-LINDA-avx2
bowtie2/2.5.1              nco/4.7.6
cmake/3.15.3               nco/4.9.3
cmake/3.24.2(default)      openssl/1.0.2k
cmake/3.9.1                openssl/3.0.9
fastx-toolkit/0.0.14       salmon/0.12.0
ffmpeg/3.3.3               salmon/1.1.0

------------------------ /modules/node/common/Libraries ------------------------
isl/0.22              mpfr/4.0.1            trimmomatic/0.39
lapack/3.7.1          ncurses/5.9           x264/20171213
libjpeg-turbo/2.1.5.1 newsetup/0.1          zstd/1.5.5
libpng/1.5.30         openssl/1.0.2k
mkl/2017.0.3          openssl/3.0.9

------------------------ /modules/node/common/Languages ------------------------
cuda-toolkit/10.1.243(default) gcc/6.4.0
cuda-toolkit/11.6.2            gcc/7.3.0
cuda-toolkit/8.0.61            gcc/8.3.0
gcc/11.4.0                     gcc/9.2.0
gcc/12.3.0                     intel/2017.4.196
gcc/13.2.0

----------------------------- /modules/node/Types ------------------------------
centos-7-63/0.1   centos-7-79/0.1   rocky-9.3-143/0.1 thisnode/0.1

These are the common modules, which work on both Rocky Linux 9 and CentOS 7.

Rocky Linux Modules

user $module load rocky-9.3-143
user $module avail
------------------------ /usr/share/Modules/modulefiles ------------------------
dot         module-git  module-info modules     null        use.own

--------------------------- /modules/node/common/MPI ---------------------------
impi/2017.4.196

------------------------ /modules/node/common/Programs -------------------------
bamtools/2.5.1             ffmpeg/4.4
bedtools/2.31.0            gaussian/16B.01-avx2
bowtie/1.2.2               gaussian/16C.01-avx2
bowtie2/2.3.4.3            gaussian/16C.01-LINDA-avx2
bowtie2/2.5.1              nco/4.7.6
cmake/3.15.3               nco/4.9.3
cmake/3.24.2(default)      openssl/1.0.2k
cmake/3.9.1                openssl/3.0.9
fastx-toolkit/0.0.14       salmon/0.12.0
ffmpeg/3.3.3               salmon/1.1.0

------------------------ /modules/node/common/Libraries ------------------------
isl/0.22              mpfr/4.0.1            trimmomatic/0.39
lapack/3.7.1          ncurses/5.9           x264/20171213
libjpeg-turbo/2.1.5.1 newsetup/0.1          zstd/1.5.5
libpng/1.5.30         openssl/1.0.2k
mkl/2017.0.3          openssl/3.0.9

------------------------ /modules/node/common/Languages ------------------------
cuda-toolkit/10.1.243(default) gcc/6.4.0
cuda-toolkit/11.6.2            gcc/7.3.0
cuda-toolkit/8.0.61            gcc/8.3.0
gcc/11.4.0                     gcc/9.2.0
gcc/12.3.0                     intel/2017.4.196
gcc/13.2.0

----------------------------- /modules/node/Types ------------------------------
centos-7-63/0.1   centos-7-79/0.1   rocky-9.3-143/0.1 thisnode/0.1

----------------------- /modules/node/rocky-9.3-143/MPI ------------------------
openmpi-gcc/4.1.5

--------------------- /modules/node/rocky-9.3-143/Programs ---------------------
lammps/20220623

-------------------- /modules/node/rocky-9.3-143/Libraries ---------------------
hdf5/1.12.1    hdf5/1.8.19    netcdf/4.4.1.1 netcdf/4.8.1
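
The section headers in the listings above suggest how the new layout is organized: a common tree plus one tree per node type, each split into categories. The Python sketch below only illustrates that naming pattern as read from the headers. The function name is hypothetical, and whether every category directory exists for every node type is an assumption (no node-specific Languages directory appears in the listing).

```python
from pathlib import PurePosixPath

# Root and category names as they appear in the `module avail` headers above.
MODULE_ROOT = PurePosixPath("/modules/node")
CATEGORIES = ("MPI", "Programs", "Libraries", "Languages")


def module_dirs(node_type):
    """Return the modulefile directories implied by the layout for one
    node type: the common tree first, then the node-specific tree."""
    common = [MODULE_ROOT / "common" / c for c in CATEGORIES]
    specific = [MODULE_ROOT / node_type / c for c in CATEGORIES]
    return [str(p) for p in common + specific]


if __name__ == "__main__":
    for path in module_dirs("rocky-9.3-143"):
        print(path)
```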

Most Used Modules

For installing modules, I am prioritizing those used 10 or more times over the last 90 days. If you require a module not on this list, please make a request to those listed in the Contacts section.

All nodes over last 90 days:

Module         Count
hdf5           2089
netcdf         2079
intel          1381
mkl            1317
lammps         1294
mpfr           1225
gcc            1222
libpng         1109
isl            1057
x264           804
ffmpeg         792
esmf           749
MCT            744
scrip-coawst   742
fftw           737
nco            683
ncview         676
gaussian       645
python         433
cuda-toolkit   317
bedtools       258
samtools       258
openmpi        221
bowtie         203
R              165
orca           150
fastx-toolkit  146
digimat        109
icu            96
readline       93
ncurses        88
matlab         69
impi           56
cmake          54
openmpi-gcc    52
lapack         45
salmon         34
git            27
trimmomatic    20
newsetup       18
gmake          14
bowtie2        13
lammps-tools   13
mopac          13
gromacs        10
STAR           10
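
To see which frequently used modules still need to be installed, the usage counts can be compared against a `module avail` listing. The following Python sketch shows one way to do that. It works on short excerpts copied from this page rather than on live output, the function names are hypothetical, and it only recognizes entries written as name/version.

```python
def available_names(avail_output):
    """Return the set of module names in `module avail` style output.
    Only name/version entries are recognized; bare names are skipped."""
    names = set()
    for line in avail_output.splitlines():
        if line.startswith("-"):  # section header line
            continue
        for token in line.split():
            if "/" in token:
                names.add(token.split("/", 1)[0])
    return names


def missing_modules(usage, avail_output, min_count=10):
    """Return modules used at least min_count times that do not
    appear in the given listing, in the order they appear in usage."""
    have = available_names(avail_output)
    return [name for name, count in usage
            if count >= min_count and name not in have]


# Excerpt of the Rocky Linux specific listing shown on this page.
AVAIL_SAMPLE = """\
----------------------- /modules/node/rocky-9.3-143/MPI ------------------------
openmpi-gcc/4.1.5

--------------------- /modules/node/rocky-9.3-143/Programs ---------------------
lammps/20220623

-------------------- /modules/node/rocky-9.3-143/Libraries ---------------------
hdf5/1.12.1    hdf5/1.8.19    netcdf/4.4.1.1 netcdf/4.8.1
"""

# A few rows from the usage table above.
USAGE_SAMPLE = [("hdf5", 2089), ("netcdf", 2079), ("lammps", 1294),
                ("esmf", 749), ("fftw", 737)]

if __name__ == "__main__":
    print(missing_modules(USAGE_SAMPLE, AVAIL_SAMPLE))
```

The result only reflects the excerpts passed in; a module missing from the excerpt may still be provided under the common tree.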