Update 202405

From HPC Wiki

Revision as of 10:18, 28 May 2024

This page is a work in progress by bgo. Treat its contents with caution.

Update to Rocky Linux 9. CentOS has been discontinued, and CentOS 7 reaches end of life (EOL) on June 30, 2024.


Topic Completed? Description
Install new compute node hardware - Part I Yes Vendor Installed
  • 6 Compute nodes, each with:
    • 2 Intel(R) Xeon(R) Gold 6448H Processors
    • 512 GiB RAM
  • 4 High memory compute nodes, each with:
    • 2 Intel(R) Xeon(R) Gold 6426Y Processors
    • 2 TiB RAM
  • 3 GPU compute nodes, each with:
    • 2 Intel(R) Xeon(R) Gold 6426Y Processors
    • 256 GiB RAM
    • 2 NVIDIA A100 80GB PCIe GPUs
Install new compute node hardware - Part II No To be shipped
Install operating system on new compute nodes Yes Using Rocky Linux 9 because CentOS 7 reaches end of life (EOL) on June 30, 2024. Integrating the new Rocky Linux compute nodes with the much older CentOS nodes took considerable time, but it is now done.
Update backup and cloning system Yes The backup and cloning software previously used on the cluster does not work with Rocky Linux, so a package that works with both Rocky Linux and CentOS had to be designed.
Update Slurm Yes Updated to a version that works on both Rocky Linux 9 and CentOS 7. Required only a very short downtime.
Add new compute nodes to Slurm partitions Yes The new nodes were added to the partitions listed below. The first two are currently only available to those who purchased the nodes:
  • kisame
  • suliaoma
  • himem

All partitions will be available to all users once preemption (see below) is enabled.

Update module setup No WIP
Update modules for Rocky Linux 9 No WIP
Install CUDA drivers for new GPU compute nodes Yes
Preemption partitions No TODO
Update CentOS 7 compute nodes to Rocky Linux 9 No TODO
Update login nodes to Rocky Linux 9 No TODO

Slurm Partitions

user $ coresavail
Number of nodes in each partition, grouped by the cores and RAM available per node.

Nodes  Partition  Available Cores  Available RAM (MiB)
6      kisame     64               515134
3      suliaoma   32               257094
4      himem      32               2063430
62     node       20               128000
4      himem      20               512000
2      gpu        20               128000
17     lomem      16               64000

The himem partition has 4 compute nodes with 32 cores available and just under 2 TiB RAM free. There are another 4 himem compute nodes free, but these have 20 cores and about 0.5 TiB (512000 MiB) RAM free.
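The unit conversions for the himem rows can be checked with a short Python sketch. The rows are copied from the coresavail table; treating cores and RAM as per-node figures is one reading of that table, and the names `mib_to_tib` and `partitions_fitting` are hypothetical helpers for illustration, not cluster tools.

```python
# Rows copied from the coresavail output:
# (nodes, partition, available cores per node, available RAM per node in MiB)
PARTITIONS = [
    (6, "kisame", 64, 515134),
    (3, "suliaoma", 32, 257094),
    (4, "himem", 32, 2063430),
    (62, "node", 20, 128000),
    (4, "himem", 20, 512000),
    (2, "gpu", 20, 128000),
    (17, "lomem", 16, 64000),
]

MIB_PER_TIB = 1024 * 1024


def mib_to_tib(mib):
    """Convert mebibytes to tebibytes."""
    return mib / MIB_PER_TIB


def partitions_fitting(cores, ram_mib):
    """Return rows whose nodes offer at least `cores` cores and `ram_mib` MiB."""
    return [row for row in PARTITIONS if row[2] >= cores and row[3] >= ram_mib]


if __name__ == "__main__":
    for nodes, name, cores, ram in PARTITIONS:
        print(f"{name:<9} {nodes:>3} nodes  {cores:>2} cores  {mib_to_tib(ram):.2f} TiB")
    # Hypothetical job: 24 cores and 1 TiB of RAM on a single node.
    print(partitions_fitting(24, MIB_PER_TIB))
```

With these figures, 2063430 MiB is about 1.97 TiB and 512000 MiB is about 0.49 TiB, so only the newer himem nodes fit a single-node job that needs 1 TiB of RAM.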


The modules are being moved to a new layout to allow for easier upgrades in the future.

New Layout

user $ module load newsetup
user $ module avail
------------------------ /usr/share/Modules/modulefiles ------------------------
dot         module-git  module-info modules     null        use.own

--------------------------- /modules/node/common/MPI ---------------------------

------------------------ /modules/node/common/Programs -------------------------
bamtools/2.5.1             ffmpeg/4.4
bedtools/2.31.0            gaussian/16B.01-avx2
bowtie/1.2.2               gaussian/16C.01-avx2
bowtie2/            gaussian/16C.01-LINDA-avx2
bowtie2/2.5.1              nco/4.7.6
cmake/3.15.3               nco/4.9.3
cmake/3.24.2(default)      openssl/1.0.2k
cmake/3.9.1                openssl/3.0.9
fastx-toolkit/0.0.14       salmon/0.12.0
ffmpeg/3.3.3               salmon/1.1.0

------------------------ /modules/node/common/Libraries ------------------------
isl/0.22              mpfr/4.0.1            trimmomatic/0.39
lapack/3.7.1          ncurses/5.9           x264/20171213
libjpeg-turbo/ newsetup/0.1          zstd/1.5.5
libpng/1.5.30         openssl/1.0.2k
mkl/2017.0.3          openssl/3.0.9

------------------------ /modules/node/common/Languages ------------------------
cuda-toolkit/10.1.243(default) gcc/6.4.0
cuda-toolkit/11.6.2            gcc/7.3.0
cuda-toolkit/8.0.61            gcc/8.3.0
gcc/11.4.0                     gcc/9.2.0
gcc/12.3.0                     intel/2017.4.196

----------------------------- /modules/node/Types ------------------------------
centos-7-63/0.1   centos-7-79/0.1   rocky-9.3-143/0.1 thisnode/0.1

The modules under /modules/node/common are common modules that work on both Rocky Linux 9 and CentOS 7.
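For anyone who wants to post-process a listing like the one above, here is a minimal Python sketch that groups the entries by modulefile directory and records which version is flagged `(default)`. The sample text is an excerpt of the listing; `parse_module_avail` is a hypothetical helper, and the parsing assumes the dashed-header format shown on this page.

```python
import re

# Excerpt of the `module avail` listing.
SAMPLE = """\
------------------------ /modules/node/common/Programs -------------------------
bamtools/2.5.1             ffmpeg/4.4
cmake/3.15.3               nco/4.9.3
cmake/3.24.2(default)      openssl/1.0.2k
cmake/3.9.1                openssl/3.0.9

------------------------ /modules/node/common/Languages ------------------------
cuda-toolkit/10.1.243(default) gcc/6.4.0
gcc/12.3.0                     intel/2017.4.196
"""

# A section header is a directory path surrounded by runs of dashes.
HEADER = re.compile(r"^-+\s+(\S+)\s+-+$")
DEFAULT_TAG = "(default)"


def parse_module_avail(text):
    """Return (sections, defaults).

    sections maps directory -> {module name: [versions in listing order]}.
    defaults maps module name -> version flagged "(default)".
    """
    sections = {}
    defaults = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = sections.setdefault(header.group(1), {})
            continue
        if current is None:
            continue  # ignore anything before the first header
        for entry in line.split():
            name, _, version = entry.partition("/")
            if version.endswith(DEFAULT_TAG):
                version = version[: -len(DEFAULT_TAG)]
                defaults[name] = version
            current.setdefault(name, []).append(version)
    return sections, defaults


if __name__ == "__main__":
    sections, defaults = parse_module_avail(SAMPLE)
    print(sections["/modules/node/common/Programs"]["cmake"])
    print(defaults)
```

Entries whose version is missing in the listing (for example `bowtie2/`) would be kept with an empty version string rather than guessed.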

Rocky Linux Modules

user $ module load rocky-9.3-143
user $ module avail
------------------------ /usr/share/Modules/modulefiles ------------------------
dot         module-git  module-info modules     null        use.own

--------------------------- /modules/node/common/MPI ---------------------------

------------------------ /modules/node/common/Programs -------------------------
bamtools/2.5.1             ffmpeg/4.4
bedtools/2.31.0            gaussian/16B.01-avx2
bowtie/1.2.2               gaussian/16C.01-avx2
bowtie2/            gaussian/16C.01-LINDA-avx2
bowtie2/2.5.1              nco/4.7.6
cmake/3.15.3               nco/4.9.3
cmake/3.24.2(default)      openssl/1.0.2k
cmake/3.9.1                openssl/3.0.9
fastx-toolkit/0.0.14       salmon/0.12.0
ffmpeg/3.3.3               salmon/1.1.0

------------------------ /modules/node/common/Libraries ------------------------
isl/0.22              mpfr/4.0.1            trimmomatic/0.39
lapack/3.7.1          ncurses/5.9           x264/20171213
libjpeg-turbo/ newsetup/0.1          zstd/1.5.5
libpng/1.5.30         openssl/1.0.2k
mkl/2017.0.3          openssl/3.0.9

------------------------ /modules/node/common/Languages ------------------------
cuda-toolkit/10.1.243(default) gcc/6.4.0
cuda-toolkit/11.6.2            gcc/7.3.0
cuda-toolkit/8.0.61            gcc/8.3.0
gcc/11.4.0                     gcc/9.2.0
gcc/12.3.0                     intel/2017.4.196

----------------------------- /modules/node/Types ------------------------------
centos-7-63/0.1   centos-7-79/0.1   rocky-9.3-143/0.1 thisnode/0.1

----------------------- /modules/node/rocky-9.3-143/MPI ------------------------

--------------------- /modules/node/rocky-9.3-143/Programs ---------------------

-------------------- /modules/node/rocky-9.3-143/Libraries ---------------------
hdf5/1.12.1    hdf5/1.8.19    netcdf/ netcdf/4.8.1
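Comparing the two listings, loading rocky-9.3-143 appears to add three node-specific modulefile directories after the common ones. A minimal Python sketch of that comparison follows, with the directory names copied from the section headers; `added_paths` is a hypothetical helper, and how the module actually extends the search path is not shown on this page.

```python
# Section headers seen after `module load newsetup`.
COMMON_PATHS = [
    "/usr/share/Modules/modulefiles",
    "/modules/node/common/MPI",
    "/modules/node/common/Programs",
    "/modules/node/common/Libraries",
    "/modules/node/common/Languages",
    "/modules/node/Types",
]

# Section headers seen after `module load rocky-9.3-143`.
ROCKY_PATHS = COMMON_PATHS + [
    "/modules/node/rocky-9.3-143/MPI",
    "/modules/node/rocky-9.3-143/Programs",
    "/modules/node/rocky-9.3-143/Libraries",
]


def added_paths(before, after):
    """Return directories present in `after` but not in `before`, in order."""
    seen = set(before)
    return [path for path in after if path not in seen]


if __name__ == "__main__":
    for path in added_paths(COMMON_PATHS, ROCKY_PATHS):
        print(path)
```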

Most Used Modules

Modules used 10 or more times across all nodes over the last 90 days:

Module         Count
hdf5           2089
netcdf         2079
intel          1381
mkl            1317
lammps         1294
mpfr           1225
gcc            1222
libpng         1109
isl            1057
x264           804
ffmpeg         792
esmf           749
MCT            744
scrip-coawst   742
fftw           737
nco            683
ncview         676
gaussian       645
python         433
cuda-toolkit   317
bedtools       258
samtools       258
openmpi        221
bowtie         203
R              165
orca           150
fastx-toolkit  146
digimat        109
icu            96
readline       93
ncurses        88
matlab         69
impi           56
cmake          54
openmpi-gcc    52
lapack         45
salmon         34
git            27
trimmomatic    20
newsetup       18
gmake          14
bowtie2        13
lammps-tools   13
mopac          13
gromacs        10
STAR           10
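One way to use these counts is to see which heavily used modules do not yet appear in the rocky-9.3-143 listing on this page. The Python sketch below does that for the top 15 entries. The counts and module names are copied from this page, `missing_by_usage` is a hypothetical helper, and the result reflects only this work-in-progress snapshot, not a plan for which modules will be built.

```python
# Usage counts copied from the table (top 15 entries only).
USAGE = {
    "hdf5": 2089, "netcdf": 2079, "intel": 1381, "mkl": 1317, "lammps": 1294,
    "mpfr": 1225, "gcc": 1222, "libpng": 1109, "isl": 1057, "x264": 804,
    "ffmpeg": 792, "esmf": 749, "MCT": 744, "scrip-coawst": 742, "fftw": 737,
}

# Module names listed under /modules in the rocky-9.3-143 `module avail` output.
AVAILABLE = {
    "bamtools", "bedtools", "bowtie", "bowtie2", "cmake", "fastx-toolkit",
    "ffmpeg", "gaussian", "nco", "openssl", "salmon", "isl", "lapack",
    "libjpeg-turbo", "libpng", "mkl", "mpfr", "ncurses", "newsetup",
    "trimmomatic", "x264", "zstd", "cuda-toolkit", "gcc", "intel",
    "hdf5", "netcdf",
}


def missing_by_usage(usage, available):
    """Return (module, count) pairs absent from `available`, most used first."""
    missing = [(name, count) for name, count in usage.items() if name not in available]
    return sorted(missing, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, count in missing_by_usage(USAGE, AVAILABLE):
        print(f"{name:<14} {count}")
```

For the top 15 entries this lists lammps, esmf, MCT, scrip-coawst, and fftw, in that order.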