Submitting jobs and compiling programs on Magnolia cluster
Description
Topics in this workshop:
- Introduction to Magnolia cluster
- Using Modules to select programs
- Submitting/running jobs using SLURM scheduler
- Compiling MPI programs
- Compiling CUDA programs
 
Questions? Email Brian Olson
Workshop Notes
The following is a summary of the topics covered in the workshop.
Check hostname
Connecting to magnolia.usm.edu will result in access to one of two (2) login nodes, magnolia01 or magnolia02, chosen randomly each time a connection is requested. The hostname command will show the host name of the system:
user $ hostname
magnolia01
Create a Script/Program
To create a very simple Bash script, open a text editor. To use the nano text editor, type:
user $ nano myscript.sh
Type the following, and save it.
File: myscript.sh (First script)
#!/bin/sh
hostname
Mark as Executable, and Run
The script has been created, but it is currently a plain text file. To mark the script as executable, use the chmod command; the script can then be run directly:
user $ chmod a+x myscript.sh
user $ ./myscript.sh
magnolia01
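The effect of chmod can be checked with the shell's -x test. The following is a minimal sketch, assuming a POSIX shell and that mktemp is available. The scratch directory and the echo line inside the generated script are illustrative stand-ins, not part of the workshop material.

```shell
#!/bin/sh
# Sketch: a newly written file is not executable until chmod is applied.
# The scratch directory and the script body are illustrative assumptions.
workdir=$(mktemp -d)
script="$workdir/myscript.sh"

# Write a two-line script, much as an editor would save it.
printf '#!/bin/sh\necho "hello from myscript.sh"\n' > "$script"

if [ -x "$script" ]; then before="executable"; else before="not executable"; fi
echo "before chmod: $before"

chmod a+x "$script"          # add execute permission for all users

if [ -x "$script" ]; then after="executable"; else after="not executable"; fi
echo "after chmod: $after"

output=$("$script")          # run the script directly, as ./myscript.sh would
echo "$output"

rm -rf "$workdir"            # clean up the scratch directory
```

The a in a+x stands for "all": execute permission is added for the owner, the group, and other users.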
Comments
Bash scripts allow comments to be placed within the file by preceding them with a '#'.
File: myscript.sh (First script with a comment)
#!/bin/sh
# This is my first script, and a comment.
hostname
The script has previously been marked as executable, so it can be run directly.
user $ ./myscript.sh
magnolia01
The output is the same as before, since comments are ignored by Bash.
Slurm
The Slurm Workload Manager is used to submit jobs to the Magnolia cluster and to monitor them.
Partitions
The nodes of the Magnolia cluster are separated into partitions. These partitions can be viewed with the sinfo command.
user $ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
node*        up  infinite    69 alloc node[001-026,032-036,038-041,043-076]
node*        up  infinite     7  idle node[027-031,037,042]
gpu          up  infinite     2  idle gpu[001-002]
himem        up  infinite     4  idle himem[001-004]
phi          up  infinite     4  idle phi[001-004]
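The sinfo listing can be summarized with standard tools. The following is a minimal sketch that totals the idle nodes per partition, assuming the default six-column layout shown in the listing. The function name count_idle_nodes is hypothetical, and the sample input is copied from the sinfo output in these notes so that the sketch runs without access to a cluster.

```shell
#!/bin/sh
# count_idle_nodes: read sinfo-style output on stdin and print, for each
# partition, the number of nodes in the "idle" state.
# Assumes the column order PARTITION AVAIL TIMELIMIT NODES STATE NODELIST.
count_idle_nodes() {
    awk 'NR > 1 && $5 == "idle" {
             name = $1
             sub(/\*$/, "", name)      # drop the "*" that marks the default partition
             idle[name] += $4
         }
         END { for (name in idle) print name, idle[name] }' | sort
}

# Sample input copied from the sinfo listing in these notes.
# On the cluster itself: sinfo | count_idle_nodes
count_idle_nodes <<'EOF'
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
node* up infinite 69 alloc node[001-026,032-036,038-041,043-076]
node* up infinite 7 idle node[027-031,037,042]
gpu up infinite 2 idle gpu[001-002]
himem up infinite 4 idle himem[001-004]
phi up infinite 4 idle phi[001-004]
EOF
```

For the sample input this prints one line per partition: gpu 2, himem 4, node 7, phi 4.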
Job Information
The squeue command is used to show information about jobs currently in the Slurm queue.
user $ squeue
   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    2151      node concorde     cpan  R 2-06:33:11     23 node[001-011,016,018,020,032-036,038-041]
    2162      node concorde     cpan  R 1-14:58:40     23 node[011-015,043-060]
    2208      node concorde     cpan  R    4:51:02     24 node[017,019,021-026,061-076]
Submit a Job
To submit a job, the sbatch command is used. The script made previously can be sent to the cluster for running:
user $ sbatch myscript.sh
Submitted batch job 2210
When the job is finished, the output will by default be placed in a slurm-####.out file, where '####' is the batch job number shown after the script was submitted with sbatch.
user $ cat slurm-2210.out
node028.cluster
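Because the output file name is derived from the job number, a wrapper script can work it out from sbatch's confirmation line. The following is a minimal sketch; the function names job_id_from_sbatch and default_output_file are hypothetical, and the confirmation line is fed in with echo so that the sketch runs without a cluster.

```shell
#!/bin/sh
# job_id_from_sbatch: read sbatch's confirmation line on stdin and print
# the job number (the fourth word of "Submitted batch job NNNN").
job_id_from_sbatch() {
    awk '/^Submitted batch job / { print $4 }'
}

# default_output_file: print the default output file name for a job number.
default_output_file() {
    printf 'slurm-%s.out\n' "$1"
}

# Demonstration with the confirmation line from these notes. On the cluster
# the assignment would instead be:
#   job_id=$(sbatch myscript.sh | job_id_from_sbatch)
job_id=$(echo "Submitted batch job 2210" | job_id_from_sbatch)
default_output_file "$job_id"
```

For job 2210 this prints slurm-2210.out, the file read with cat in the example.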
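Requesting two tasks on two nodes (-n 2 -N 2) does not by itself change the output, because the batch script runs only on the first allocated node:
user $ sbatch -n 2 -N 2 myscript.sh
Submitted batch job 2211
user $ cat slurm-2211.out
node028.cluster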
To run a command once for each task, launch it with srun inside the script:
File: myscript.sh (First script with a comment)
#!/bin/sh
# This is my first script
srun hostname
user $ sbatch -n 2 -N 2 myscript.sh
Submitted batch job 2213
user $ cat slurm-2213.out
node028.cluster
node037.cluster
Options can also be set inside the script on #SBATCH lines, so they do not need to be repeated on the command line. With two nodes and two tasks per node, srun starts four tasks:
File: myscript.sh (First script with a comment)
#!/bin/sh
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --mail-user=myemail@example.com
##SBATCH --mail-type=END
#SBATCH --time=0-00:10:00
#SBATCH --job-name=hostname
# This is my first script
srun hostname
user $ sbatch myscript.sh
Submitted batch job 2214
user $ cat slurm-2214.out
node028.cluster
node028.cluster
node037.cluster
node037.cluster
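Inside a running job, Slurm exposes details of the allocation through environment variables, which a batch script can print before its srun line. The following is a minimal sketch; the function name describe_job is hypothetical, and the "n/a" fallback is an assumption added so that the sketch also runs outside a job.

```shell
#!/bin/sh
# describe_job: print what Slurm reports about the current job.
# SLURM_JOB_ID, SLURM_JOB_NUM_NODES and SLURM_JOB_NODELIST are set by Slurm
# inside a job; outside a job they are unset, so "n/a" is printed instead.
describe_job() {
    echo "job id:    ${SLURM_JOB_ID:-n/a}"
    echo "nodes:     ${SLURM_JOB_NUM_NODES:-n/a}"
    echo "node list: ${SLURM_JOB_NODELIST:-n/a}"
}

describe_job
```

Placed in myscript.sh ahead of srun hostname, these lines would appear once at the top of the slurm-####.out file, since the batch script itself runs only once.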
Modules
The module avail command lists the software modules available on the cluster:
user $ module avail
------------------------ /usr/share/Modules/modulefiles ------------------------
dot         module-git  module-info modules     null        use.own
------------------------------- /act/modulefiles -------------------------------
impi               mpich/intel        openmpi-1.6/gcc    openmpi-1.8/intel
intel              mvapich2-2.2/gcc   openmpi-1.6/intel  openmpi-2.0/gcc
mpich/gcc          mvapich2-2.2/intel openmpi-1.8/gcc    openmpi-2.0/intel
----------------------------- /modules/modulefiles -----------------------------
atlas/3.10.3     hdf5/1.8.19      mkl/2017.0.3     python/3.6.2
cmake/3.9.1      lammps/20170811  molpro/2012.1.52 qe/6.0
ffmpeg/3.3.3     lapack/3.7.1     netcdf/4.4.1.1   qe/6.1
fftw/3.3.6       libxc/4.0.1      python/3.5.4     scalapack/2.0.2
A program provided by a module cannot be run until the module is loaded with module load:
user $ python3.5 --version
bash: python3.5: command not found...
user $ module load python/3.5.4
user $ python3.5 --version
Python 3.5.4
The module help command shows the description of a module:
user $ module help python/3.5.4
----------- Module Specific Help for 'python/3.5.4' ---------------
Description - Python is a widely used high-level programming language for general-purpose programming.
Docs - https://www.python.org/
When a module is no longer needed, module unload removes it from the environment:
user $ module unload python/3.5.4
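A job script can check that a program is available before using it, which turns a "command not found" error into a clearer message. The following is a minimal sketch; the function names have_command and check_or_hint are hypothetical, and the sketch only inspects the PATH rather than querying the module system.

```shell
#!/bin/sh
# have_command: succeed if the named program can be found on the PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# check_or_hint: report whether a program is available; if it is not,
# suggest the module named by the caller in the second argument.
check_or_hint() {
    if have_command "$1"; then
        echo "$1 is available"
    else
        echo "$1 not found; try: module load $2"
        return 1
    fi
}

# Prints the hint unless python3.5 is already on the PATH.
check_or_hint python3.5 python/3.5.4 || true
```

The module name is passed in by the caller because the mapping from a program to the module that provides it is site-specific.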