Submitting jobs and compiling programs on Magnolia cluster
Latest revision as of 09:16, 21 September 2018
Description
Topics in this workshop:
- Introduction to Magnolia cluster
- Using Modules to select programs
- Submitting/running jobs using SLURM scheduler
- Compiling MPI programs
- Compiling CUDA programs
Questions? Email Brian Olson
Workshop Notes
The following is a summary of the topics covered in the workshop.
Check hostname
Connecting to magnolia.usm.edu will result in access to one of two (2) login nodes, magnolia01 or magnolia02, chosen randomly each time a connection is requested. The hostname command will show the host name of the system:
user $
hostname
magnolia01
Create a Script/Program
To create a very simple Bash script, open a text editor. To use the nano text editor, type:
user $
nano myscript.sh
Type the following, and save it.
#!/bin/sh
hostname
Mark as Executable, and Run
The script has been created, but it is currently just a plain text file. To mark it as executable, use the chmod command. It can then be run directly:
user $
chmod a+x myscript.sh
user $
./myscript.sh
magnolia01
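The same steps can also be carried out without an editor. The following is a minimal sketch that writes the two-line script with a here-document, marks it executable, and runs it. The file name myscript.sh is the one used above; the scratch directory is only an assumption, made so that the example does not overwrite existing files.

```shell
# Work in a scratch directory (an assumption, to avoid clobbering existing files).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Write the two-line script; quoting 'EOF' keeps the text literal.
cat > myscript.sh <<'EOF'
#!/bin/sh
hostname
EOF

# Mark the script executable for all users, then run it.
chmod a+x myscript.sh
script_output=$(./myscript.sh)
echo "$script_output"
```

On a login node this should print the same host name as running hostname directly.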
Comments
Bash scripts allow comments to be placed within the file by preceding them with a '#'.
#!/bin/sh
# This is my first script, and a comment.
hostname
The script has previously been marked as executable, so it can be run directly.
user $
./myscript.sh
magnolia01
The output is the same as before, since comments are ignored by Bash.
Slurm
The Slurm Workload Manager is used to submit jobs to the Magnolia cluster and to monitor them.
Partitions
The nodes of the Magnolia cluster are separated into partitions. These partitions can be viewed with the sinfo command.
user $
sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
node*     up    infinite     69 alloc node[001-026,032-036,038-041,043-076]
node*     up    infinite      7 idle  node[027-031,037,042]
gpu       up    infinite      2 idle  gpu[001-002]
himem     up    infinite      4 idle  himem[001-004]
phi       up    infinite      4 idle  phi[001-004]
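To see at a glance how many nodes are free in each partition, the sinfo listing can be summarized with awk. The following is only a sketch: the function name idle_nodes_by_partition is hypothetical, and it assumes the six-column layout shown above with a STATE of exactly "idle". The sample input is copied from the listing above.

```shell
# Sum the NODES column of sinfo-style output for rows whose STATE is "idle".
# Reads standard input, so it works on live or saved output.
idle_nodes_by_partition() {
    awk 'NR > 1 && $5 == "idle" {
             sub(/\*$/, "", $1)   # drop the "*" that marks the default partition
             idle[$1] += $4
         }
         END { for (p in idle) print p, idle[p] }' | sort
}

# Sample input copied from the sinfo listing above.
sample_sinfo='PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
node* up infinite 69 alloc node[001-026,032-036,038-041,043-076]
node* up infinite 7 idle node[027-031,037,042]
gpu up infinite 2 idle gpu[001-002]
himem up infinite 4 idle himem[001-004]
phi up infinite 4 idle phi[001-004]'

idle_summary=$(printf '%s\n' "$sample_sinfo" | idle_nodes_by_partition)
echo "$idle_summary"
```

On the cluster, the same function could be fed live data with sinfo | idle_nodes_by_partition.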
Job Information
The squeue command is used to show information about jobs currently in the Slurm queue.
user $
squeue
JOBID PARTITION NAME     USER ST TIME       NODES NODELIST(REASON)
2151  node      concorde cpan R  2-06:33:11 23    node[001-011,016,018,020,032-036,038-041]
2162  node      concorde cpan R  1-14:58:40 23    node[011-015,043-060]
2208  node      concorde cpan R  4:51:02    24    node[017,019,021-026,061-076]
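The queue listing can be totalled in the same way, for example to see how many nodes each user's running jobs occupy. This is a sketch under stated assumptions: the function name running_nodes_by_user is hypothetical, and it relies on the default eight-column layout shown above and on job names that contain no spaces. The sample input is copied from the listing above.

```shell
# Sum the NODES column of squeue-style output per user, for running jobs (ST is "R").
running_nodes_by_user() {
    awk 'NR > 1 && $5 == "R" { nodes[$4] += $7 }
         END { for (u in nodes) print u, nodes[u] }' | sort
}

# Sample input copied from the squeue listing above.
sample_squeue='JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2151 node concorde cpan R 2-06:33:11 23 node[001-011,016,018,020,032-036,038-041]
2162 node concorde cpan R 1-14:58:40 23 node[011-015,043-060]
2208 node concorde cpan R 4:51:02 24 node[017,019,021-026,061-076]'

usage_summary=$(printf '%s\n' "$sample_squeue" | running_nodes_by_user)
echo "$usage_summary"
```

On the cluster, squeue | running_nodes_by_user would apply it to the live queue.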
Submit a Job
To submit a job, the sbatch command is used. The script made previously can be sent to the cluster for running:
user $
sbatch myscript.sh
Submitted batch job 2210
When the job is finished, the output will by default be placed in a slurm-####.out file, where '####' is the batch job number shown after the script was submitted with sbatch.
user $
cat slurm-2210.out
node028.cluster
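Because the output file name is derived from the job number, a wrapper script can work it out from the confirmation line that sbatch prints. The following is a minimal sketch; the function name output_file_for is hypothetical, and the sample line is the one from the session above.

```shell
# Turn the sbatch confirmation line into the default output file name.
output_file_for() {
    # $1 is a line such as "Submitted batch job 2210"; the job ID is the last word.
    job_id=${1##* }
    echo "slurm-${job_id}.out"
}

confirmation='Submitted batch job 2210'   # sample line from the session above
out_file=$(output_file_for "$confirmation")
echo "$out_file"
```

On the cluster, the confirmation line could be captured with confirmation=$(sbatch myscript.sh). The job may still be queued or running at that point, so the file may not exist yet.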
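The -n option sets the number of tasks and -N the number of nodes. Requesting two of each with the same script still produces a single line of output:
user $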
sbatch -n 2 -N 2 myscript.sh
Submitted batch job 2211
user $
cat slurm-2211.out
node028.cluster
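The commands in a batch script run only once, on the first allocated node. To run a command once per task, launch it with srun:
#!/bin/sh
# This is my first script
srun hostname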
user $
sbatch -n 2 -N 2 myscript.sh
Submitted batch job 2213
user $
cat slurm-2213.out
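node028.cluster
node037.cluster
Options can also be placed in the script itself on lines beginning with #SBATCH, so they do not need to be repeated on the command line. A line beginning with ##SBATCH is treated as an ordinary comment and is ignored:
#!/bin/sh
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --mail-user=myemail@example.com
##SBATCH --mail-type=END
#SBATCH --time=0-00:10:00
#SBATCH --job-name=hostname
# This is my first script
srun hostname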
user $
sbatch myscript.sh
Submitted batch job 2214
user $
cat slurm-2214.out
node028.cluster
node028.cluster
node037.cluster
node037.cluster
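The four lines of output follow from the directives: 2 nodes times 2 tasks per node gives 4 tasks, and srun starts hostname once per task. The sketch below reads those values back out of a copy of the script. The helper directive_value is hypothetical, it assumes directives written in the --name=value form used above, and the temporary file is only there to keep the example self-contained.

```shell
# Write a copy of the job script to a temporary file.
job_script=$(mktemp)
cat > "$job_script" <<'EOF'
#!/bin/sh
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --mail-user=myemail@example.com
##SBATCH --mail-type=END
#SBATCH --time=0-00:10:00
#SBATCH --job-name=hostname
# This is my first script
srun hostname
EOF

# Print the value of one "#SBATCH --name=value" line.
# Lines starting with "##SBATCH" do not match and are skipped.
directive_value() {
    sed -n "s/^#SBATCH --$1=//p" "$job_script"
}

nodes=$(directive_value nodes)
tasks_per_node=$(directive_value ntasks-per-node)
total_tasks=$((nodes * tasks_per_node))
echo "srun will start $total_tasks tasks"

mail_type=$(directive_value mail-type)
echo "mail-type directive: ${mail_type:-not active}"
```

The mail-type line is reported as not active because of its doubled '#'; removing one '#' would turn it back into a directive.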
Modules
The module avail command lists the software modules available on the cluster:
user $
module avail
------------------------ /usr/share/Modules/modulefiles ------------------------
dot         module-git  module-info modules     null        use.own
------------------------------- /act/modulefiles -------------------------------
impi               mpich/intel        openmpi-1.6/gcc    openmpi-1.8/intel
intel              mvapich2-2.2/gcc   openmpi-1.6/intel  openmpi-2.0/gcc
mpich/gcc          mvapich2-2.2/intel openmpi-1.8/gcc    openmpi-2.0/intel
----------------------------- /modules/modulefiles -----------------------------
atlas/3.10.3     hdf5/1.8.19      mkl/2017.0.3     python/3.6.2
cmake/3.9.1      lammps/20170811  molpro/2012.1.52 qe/6.0
ffmpeg/3.3.3     lapack/3.7.1     netcdf/4.4.1.1   qe/6.1
fftw/3.3.6       libxc/4.0.1      python/3.5.4     scalapack/2.0.2
user $
python3.5 --version
bash: python3.5: command not found...
Loading the module makes the program available in the current shell:
user $
module load python/3.5.4
user $
python3.5 --version
Python 3.5.4
user $
module help python/3.5.4
----------- Module Specific Help for 'python/3.5.4' ---------------
Description - Python is a widely used high-level programming language for general-purpose programming.
Docs - https://www.python.org/
user $
module unload python/3.5.4
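In a script, the check-then-load sequence shown above can be wrapped in a small function so that a module is loaded only when the program is missing. This is a sketch, not a site-provided tool: the name ensure_command is hypothetical, and it assumes the module command is already defined in the shell that runs the script.

```shell
# Succeed if the program is available, loading the given module first if needed.
ensure_command() {
    cmd=$1   # program to look for, e.g. python3.5
    mod=$2   # module expected to provide it, e.g. python/3.5.4
    if command -v "$cmd" >/dev/null 2>&1; then
        return 0
    fi
    # Only attempt the load when the module command itself is defined.
    if command -v module >/dev/null 2>&1; then
        module load "$mod"
    fi
    # Report whether the program is available now.
    command -v "$cmd" >/dev/null 2>&1
}

if ensure_command python3.5 python/3.5.4; then
    python3.5 --version
else
    echo "python3.5 is not available" >&2
fi
```

Checking again after the load, rather than trusting the load to have worked, means the function also reports failure when the module name is wrong or the module system is not set up.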