Application examples¶
Best practices¶
Use your project directory instead of the home directory
The HOME directory has limited storage space (~25 GB). Your project directory /proj/nobackup/hpc2n202X-XYZ has much more space.
Create a soft-link to your storage project
It is very convenient to create a soft-link to your storage project in your home directory for faster navigation:
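For example, a minimal sketch (the link name mystorage is only an illustration; replace hpc2n202X-XYZ with your actual project):
# Create a soft-link in your home directory pointing to your storage project
$ln -s /proj/nobackup/hpc2n202X-XYZ ~/mystorage
# Afterwards you can jump to the project with
$cd ~/mystorage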
Monitoring the use of resources
Most likely you will allocate many cores and many GPUs for your simulations. You can monitor the use of these resources with the command job-usage job_ID, where job_ID is the number output by the sbatch command. You can also see this number if you type squeue -u my-username. job-usage outputs a URL that you can copy/paste into your local browser, where you can see how the resources are being used:
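A minimal sketch of this workflow (job.sh and job_ID are placeholders):
# Submit the job; sbatch replies with "Submitted batch job <job_ID>"
$sbatch job.sh
# The job ID is also shown in the first column of the squeue output
$squeue -u my-username
# job-usage prints a URL that you can open in your local browser
$job-usage job_ID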
Matlab¶
How to find Matlab¶
Matlab is available through the Menu bar if you are using the ThinLinc client (recommended). Additionally, you can load a Matlab module in a Linux terminal on Kebnekaise. Details for these two options can be found here.
First time configuration¶
The first time you access Matlab on Kebnekaise, you need to configure it by following these guidelines Configuring Matlab. After configuring the cluster, it is a good practice to validate the cluster (HOME -> Parallel -> Create and Manage Clusters):
Notice that it is recommended to use a small number of workers for the validation, in this case 4.
Tools for efficient simulations¶
Flow chart for writing more efficient Matlab code using existing tools, adapted from the Mathworks documentation on parallel computing:
MATLAB on GPUs
Notice that MATLAB currently supports only NVIDIA GPUs (v100, a40, a6000, a100, l40s, h100), with v100 and l40s being the most abundant (10 nodes each).
Use MATLAB for lightweight tasks on the login nodes
Remember that login nodes are used by many users, and if you run heavy jobs there you will interfere with their workflow.
Exercises¶
Exercise 1: Matlab serial job
The folder SERIAL contains a function funct.m which performs an FFT on a matrix. The execution time is obtained with tic/toc and written to the output file log.out. Run the function using the MATLAB GUI with the help of the script submit.m.
As an alternative, you can submit the job via the batch script job.sh. Here, you will need to replace Project_ID with the one provided for the present course and set the Matlab version.
Exercise 2: Matlab parallel job
- The PARFOR folder contains an example of a loop parallelized with the “parfor” directive. A pause() function is included in the loop to make it heavy. This function can be submitted to the queue by running the script submit.m in the MATLAB GUI. The number of workers can be set by replacing the string FIXME (in the “submit.m” file) with the number you desire. Try different values for the number of workers from 1 to 10 and take note of the simulation time output at the end of the simulation. Where does the code achieve its peak performance?
- The SPMD folder presents an example of a code parallelized using the SPMD paradigm. Submit this job to the queue through the MATLAB GUI. This example illustrates the use of parpool to run parallel code in a more interactive manner.
Exercise 3: Matlab GPU job
The GPU folder contains a test case that computes a Mandelbrot set both on the CPU (mandelcpu.m) and on the GPU (mandelgpu.m). You can submit the jobs through the MATLAB GUI using the submitcpu.m and submitgpu.m files.
If everything ran well, the final output is two .png figures which display the timings for both architectures. Use the “eom” command on the terminal to visualize the images (eom out-X.png).
R¶
How to find R¶
Similar to Matlab, R is available through the Menu bar if you are using the ThinLinc client (recommended). Additionally, you can load an R module in a Linux terminal on Kebnekaise. Details for these two options can be found here.
First time configuration¶
The first time you access R on Kebnekaise, you need to configure it by following the Preparations step.
Recommendations¶
Be aware of data duplication in R
Some parallel functions, mclapply in this example, tend to replicate the data for the workers (cores) if the data frame is modified by them. This can be crucial if you are working with a large data frame and you are employing several parallel functions, for instance during the training of machine learning models, because your simulation could easily exceed the available memory per node.
library(parallel)
library(pryr)

# Memory allocated by R before creating any data
start_mem <- mem_used()
print(paste("Memory initially allocated by R:", start_mem/1e6, "MB"))

# Define a relatively large data frame
data_df <- data.frame(
    ID = seq(1, 1e7),
    Value = runif(1e7)
)

# Create a function to be applied to each row (or a subset of rows)
process_function <- function(i, df) {
    # do some modification of the i-th row
    return(df$Value[i] * 2)
}

# Memory growth due to the serial part (creating the data frame)
after_serial <- mem_used()
print(paste("Memory after the serial code execution:", (after_serial - start_mem)/1e6, "MB"))

# Use mclapply to process the data frame in parallel
num_cores <- 4
results <- mclapply(1:nrow(data_df), function(i) process_function(i, data_df), mc.cores = num_cores)

# Additional memory growth due to the parallel part
after_parallel <- mem_used()
print(paste("Memory after parallel code execution:", (after_parallel - after_serial)/1e6, "MB"))
In this example, mem-dup.R, I used the function mem_used() provided by the pryr package to monitor the memory usage. The batch script for this example is job.sh.
One possible solution for data duplication could be to use a data frame for each worker that includes only the relevant data for that particular computation.
Use R for lightweight tasks on the login nodes
Remember that login nodes are used by many users, and if you run heavy jobs there you will interfere with their workflow.
Exercises¶
Requirements
Prior to running the examples, you will need to install several packages. Follow these instructions:
The packages needed for this R version are listed below (check if they are not already installed):

ml GCC/10.2.0 OpenMPI/4.0.5 R/4.0.4

- Rmpi
- doParallel
- caret
- MASS
- klaR
- nnet
- e1071
- rpart
- mlbench
- parallel
Exercise 1: R serial job
In the SERIAL folder, a serial R script is provided. Submit the script job.sh with the command R CMD and also with Rscript. When could it be more suitable to use Rscript over R CMD?
Why do we need the flag #SBATCH -C ‘skylake’ in the batch script?
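As a hint for the first question, a minimal sketch of the two ways of launching an R script (the script name myscript.R is hypothetical):
# Rscript sends the output of the script to standard output (the Slurm output file)
$Rscript myscript.R
# R CMD BATCH writes the output to a file, by default myscript.Rout
$R CMD BATCH myscript.R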
Exercise 2: Job Arrays
The JOB-ARRAYS folder shows an example of job arrays; the batch file is job.sh. Submit the script and notice what is written in the output files.
Could you use job arrays in your simulations if you need to run many simulations where some parameters are changed? As an example, imagine that you need to run 28 simulations where a single parameter, such as the temperature, is changed from 2 to 56 C. Could you use the variable task_id in the previous script to get that range of temperatures so that each simulation prints out a different temperature?
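A minimal sketch of how the temperature could be derived from the task ID inside a job-array batch script (the array range and the variable name TEMP are assumptions for illustration):
#SBATCH --array=1-28
# SLURM_ARRAY_TASK_ID takes the values 1..28, one per array task;
# multiplying it by 2 gives the temperatures 2, 4, ..., 56 C
TEMP=$(( SLURM_ARRAY_TASK_ID * 2 ))
echo "Running the simulation at temperature ${TEMP} C"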
Exercise 3: Parallel jobs with Rmpi
In the folder RMPI, you can find the R script Rmpi.R, which uses 5 MPI slaves to apply the runif() function to an array “c”. The submit file is job_Rmpi.sh. As a result, you will see the random numbers generated by the slaves in the Slurm output file.
Exercise 4: Parallel jobs with doParallel
The folder DOPARALLEL contains two examples:
- doParallel.R shows how to use the foreach function in sequential mode (1 core) and in parallel mode using 4 cores. What is the difference in the usage of foreach for these two modes?
  Submit the job_doParallel.sh script and compare the timings of the sequential and parallel codes.
  How many workers are allocated for this simulation? If you want to allocate more or fewer, what changes must be made to these files?
- doParallel_ML.R presents the evaluation of several ML models in both sequential and parallel modes using the standard “iris” database. The difference is basically in the use of %dopar% instead of %do%.
  Submit the batch script job_doParallel_ML.sh to the queue.
  In the output file, observe the resulting elapsed times for the sequential and the 4-core parallel simulations.
Upon submitting the job to the queue you will get a number called the job ID. Use the command job-usage job_ID to obtain a URL which you can copy/paste into your local browser. Tip: refresh your browser several times to get the statistics.
Can you see how the CPU is used? What about the memory?
Note 1: In order to run this exercise, you need to have installed all the packages listed in the Requirements at the beginning of these exercises.
Note 2: If you want to try a different number of cores for running the scripts, you should change that number in both the .R and .sh scripts, as sketched below.
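A hedged sketch of how the two files should agree (the value 4 is only an example, and the exact function call used in the provided .R script may differ):
# In the .sh batch script, request 4 cores from Slurm
#SBATCH -n 4
# In the .R script, register the same number of workers, e.g.
# registerDoParallel(cores = 4)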
Exercise 5: Machine Learning jobs
In the folder ML we show an ML model using a sonar database and Random Forest as the training method (Rscript.R). The simulations are done in both serial and parallel modes. You may change the value of the number of cores (1 in the present case) to other values. Notice that the number of cores needs to be the same in the files job.sh and Rscript.R.
Try a different number of cores and monitor the timings which are reported at the end of the output file.
Alphafold¶
How to find Alphafold¶
Alphafold is installed as a module. Notice that more versions of Alphafold are installed on the Intel nodes than on the AMD nodes. Thus, if you are targeting a version that is only installed on the Intel nodes, you will need to add the instruction #SBATCH -C skylake to your batch script; otherwise the job could end up on an AMD node that lacks that installation.
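A minimal sketch of the relevant batch-script header (the project ID, time, and any remaining resource requests are placeholders):
#!/bin/bash
#SBATCH -A hpc2n202X-XYZ
#SBATCH -t 00:20:00
# Constrain the job to the Intel (Skylake) nodes where the module is installed
#SBATCH -C skylake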
Exercises¶
Exercise 1: Running a monomer protein simulation
In the exercises folder ALPHAFOLD you will find a FASTA sequence for a monomer and the corresponding batch file job.sh for running the simulation on GPUs. Try running the simulation with CPUs only and then with the l40s, v100, and a100 GPUs.
Notice that the simulation takes ~1 hr, so the purpose of this exercise is only to check that the simulation starts running correctly.
CryoSPARC¶
How to find CryoSPARC¶
CryoSPARC version 4.5.3 is installed as a module.
First time configuration¶
A license is needed to use this software. For academic purposes, a free-of-charge license can be requested at the website cryosparc.com (processing takes about one working day).
Once you obtain your license ID, create a file called /home/u/username/.cryosparc-license and paste the license ID on the first line of this file. On the second line of the file, write your email address.
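A minimal sketch of creating this file from the terminal (the license ID and email address are placeholders):
$echo "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" > ~/.cryosparc-license
$echo "myemail@mail.com" >> ~/.cryosparc-license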
Using CryoSPARC on Kebnekaise¶
Create a suitable folder in your project directory, for instance /proj/nobackup/hpc2n202X-XYZ/cryosparc, and move into this folder. Download/copy the lane*tar files that are located here to the cryosparc folder and untar them there (tar -xvf lane_CPU.tar, as an example).
Fix your Project_ID and time
Change the string Project_ID in the files lane*/cluster_script.sh to reflect your current project. Also, the time is set to 20 min (-t 00:20:00) in these files, but for your realistic simulations you can change it to longer times.
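One way of doing this for all lanes at once is with sed (a sketch; replace hpc2n202X-XYZ with your actual project):
$sed -i 's/Project_ID/hpc2n202X-XYZ/' lane*/cluster_script.sh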
The lanes should be recognized by CryoSPARC when it starts running.
Load the CryoSPARC modules. Start CryoSPARC and answer yes to the prompt asking whether you wish to continue starting cryosparc (confirming that the folder was not used before). List the users on the server (which should be only yourself for this type of license), check the email address that is displayed for this user (it should be the one you added in the license file), and reset the password. These steps are summarized here:
$cryosparc start
...
Do you wish to continue starting cryosparc? [yN]: y
...
CryoSPARC master started.
From this machine, access CryoSPARC and CryoSPARC Live at
http://localhost:39007
...
$cryosparc listusers
$cryosparc resetpassword --email "myemail@mail.com" --password "choose-a-password"
Copy and paste the line which has the localhost port (notice that the port number can change) into a browser on Kebnekaise:
After logging in, you will be able to see the CryoSPARC dashboard:
There are several tutorials at the CryoSPARC website; in the previous picture I followed the Introductory Tutorial (v4.0+).
Use cryosparc
instead of cryosparcm
On Kebnekaise the command cryosparc should be used, not the cryosparcm command cited in the tutorial.
Depending on the job type, CryoSPARC will suggest the hardware resources. For instance, in the tutorial above, Step 4: Import Movies suggests using 1 CPU upon queueing it, while Step 5: Motion Correction suggests using 1 GPU. For CPU-only jobs you can choose the CPU lane, and if your job uses GPUs you can choose among L40s, V100, A100, and H100. Notice that the V100 and L40s are the most abundant at the moment:
When you finish your analysis with CryoSPARC, shut it down with the command cryosparc stop in the terminal. Otherwise the server keeps running on the login node.
Additional information can be obtained from a tutorial given during a workshop on Berzelius and also from the NSC documentation. Notice that although these guidelines are for machines other than Kebnekaise, the systems are very similar and you can get ideas from them. For instance, the cryosparc copylanes command is not supported on Kebnekaise, and you will need to follow the step above (manually copying the lanes) to get the lanes working.
Nextflow¶
How to find Nextflow¶
Nextflow is installed as a module that can be loaded directly without any requirements.
Notice that more versions of this software are installed on the Intel nodes than on the AMD nodes. Thus, if you are targeting a version that is only installed on the Intel nodes, you will need to add the instruction #SBATCH -C skylake to your batch script; otherwise the job could end up on an AMD node that lacks that installation.
Exercises¶
Exercise 1: Arabidopsis
The data for running this example can be found in this paper, where more details about the analysis can be found as well. We have downloaded the data for you, and you can get it by copying the files to your working project:
$cd /proj/nobackup/your-project
$mkdir nextflow-arabidopsis
$cd nextflow-arabidopsis
$cp /proj/nobackup/hpc2n/SR*gz .
$wget https://raw.githubusercontent.com/hpc2n/intro-course/master/exercises/NEXTFLOW/ARABIDOPSIS/design_test.csv
$wget https://raw.githubusercontent.com/hpc2n/intro-course/master/exercises/NEXTFLOW/ARABIDOPSIS/job.sh
Fix the Project_ID to match the current project you are part of and send the job to the queue. This example takes ~3 hrs, so the purpose of this exercise is just to show you how to launch this kind of job with Nextflow.
Exercise 2: Interactive job submission
Nextflow allows you to submit jobs interactively from the Kebnekaise command line. You need to write a file with the instructions to be executed by Nextflow; in the present case, it is a file wc.nf which unzips a file file.txt.gz and counts the number of lines in it. A configuration file for the cluster, hpc2n.config, is needed, with some parameters that need to be changed with your personal information. Similarly to the previous exercise, you can follow these commands:
$cd /proj/nobackup/your-project
$mkdir nextflow-interactive
$cd nextflow-interactive
$wget https://raw.githubusercontent.com/hpc2n/intro-course/master/exercises/NEXTFLOW/INTERACTIVE/wc.nf
$wget https://raw.githubusercontent.com/hpc2n/intro-course/master/exercises/NEXTFLOW/INTERACTIVE/file.txt.gz
$wget https://raw.githubusercontent.com/hpc2n/intro-course/master/exercises/NEXTFLOW/INTERACTIVE/hpc2n.config
Load the Nextflow module and send the job interactively by typing the following command in the Kebnekaise terminal (fix the project ID):
$ml Nextflow/24.04.2
$nextflow run wc.nf -c hpc2n.config --input file.txt.gz --project hpc2n202X-XYZ --clusterOptions "-t 00:05:00 -n 28 -N 1"
Here, you will run the job on 28 cores. In a different terminal tab you can check that the job is submitted/running with the command squeue -u your-username.
Apptainer¶
How to find Apptainer¶
Apptainer is site-installed, meaning that you can run it without loading a module. Apptainer is supported on Kebnekaise instead of Singularity. The recipes that are built/run with Singularity can also be built/run with Apptainer with the same parameters; you only need to replace the command singularity with apptainer.
If you are curious, you will notice that the command singularity is also available on Kebnekaise, but it is just a soft-link to apptainer:
$which singularity
/bin/singularity
$ls -lahrt /bin/singularity
lrwxrwxrwx 1 root root 9 Mar 14 18:30 /bin/singularity -> apptainer
Use Apptainer for lightweight tasks on the login nodes
As with any other software, use Apptainer on the login node for simple tasks, for instance building a lightweight image, otherwise run a batch job.
Exercises¶
Exercise 1: Building and running an Apptainer image
This is an example of building the software Gromacs. Build a Gromacs container as follows, in the directory which contains the gromacs.def definition file:
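A minimal sketch of the build command (run in the directory containing gromacs.def; the image name gromacs.sif is an assumption):
$apptainer build gromacs.sif gromacs.def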
Download the benchMEM.tpr file here and place it in the directory where the .sif file is generated. In fact, you can place the files at any other location, but then you will need to modify the paths in the job.sh batch script.
Submit the job.sh file to the queue. The output of Gromacs, including its performance at the bottom (the line with the ns/day string), is written to the md.log files. As a comparison, after running the Apptainer image, the Gromacs module is loaded and the same simulation is run.
TensorFlow¶
How to find TensorFlow¶
Several versions of TensorFlow are installed as modules on Kebnekaise. Similarly to other software, more versions are installed on the Intel nodes than on the AMD nodes.
Exercises¶
Exercise 1: Running TensorFlow simulations
In this exercise, you will run a script with TensorFlow v. 2.15 on GPUs. Notice that because this version of TensorFlow is available on all the NVIDIA GPUs, you just need to write the type of GPU you want to use, in the present case l40s. There are three different examples in the TENSORFLOW folder under the exercises one:
- hello_tensorflow.py (prints out the Hello, TensorFlow! string)
- loss.py (computes a loss in a model)
- mnist_mlp.py (runs a model using the MNIST database)
The batch script is job.sh. Submit the job with different types of GPUs.
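A hedged sketch of how the GPU type could be requested in job.sh (the exact resource-request line in the provided script may differ; the GPU types are those listed above):
# Request one GPU of type l40s; change l40s to v100, a100, etc. to try other types
#SBATCH --gpus=l40s:1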
Jupyter Notebooks¶
You can use Jupyter Notebooks on Kebnekaise through JupyterLab. Jupyter Notebooks allow you to work in a more interactive manner, which is convenient when you are in the development phase of your project. Kernels are available for the most popular languages (R, Python, Matlab, and Julia) to work in a Jupyter Notebook.
How to find JupyterLab¶
Several versions of JupyterLab are installed as modules on Kebnekaise. Similarly to other software, more versions are installed on the Intel nodes than on the AMD nodes.
Using Jupyter Notebooks on Kebnekaise¶
Guidelines for running Jupyter Notebooks on Kebnekaise can be found here.
Exercises¶
Exercise 1: Running a Jupyter Notebook
Because the tasks executed in a Jupyter Notebook are, in general, computationally expensive, it is more convenient to run them on a compute node instead of the login nodes. To do this, you need to prepare a batch script like this one: job.sh.
Once you submit your job and it starts running, check the output file slurm*out and search for the string http://b-cnwxyz.hpc2n.umu.se:8888/lab?token=xy…z. Copy this string and paste it in a browser on Kebnekaise. You will be directed to the dashboard of JupyterLab.
A couple of notes:
- You can change the type of GPU where you want to run the notebook
- Cancel the job (scancel job_ID) if you stop using the notebook
Exercise 2: Running Infomap in a Jupyter Notebook
Infomap is a software package for network community detection. It could be convenient for you to work in a Jupyter Notebook if the simulations are not long and you need to see the graphical results right away. Here are the steps you can follow to get Infomap running in a notebook:
# Create a suitable folder in your project and move into it
$mkdir /proj/nobackup/hpc2n202Q-XYZ/infomap-workspace
$cd /proj/nobackup/hpc2n202Q-XYZ/infomap-workspace
# Purge and load JupyterLab module and dependencies
$module purge
$module load GCCcore/13.2.0 JupyterLab/4.2.0
# Create an isolated environment for this project called "infmpenv" and activate it
$python -m venv ./infmpenv
$source infmpenv/bin/activate
# Install ipykernel to be able to create your own kernel for this environment
$pip install --no-cache-dir --no-build-isolation ipykernel
# Install Infomap, Networkx, and Matplotlib
$pip install --no-cache-dir infomap networkx matplotlib
# Install the kernel
$python -m ipykernel install --user --name=infmpenv
After doing these installations, download the Jupyter Notebook for Infomap and create the data and output folders as follows:
$wget https://raw.githubusercontent.com/mapequation/infomap-notebooks/master/1_1_infomap_intro.ipynb
$mkdir data
$cd data
$wget https://raw.githubusercontent.com/mapequation/infomap-notebooks/master/data/ninetriangles.net
$cd ..
$mkdir output
Fix the project ID in the batch job job.sh and send it to the queue. As in the previous exercise, copy and paste the URL with the host name, port, and token into a browser on Kebnekaise. Then, open the notebook you downloaded and choose the kernel you just created:
Exercise 3: CPU and GPU code for Julia set
In this exercise, you will compute the Julia set on both the CPU and the GPU. The GPU part will be done using the CuPy library. A nice feature of this example is that it shows you how you could use multiple GPUs by modifying the initial single-GPU case. Here are the guidelines for running this notebook:
# Create a suitable folder in your project and move into it
$mkdir /proj/nobackup/hpc2n202Q-XYZ/juliaset-workspace
$cd /proj/nobackup/hpc2n202Q-XYZ/juliaset-workspace
# Purge and load JupyterLab module and dependencies
$module purge
$module load GCCcore/13.2.0 JupyterLab/4.2.0
# Create an isolated environment for this project called "mandelenv" and activate it
$python -m venv ./mandelenv
$source mandelenv/bin/activate
# Install ipykernel to be able to create your own kernel for this environment
$pip install --no-cache-dir --no-build-isolation ipykernel
# Install the kernel
$python -m ipykernel install --user --name=mandelenv
# Load a CUDA library
$ml CUDA/12.5.0
# Install Numpy, Matplotlib, and CuPy
$pip install --no-cache-dir --no-build-isolation numpy matplotlib cupy-cuda12x
After these installations, download the Jupyter Notebook for the Julia set as follows:
$wget https://raw.githubusercontent.com/hpc2n/intro-course/master/exercises/JUPYTERNOTEBOOKS/GPUS/Juliaset.ipynb
Fix the project ID in the batch job job.sh and send it to the queue. As in the previous exercise, copy and paste the URL with the host name, port, and token into a browser on Kebnekaise. Choose the kernel mandelenv you recently created.
Exercise 4: Matlab in a Jupyter notebook
One can run a Jupyter notebook with a Matlab kernel and also take advantage of the Python environment to execute Python code, such as common AI libraries, from Matlab. You can follow these steps to get this combination working:
# Load Matlab
ml MATLAB/2023a.Update4
# Load a Python version compatible with Matlab and also CUDA (if you will run on GPUs)
ml GCCcore/11.3.0 Python/3.10.4 CUDA/11.7.0
# Create an environment called matlabenv (you can change this name)
python -m venv ./matlabenv
# Activate this environment
source matlabenv/bin/activate
# Perform installations: upgrade pip, and packages that you will need
pip install --upgrade pip
pip install -U scikit-learn
# Install Jupyterlab
pip install jupyterlab
# Install the Matlab proxy
pip install jupyter-matlab-proxy
Fix the project ID in the batch job job.sh and send it to the queue.
As in previous exercises, copy and paste the URL with the host name, port, and token into a browser on Kebnekaise. If you cloned this repository, you will have a copy of the matlab_kernel.ipynb notebook under exercises/JUPYTERNOTEBOOKS/MATLAB. Choose the MATLAB kernel to execute this notebook:
When you try to run the notebook, Matlab will ask for a type of license. Because you are running this notebook on our HPC center, you can choose the option Existing License and then Start MATLAB.
At the bottom of the same notebook, we show you how to run a simple Python script, digits.py, in Matlab with the pyrunfile command. This Python script uses an AI library.
AMBER¶
Amber (Assisted Model Building with Energy Refinement) is a suite of tools for running Molecular Dynamics and analyzing the dynamical trajectories.
How to find AMBER¶
AMBER is installed as a module on Kebnekaise. Notice that more versions of this software are installed on the Intel nodes than on the AMD nodes. Thus, if you are targeting a version that is only installed on the Intel nodes, you will need to add the instruction #SBATCH -C skylake to your batch script; otherwise the job could end up on an AMD node that lacks that installation.
Exercises¶
Exercise 1: Running a MPI PMEMD job
The input files for the exercises are located in the folder exercises/AMBER. Thus, if you clone this repository you will find the files in this folder. Run the script job-mpi-pmemd.sh as it is and look at the performance of the simulation (average number of nanoseconds per day) which is written at the bottom of the output file 03_Prod.mdout.
Job submission command: sbatch job-mpi-pmemd.sh (fix your project ID)
Exercise 2: Optimal performance of a MPI PMEMD job
Running with more cores doesn’t always mean better performance. Run the script job-mpi-pmemd.sh with a different number of MPI tasks (-n) and obtain the value for the performance of AMBER (as a function of the number of cores). The performance of AMBER can be obtained from the average number of nanoseconds per day (ns/day) in the file 03_Prod.mdout.
A plot of the number of ns/day vs. number of cores can help you to visualize the results. Is it worth it to go from 14 cores to 28 cores? What about going from 28 cores to 42 cores? Or even from 42 cores to 56 cores?
Exercise 3: Optimal performance of a GPU PMEMD job
Run the script job-gpu-pmemd.sh with a different number of MPI tasks (-n) and obtain the value of the performance of AMBER (as a function of the number of cores). You are encouraged to plot the average number of ns/day vs. the number of cores as in the previous case. What is the optimal value for the number of MPI tasks?
Hint: Going above 4 MPI tasks will not give you better performance because in AMBER the number of MPI tasks is tightly bound to the number of GPU cards.
Exercise 4: Monitoring the performance of your jobs
Change the number of steps (nstlim) to 100000 in the file 03_Prod.in. Also, set the number of cores (-n) to 28 (1 node) and the time (-t) to 15 min in the file job-mpi-pmemd.sh. By submitting the job to the queue with sbatch job-mpi-pmemd.sh you get a number as output; this number is the job ID. On the command line, type job-usage job_ID. This will generate a URL that you can copy/paste into your local browser to monitor the efficiency of your simulation. How efficient is it in your case?
Hint: on the top right corner you can change the update frequency of the plots from 15m to 1m for instance. It takes a few minutes before you can see the results on the plots.
Gromacs¶
Gromacs (GROningen MAchine for Chemical Simulations) is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles.
How to find Gromacs¶
Gromacs is installed as a module on Kebnekaise. Notice that more versions of this software are installed on the Intel nodes than on the AMD nodes. Thus, if you are targeting a version that is only installed on the Intel nodes, you will need to add the instruction #SBATCH -C skylake to your batch script; otherwise the job could end up on an AMD node that lacks that installation.
We performed a benchmark of Gromacs on the different NVIDIA GPUs available on Kebnekaise using the batch script job-gpu-gromacs.sh. The results can be seen in the following plot. The labels 1, 2, and 3 refer to the three different and common options for running Gromacs written in this batch job. A dashed red line at 25 ns/day is added for better visualization.
Exercises¶
Exercise 1: Running a MPI job
The input files for this exercise are located in GROMACS/MPI. Go to this folder and run the script job-mpi-gromacs.sh using different values for the number of MPI tasks (-n). Submit this file to the batch queue (sbatch job-mpi-gromacs.sh). Use the number you get from sbatch (this is called the job ID) to get a URL on the command line by typing: job-usage job_ID.
Then, copy and paste that URL into your local browser. After ~1 min you will start to see the usage of the resources. Tip: in the top-right corner, change the default update interval from 15m to 30s.
In the plot for CPU usage, you can see how efficiently the requested resources are being used (in percent). How efficient is your simulation?
Exercise 2: Running a GPU job
In the GROMACS/GPU folder, take a look at the script job-gpu-gromacs.sh. At the end of the script you will find three different ways to run Gromacs: the first one is the default (not offloading any task to the GPUs), the second is the MPI version where nonbonded/PME interactions are offloaded to GPUs, and the third is the threaded-MPI version with nonbonded/PME interactions offloaded to GPUs. Submit the job to the queue and monitor the usage with the job-usage command that was introduced in the previous exercise.
When the script finishes, you should see a step-like plot (in the Grafana interface for the job-usage results) for the CPU/GPU usage, where each step corresponds to one simulation. Based on these results, which of the three options in the script (for the current number of cores and GPUs) is best for running Gromacs?
You can check if this analysis agrees with the performance for each run as reported in the log file measured in ns/day.
What is the percentage of the GPUs used in the simulation based on the results from job-usage?
How does the performance of the GPU version compare to that of the CPU-only version in the previous exercise?
More information on Gromacs performance can be found in the documentation for performance improvement of this software.
Keypoints
- Kebnekaise is a highly heterogeneous system. Thus, you will need to consciously decide on the hardware where your simulations will run.
- Notice that, at the moment, the Intel nodes have more versions of some software installed than the AMD nodes.
- It is good practice to monitor the usage of resources; we offer the command job-usage job_ID on Kebnekaise.