Application examples

Best practices

Use your project directory instead of the home directory

The HOME directory has a limited storage space (~25 GB). Your project directory /proj/nobackup/hpc2n202X-XYZ has a much larger space.

Create a soft-link to your storage project

It is very convenient to create a soft link to your storage project in your home directory for faster navigation:

     cd $HOME
     ln -s /proj/nobackup/hpc2n202X-XYZ choose-a-name
Monitoring the use of resources

Most likely you will allocate many cores and many GPUs for your simulations. You can monitor the use of these resources with the job-usage job_ID command, where job_ID is the number output by the sbatch command. You can also see this number by typing squeue -u my-username. job-usage outputs a URL that you can copy/paste into your local browser to see how the resources are being used:

[Figure: job-usage dashboard in the browser]
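For reference, a typical workflow from the terminal could look like the minimal sketch below; job.sh and the job ID are placeholders for your own batch script and its output:

$sbatch job.sh
Submitted batch job 12345678
$squeue -u my-username        # the JOBID column shows the same number
$job-usage 12345678           # prints a URL that you can open in your local browser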

Matlab

How to find Matlab

Matlab is available through the Menu bar if you are using the ThinLinc client (recommended). Additionally, you can load a Matlab module in a Linux terminal on Kebnekaise. Details for these two options can be found here.

First time configuration

The first time you use Matlab on Kebnekaise, you need to configure it by following the guidelines in Configuring Matlab. After configuring the cluster, it is good practice to validate the cluster (HOME -> Parallel -> Create and Manage Clusters):

[Figure: Matlab cluster validation]

Notice that it is recommended to use a small number of workers for the validation, in this case 4.

Tools for efficient simulations

Flow chart for writing more efficient Matlab code using existing tools, adapted from the Mathworks documentation on parallel computing:

[Figure: Parallel Computing Toolbox workflow chart]

MATLAB on GPUs

Notice that MATLAB currently supports only NVIDIA GPUs (v100, a40, a6000, a100, l40s, h100), with v100 and l40s being the most abundant (10 nodes each).

Use MATLAB for lightweight tasks on the login nodes

Remember that the login nodes are shared by many users; if you run heavy jobs there, you will interfere with their workflow.

Exercises

Exercise 1: Matlab serial job

The folder SERIAL contains a function funct.m which performs an FFT on a matrix. The execution time is obtained with tic/toc and written to the output file log.out. Run the function from the MATLAB GUI with the help of the script submit.m.

As an alternative, you can submit the job via the batch script job.sh. Here, you will need to set Project_ID to the one provided for the present course and choose the Matlab version.
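If you go the batch route, the relevant part of job.sh could look like the sketch below; this is only an illustration, the project ID is a placeholder and the Matlab module version should be one actually installed on Kebnekaise (check with ml spider MATLAB):

#!/bin/bash
#SBATCH -A hpc2n202X-XYZ          # replace with the Project_ID for this course
#SBATCH -n 1                      # serial job
#SBATCH -t 00:10:00

ml purge > /dev/null 2>&1
ml MATLAB/2023a.Update4           # example version; pick one listed by ml spider MATLAB

matlab -nodisplay -batch "funct"  # run the serial function without the GUI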

Exercise 2: Matlab parallel job
  • PARFOR folder contains an example of a parallelized loop with the “parfor” directive. A pause() function is included in the loop to make it heavy. This function can be submitted to the queue by running the script submit.m in the MATLAB GUI. The number of workers can be set by replacing the string FIXME (in the “submit.m” file) with the number you desire. Try different values for the number of workers from 1 to 10 and take a note of the simulation time output at the end of the simulation. Where does the code achieve its peak performance?

  • SPMD folder presents an example of a parallelized code using the SPMD paradigm. Submit this job to the queue through the MATLAB GUI. This example illustrates the use of parpool to run parallel code in a more interactive manner.

Exercise 3: Matlab GPU job

GPU folder contains a test case that computes a Mandelbrot set both on CPU mandelcpu.m and on GPU mandelgpu.m. You can submit the jobs through the MATLAB GUI using the submitcpu.m and submitgpu.m files.

If everything ran well, the final output is two .png figures which display the timings for both architectures. Use the “eom” command in the terminal to visualize the images (eom out-X.png).

R

How to find R

Similar to Matlab, R is available through the Menu bar if you are using the ThinLinc client (recommended). Additionally, you can load an R module in a Linux terminal on Kebnekaise. Details for these two options can be found here.

First time configuration

The first time you access R on Kebnekaise, you need to configure it by following the Preparations step.

Recommendations

Be aware of data duplication in R

Some parallel functions, mclapply in this example, tend to replicate the data for the workers (cores) if the dataframe is modified by them. This can be crucial if you are working with a large data frame and employing several parallel functions, for instance during the training of machine learning models, because your simulation could easily exceed the available memory per node.

   library(parallel)
   library(pryr)

   prev <- mem_used()
   print(paste("Memory initially allocated by R:", prev/1e6, "MB"))

   # Define a relatively large dataframe
   data_df <- data.frame(
     ID = seq(1, 1e7),
     Value = runif(1e7)
   )

   # Create a function to be applied to each row (or a subset of rows)
   process_function <- function(i, df) {
     # do some modification of the i-th row
     return(df$Value[i] * 2)
   }

   current <- mem_used()
   print(paste("Memory increase after the serial code execution:", (current - prev)/1e6, "MB"))
   prev <- current

   # Use mclapply to process the dataframe in parallel
   num_cores <- 4
   results <- mclapply(1:nrow(data_df), function(i) process_function(i, data_df), mc.cores = num_cores)
   current <- mem_used()
   print(paste("Memory increase after parallel code execution:", (current - prev)/1e6, "MB"))

In this example mem-dup.R, I used the function mem_used() provided by the pryr package to monitor the memory usage. The batch script for this example is job.sh.

One possible solution to data duplication is to use a data frame for each worker that includes only the relevant data for that particular computation.

Use R for lightweight tasks on the login nodes

Remember that the login nodes are shared by many users; if you run heavy jobs there, you will interfere with their workflow.

Exercises

Requirements

Prior to running the examples, you will need to install several packages. Follow these instructions:

  • Load the R version used in these exercises:

    ml GCC/10.2.0 OpenMPI/4.0.5 R/4.0.4

  • The packages needed (check first whether they are already installed) are: Rmpi, doParallel, caret, MASS, klaR, nnet, e1071, rpart, mlbench, and parallel.
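If you prefer to install the packages non-interactively from the terminal, a minimal sketch could look like this; note that parallel already ships with base R, Rmpi may need extra configuration flags on some systems, and the repository URL is just an example mirror:

$ml GCC/10.2.0 OpenMPI/4.0.5 R/4.0.4
$Rscript -e 'install.packages(c("Rmpi","doParallel","caret","MASS","klaR","nnet","e1071","rpart","mlbench"), repos="https://cloud.r-project.org")'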

Exercise 1: R serial job

In the SERIAL folder, a serial R script is provided. Submit the script job.sh with the command R CMD and also with Rscript. Where could it be more suitable to use Rscript over R CMD?

Why do we need the flag #SBATCH -C ‘skylake’ in the batch script?

Exercise 2: Job Arrays

The JOB-ARRAYS folder shows an example of job arrays; the batch file is job.sh. Submit the script and notice what is written in the output files.

Could you use job arrays in your simulations if you need to run many simulations where some parameters are changed? As an example, imagine that you need to run 28 simulations where a single parameter, such as the temperature, is changed from 2 to 56 C. Could you use the variable task_id in the previous script to get that range of temperatures so that each simulation prints out a different temperature?
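A minimal sketch of this idea is shown below; the formula and script lines are illustrative and not necessarily the ones used in the exercise files:

#!/bin/bash
#SBATCH -A hpc2n202X-XYZ
#SBATCH -t 00:05:00
#SBATCH --array=1-28              # 28 independent tasks

# map task IDs 1..28 to temperatures 2, 4, ..., 56 C
TEMP=$(( SLURM_ARRAY_TASK_ID * 2 ))
echo "Running simulation at temperature ${TEMP} C"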

Exercise 3: Parallel jobs with Rmpi

In the folder RMPI, you can find the R script Rmpi.R which uses 5 MPI slaves to apply the runif() function to an array “c”. The submit file is job_Rmpi.sh. As a result, you will see the random numbers generated by the slaves in the slurm output file.

Exercise 4: Parallel jobs with doParallel

The folder DOPARALLEL contains two examples:

  1. doParallel.R shows how to use the foreach function in sequential mode (1 core) and in parallel mode using 4 cores. What is the difference in the usage of foreach for these two modes?

    Submit the job_doParallel.sh script and compare the timings of the sequential and parallel codes.

    How many workers are allocated for this simulation? If you want to allocate more or less, what changes must be made to these files?

  2. doParallel_ML.R presents the evaluation of several ML models in both sequential and parallel modes using the standard “iris” dataset. The difference is basically in the use of %dopar% instead of %do%.

    Submit the batch script job_doParallel_ML.sh to the queue.

    In the output file observe the resulting elapsed times for the sequential and the 4 cores parallel simulation.

    Upon submitting the job to the queue you will get a number called job ID. Use the command:

    job-usage job_ID

    to obtain a URL which you can copy/paste in your local browser. Tip: refresh your browser several times to get the statistics.

    Can you see how the CPU is used? What about the memory?

    Note 1: In order to run this exercise, you need to have all the packages listed at the beginning of this document installed.

    Note 2: If you want to try a different number of cores for running the scripts, you should change that number in both the .R and .sh scripts

Exercise 5: Machine Learning jobs

In the folder ML we show an ML model using a sonar database and Random Forest as the training method (Rscript.R). The simulations are done in both serial and parallel modes. You may change the value of the number of cores (1 in the present case) to other values. Notice that the number of cores needs to be the same in the files job.sh and Rscript.R.

Try a different number of cores and monitor the timings which are reported at the end of the output file.

Alphafold

How to find Alphafold

Alphafold is installed as a module. Notice that on the Intel nodes there are more versions of Alphafold installed than on the AMD nodes. Thus, if you are targeting a version that is only installed on the Intel nodes, you will need to add the instruction #SBATCH -C skylake to your batch script; otherwise the job could land on an AMD node that lacks that installation.
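For instance, the top of such a batch script could look like the sketch below; the module line is a placeholder, so check ml spider AlphaFold for the versions and toolchains actually installed:

#!/bin/bash
#SBATCH -A hpc2n202X-XYZ
#SBATCH -t 01:00:00
#SBATCH -C skylake                # force the job onto an Intel node

ml purge > /dev/null 2>&1
ml AlphaFold/2.3.2                # placeholder version; use ml spider AlphaFold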

Exercises

Exercise 1: Running a monomer protein simulation

In the exercises folder ALPHAFOLD you will find a fasta sequence for a monomer and the corresponding batch file job.sh for running the simulation on GPUs. Try running the simulation with CPUs only and then with l40s, v100 and a100 GPUs.

Notice that the simulation will take ~1 hr., so the purpose of this exercise is only to check that the simulation starts running well.

CryoSPARC

How to find CryoSPARC

Version 4.5.3 of CryoSPARC is installed as a module.

First time configuration

You need a license to use this software. For academic purposes, a free-of-charge license can be requested at the website cryosparc.com (processing takes about one working day). Once you obtain your license ID, copy it, create a file called /home/u/username/.cryosparc-license, and paste the ID in the first line of this file. In the second line of the file, write your email address.
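A minimal sketch of how to create that file from the terminal follows; the path and the license ID are placeholders (on Kebnekaise the home path includes the first letter of your username):

$cat > /home/u/username/.cryosparc-license << EOF
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
myemail@mail.com
EOF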

Using CryoSPARC on Kebnekaise

Create a suitable folder in your project directory, for instance /proj/nobackup/hpc2n202X-XYZ/cryosparc, and move into this folder. Download/copy the lane*tar files that are located here to the cryosparc folder and untar them there (tar -xvf lane_CPU.tar, for example).

Fix your Project_ID and time

Change the string Project_ID in the files lane*/cluster_script.sh to reflect your current project. Also, the time was set to 20 min. (-t 00:20:00) in these files, but for your realistic simulations you can change it to longer times.
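One way to do this from the terminal is with sed, assuming the scripts contain the literal strings shown below; hpc2n202X-XYZ and the new time are placeholders:

$sed -i 's/Project_ID/hpc2n202X-XYZ/' lane*/cluster_script.sh
$sed -i 's/-t 00:20:00/-t 02:00:00/' lane*/cluster_script.sh   # optional: request 2 hours instead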

The lanes should be recognized by CryoSPARC when it starts running.

Load the CryoSPARC modules. Start CryoSPARC and answer yes to the prompt asking whether you wish to continue starting cryosparc (it appears because the folder has not been used before). List the users on the server (which should be only yourself for this type of license), check the email address displayed for this user (it should be the one you added in the license file), and reset the password. These steps are summarized here:

$cryosparc start 
...
Do you wish to continue starting cryosparc? [yN]: y
...
CryoSPARC master started. 
 From this machine, access CryoSPARC and CryoSPARC Live at
    http://localhost:39007
...

$cryosparc listusers
$cryosparc resetpassword --email "myemail@mail.com" --password "choose-a-password"

Copy and paste the line which has the localhost port (notice that the port number can change) into a browser on Kebnekaise:

[Figure: CryoSPARC login page in the browser]

After logging in, you will be able to see CryoSPARC’s dashboard:

[Figure: CryoSPARC dashboard]

There are several tutorials on the CryoSPARC website; in the previous picture I followed the Introductory Tutorial (v4.0+).

Use cryosparc instead of cryosparcm

On Kebnekaise the command cryosparc should be used, not the command cryosparcm cited in the tutorial.

Depending on the job type, CryoSPARC will suggest the hardware resources. For instance, in the tutorial above, Step 4: Import Movies suggests using 1 CPU upon queueing it, while Step 5: Motion Correction suggests using 1 GPU. For CPU-only jobs you can choose the CPU lane, and if your job uses GPUs you can choose among L40s, V100, A100, and H100. Notice that the V100 and L40s are the most abundant at the moment:

[Figure: CryoSPARC lane selection]

When you finish your analysis with CryoSPARC, shut it down with the command cryosparc stop on the terminal. Otherwise the server keeps running on the login node.

Additional information can be obtained from a tutorial given during a workshop on Berzelius and also from the NSC documentation. Notice that although the guidelines are for machines different from Kebnekaise, the systems are very similar and you could get ideas from them. For instance, the cryosparc copylanes command is not supported on Kebnekaise, and you will need to follow the step above (manually copying the lanes) to get the lanes working.

Nextflow

How to find Nextflow

Nextflow is installed as a module that can be loaded directly without any requirements. Notice that on the Intel nodes there are more versions of this software installed than on the AMD nodes. Thus, if you are targeting a version that is only installed on the Intel nodes, you will need to add the instruction #SBATCH -C skylake to your batch script; otherwise the job could land on an AMD node that lacks that installation.

Exercises

Exercise 1: Arabidopsis

The data for running this example can be found in this paper and more details about the analysis can be found there as well. We have downloaded the data for you and you can get it by copying the files to your working project:

$cd /proj/nobackup/your-project
$mkdir nextflow-arabidopsis 
$cd nextflow-arabidopsis
$cp /proj/nobackup/hpc2n/SR*gz .
$wget https://raw.githubusercontent.com/hpc2n/intro-course/master/exercises/NEXTFLOW/ARABIDOPSIS/design_test.csv
$wget https://raw.githubusercontent.com/hpc2n/intro-course/master/exercises/NEXTFLOW/ARABIDOPSIS/job.sh

Fix the Project_ID to match the current project you are part of and send the job to the queue. This example takes ~3 hrs., so the purpose of this exercise is just to show you how to run this job with Nextflow.

Exercise 2: Interactive job submission

Nextflow allows you to submit jobs interactively on Kebnekaise’s command line. You need to write a file with the instructions to be executed by Nextflow; in the present case it is the file wc.nf, which unzips the file file.txt.gz and counts the number of lines in it. A configuration file for the cluster, hpc2n.config, is also needed; some of its parameters need to be changed with your personal information. Similarly to the previous exercise, you can follow these commands:

$cd /proj/nobackup/your-project
$mkdir nextflow-interactive 
$cd nextflow-interactive 
$wget https://raw.githubusercontent.com/hpc2n/intro-course/master/exercises/NEXTFLOW/INTERACTIVE/wc.nf
$wget https://raw.githubusercontent.com/hpc2n/intro-course/master/exercises/NEXTFLOW/INTERACTIVE/file.txt.gz
$wget https://raw.githubusercontent.com/hpc2n/intro-course/master/exercises/NEXTFLOW/INTERACTIVE/hpc2n.config

Load the Nextflow module and send the job interactively by typing the following commands in the Kebnekaise terminal (fix the project ID):

$ml Nextflow/24.04.2
$nextflow run wc.nf -c hpc2n.config --input file.txt.gz --project hpc2n202X-XYZ --clusterOptions "-t 00:05:00 -n 28 -N 1"

Here, you will run the job on 28 cores. On a different terminal tab you can check that the job is submitted/running with the command squeue -u your-username.

Apptainer

How to find Apptainer

Apptainer is site-installed, meaning that you can run it without loading a module. Apptainer is supported on Kebnekaise instead of Singularity. The recipes that are built/run with Singularity can also be built/run with Apptainer with the same parameters; you only need to replace the command singularity with apptainer. If you are curious, you will notice that the command singularity is also available on Kebnekaise, but it is just a soft link to apptainer:

$which singularity 
/bin/singularity

$ls -lahrt /bin/singularity 
lrwxrwxrwx 1 root root 9 Mar 14 18:30 /bin/singularity -> apptainer

Use Apptainer for lightweight tasks on the login nodes

As with any other software, use Apptainer on the login node for simple tasks, for instance building a lightweight image, otherwise run a batch job.

Exercises

Exercise 1: Building and running an Apptainer image

This is an example of building the software Gromacs. Build a Gromacs container as follows in the directory which contains the gromacs.def definition file:

$apptainer build gromacs.sif gromacs.def

Download the benchMEM.tpr file here and place it in the directory where the .sif is generated. In fact, you can place the files in any other location, but then you will need to modify the paths in the job.sh batch script.

Submit the job.sh file to the queue. The output of Gromacs, including its performance (the line with the ns/day string near the bottom), is written to the md.log files. For comparison, after running the Apptainer image, the Gromacs module is loaded and the same simulation is run.
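To compare the two runs quickly, you could grep for the performance lines in the log files; this sketch assumes the log files end up in the submission directory:

$grep -H -A 1 "ns/day" *.log      # shows the ns/day header and the Performance line for each run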

TensorFlow

How to find TensorFlow

Several versions of TensorFlow are installed as modules on Kebnekaise. Similarly to other software, on Intel nodes there are more versions of this software installed than on the AMD nodes.

Exercises

Exercise 1: Running TensorFlow simulations

In this exercise, you will run a script with TensorFlow v. 2.15 on GPUs. Notice that because this version of TensorFlow is available on all the NVIDIA GPUs, you just need to specify the type of GPU you want to use, in the present case l40s. There are three different examples in the TENSORFLOW folder under the exercises folder: hello_tensorflow.py (prints out the Hello, TensorFlow! string), loss.py (computes the loss of a model), and mnist_mlp.py (runs a model using the MNIST database).

The batch script is job.sh. Submit the job with different types of GPUs.
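The GPU-related lines of such a batch script could look like the sketch below; the way the GPU type is requested and the module names are assumptions, so check the HPC2N documentation and ml spider TensorFlow for the exact syntax and versions:

#!/bin/bash
#SBATCH -A hpc2n202X-XYZ
#SBATCH -t 00:10:00
#SBATCH --gpus=l40s:1             # change the GPU type here, e.g. v100:1 or a100:1

ml purge > /dev/null 2>&1
# placeholder modules; load the toolchain that provides TensorFlow 2.15 on Kebnekaise
ml GCC/12.3.0 OpenMPI/4.1.5 TensorFlow/2.15.1-CUDA-12.1.1

python mnist_mlp.py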

Jupyter Notebooks

You can use Jupyter Notebooks on Kebnekaise through JupyterLab. Jupyter Notebooks allow you to work in a more interactive manner, which is convenient when you are in the development phase of your project. Kernels are available for the most popular languages (R, Python, Matlab, and Julia) to work in a Jupyter Notebook.

How to find JupyterLab

Several versions of JupyterLab are installed as modules on Kebnekaise. Similarly to other software, on Intel nodes there are more versions of this software installed than on the AMD nodes.

Using Jupyter Notebooks on Kebnekaise

Guidelines for running Jupyter Notebooks on Kebnekaise can be found here.

Exercises

Exercise 1: Running a Jupyter Notebook

Because the tasks executed in a Jupyter Notebook are, in general, computationally expensive, it is more convenient to run them on a compute node instead of the login nodes. To do this, you need to prepare a batch script like this one: job.sh.

Once you submit your job and it starts running, check the output file slurm*out and search for the string http://b-cnwxyz.hpc2n.umu.se:8888/lab?token=xy…z. Copy this string and paste it in a browser on Kebnekaise. You will be directed to the dashboard of JupyterLab.
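A quick way to pull that address out of the output file is sketched here, using the slurm*out file name pattern mentioned above:

$grep -o "http://.*lab?token=.*" slurm*out | head -n 1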

A couple of notes:

  • You can change the type of the GPU where you want to run the notebook

  • Cancel the job (scancel job_ID) if you stop using the notebook

Exercise 2: Running Infomap in a Jupyter Notebook

Infomap is a software for network community detection. It could be convenient for you to work in a Jupyter Notebook if the simulations are not long and you need to see the graphical results right away. Here are the steps you can follow to get Infomap running in a notebook:

# Create a suitable folder in your project and move into it
$mkdir /proj/nobackup/hpc2n202Q-XYZ/infomap-workspace
$cd /proj/nobackup/hpc2n202Q-XYZ/infomap-workspace
# Purge and load JupyterLab module and dependencies
$module purge
$module load GCCcore/13.2.0 JupyterLab/4.2.0
# Create an isolated environment for this project called "infmpenv" and activate it
$python -m venv ./infmpenv 
$source infmpenv/bin/activate
# Install ipykernel to be able to create your own kernel for this environment
$pip install --no-cache-dir --no-build-isolation ipykernel
# Install Infomap, Networkx, and Matplotlib
$pip install --no-cache-dir infomap networkx matplotlib
# Install the kernel
$python -m ipykernel install --user --name=infmpenv

After doing these installations, download the Jupyter Notebook for Infomap and create data and output folders as follows:

$wget https://raw.githubusercontent.com/mapequation/infomap-notebooks/master/1_1_infomap_intro.ipynb
$mkdir data
$cd data 
$wget https://raw.githubusercontent.com/mapequation/infomap-notebooks/master/data/ninetriangles.net
$cd ..
$mkdir output

Fix the project ID in the batch job job.sh and send it to the queue. As in the previous exercise, copy and paste the URL with the host name, port, and token into a browser on Kebnekaise. Then, open the notebook you downloaded and choose the kernel you just created:

[Figure: Infomap notebook with the infmpenv kernel]

Exercise 3: CPU and GPU code for Julia set

In this exercise, you will compute the Julia set on both the CPU and the GPU. The GPU part will be done using the CuPy library. A nice feature of this example is that it shows you how you could use multiple GPUs by modifying the initial single-GPU case. Here are the guidelines for running this notebook:

# Create a suitable folder in your project and move into it
$mkdir /proj/nobackup/hpc2n202Q-XYZ/juliaset-workspace
$cd /proj/nobackup/hpc2n202Q-XYZ/juliaset-workspace
# Purge and load JupyterLab module and dependencies
$module purge
$module load GCCcore/13.2.0 JupyterLab/4.2.0
# Create an isolated environment for this project called "mandelenv" and activate it
$python -m venv ./mandelenv 
$source mandelenv/bin/activate
# Install ipykernel to be able to create your own kernel for this environment
$pip install --no-cache-dir --no-build-isolation ipykernel
# Install the kernel
$python -m ipykernel install --user --name=mandelenv
# Load a CUDA library
$ml CUDA/12.5.0
# Install Numpy, Matplotlib, and CuPy
$pip install --no-cache-dir --no-build-isolation numpy matplotlib cupy-cuda12x

After these installations, download the Jupyter Notebook for Juliaset as follows:

$wget https://raw.githubusercontent.com/hpc2n/intro-course/master/exercises/JUPYTERNOTEBOOKS/GPUS/Juliaset.ipynb

Fix the project ID in the batch job job.sh and send it to the queue. As in the previous exercise, copy and paste the URL with the host name, port, and token into a browser on Kebnekaise. Choose the kernel mandelenv you recently created.

Exercise 4: Matlab in a Jupyter notebook

One can run a Jupyter notebook with a Matlab kernel and also take advantage of the Python environment to execute Python code, such as common AI libraries, from Matlab. You can follow these steps to get this combination working:

# Load Matlab 
ml MATLAB/2023a.Update4
# Load a Python version compatible with Matlab and also CUDA (if you will run on GPUs)
ml GCCcore/11.3.0  Python/3.10.4 CUDA/11.7.0
# Create an environment called matlabenv (you can change this name)
python -m venv ./matlabenv
# Activate this environment
source matlabenv/bin/activate
# Perform installations: upgrade pip, and packages that you will need
pip install --upgrade pip
pip install -U scikit-learn
# Install Jupyterlab
pip install jupyterlab
# Install the Matlab proxy
pip install jupyter-matlab-proxy

Fix the project ID in the batch job job.sh and send it to the queue. As in previous exercises, copy and paste the URL with the host name, port, and token into a browser on Kebnekaise. If you cloned this repository, you will have a copy of the matlab_kernel.ipynb notebook under exercises/JUPYTERNOTEBOOKS/MATLAB. Choose the MATLAB kernel to execute this notebook:

[Figure: notebook running with the MATLAB kernel]

When you try to run the notebook, Matlab will ask for a type of license. Because you are running this notebook on our HPC center, you can choose the option Existing License and then Start MATLAB.

In the same notebook at the bottom, we show you how to run a simple Python script digits.py in Matlab with the pyrunfile command. This Python script uses an AI library.

AMBER

Amber (Assisted Model Building with Energy Refinement) is a suite of tools for running molecular dynamics simulations and analyzing the dynamical trajectories.

How to find AMBER

AMBER is installed as a module on Kebnekaise. Notice that on the Intel nodes there are more versions of this software installed than on the AMD nodes. Thus, if you are targeting a version that is only installed on the Intel nodes, you will need to add the instruction #SBATCH -C skylake to your batch script; otherwise the job could land on an AMD node that lacks that installation.

Exercises

Exercise 1: Running a MPI PMEMD job

The input files for the exercises are located in the folder exercises/AMBER. Thus, if you clone this repository you will find the files in this folder. Run the script job-mpi-pmemd.sh as it is and look at the performance of the simulation (the average number of nanoseconds per day), which is written at the bottom of the output file 03_Prod.mdout.

Job submission command: sbatch job-mpi-pmemd.sh (fix your project ID)

Exercise 2: Optimal performance of a MPI PMEMD job

Running with more cores doesn’t always mean better performance. Run the script job-mpi-pmemd.sh with a different number of MPI tasks (-n) and obtain the performance of AMBER as a function of the number of cores. The performance of AMBER can be obtained from the average number of nanoseconds per day (ns/day) in the file 03_Prod.mdout.

A plot of the number of ns/day vs. number of cores can help you to visualize the results. Is it worth it to go from 14 cores to 28 cores? What about going from 28 cores to 42 cores? Or even from 42 cores to 56 cores?
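One way to organize these runs is sketched below; the command-line -n value overrides the one inside the script. Note that every run writes 03_Prod.mdout in the same directory, so run them from separate copies of the folder (or one after the other) to avoid overwriting results:

$sbatch -n 14 job-mpi-pmemd.sh
$sbatch -n 28 job-mpi-pmemd.sh
$sbatch -n 42 job-mpi-pmemd.sh
$sbatch -n 56 job-mpi-pmemd.sh
# after each run finishes, collect the performance, e.g.:
$grep "ns/day" 03_Prod.mdout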

Exercise 3: Optimal performance of a GPU PMEMD job

Run the script job-gpu-pmemd.sh with a different number of MPI tasks (-n) and obtain the performance of AMBER as a function of the number of cores. You are encouraged to plot the average number of ns/day vs. the number of cores as in the previous case. What is the optimal value for the number of MPI tasks?

Hint: Going above 4 MPI tasks will not give you better performance because in AMBER the number of MPI tasks is tightly bound to the number of GPU cards.

Exercise 4: Monitoring the performance of your jobs

Change the number of steps (nstlim) to 100000 in the file 03_Prod.in. Also, set the number of cores (-n) to 28 (1 node) and the time (-t) to 15 min in the file job-mpi-pmemd.sh. By submitting the job to the queue with sbatch job-mpi-pmemd.sh you get a number as output; this number is the job ID. On the command line, type job-usage job_ID. This will generate a URL that you can copy/paste into your local browser to monitor the efficiency of your simulation. How efficient is it in your case?

Hint: on the top right corner you can change the update frequency of the plots from 15m to 1m for instance. It takes a few minutes before you can see the results on the plots.

Gromacs

Gromacs (GROningen MAchine for Chemical Simulations) is a versatile package to perform molecular dynamics, i.e. to simulate the Newtonian equations of motion for systems with hundreds to millions of particles.

How to find Gromacs

Gromacs is installed as a module on Kebnekaise. Notice that on the Intel nodes there are more versions of this software installed than on the AMD nodes. Thus, if you are targeting a version that is only installed on the Intel nodes, you will need to add the instruction #SBATCH -C skylake to your batch script; otherwise the job could land on an AMD node that lacks that installation.

We performed a benchmark of Gromacs on the different NVIDIA GPUs that are available on Kebnekaise using the batch script job-gpu-gromacs.sh. The results can be seen in the following plot. The labels 1, 2, and 3 refer to the three different and common options for running Gromacs written in this batch job. A dashed red line at 25 ns/day is added for better visualization.

[Figure: Gromacs GPU benchmark on Kebnekaise]

Exercises

Exercise 1: Running a MPI job

The input files for this exercise are located in GROMACS/MPI. Go to this folder and run the script job-mpi-gromacs.sh using different values for the number of MPI tasks (-n). Submit this file to the batch queue (sbatch job-mpi-gromacs.sh). Use the number you get from sbatch (this is called the job ID) to get a URL on the command line by typing:
job-usage job_ID.

Then, copy and paste that URL into your local browser. After ~1 min. you will start to see the usage of the resources. Tip: In the top-right corner, change the default update interval from 15m to 30s.

In the plot for CPU usage, you can see how efficiently the requested resources are being used (as a percentage). How efficient is your simulation?

Exercise 2: Running a GPU job

In the GROMACS/GPU folder, take a look at the script job-gpu-gromacs.sh. At the end of the script you will find three different ways to run Gromacs: the first one is the default (not offloading any task to GPUs), the second one is the MPI version where nonbonded/PME interactions are offloaded to GPUs, and the third one is the threaded-MPI version with nonbonded/PME interactions offloaded to GPUs. Submit the job to the queue and monitor the usage with the job-usage command that was introduced in the previous exercise.

When the script finishes, you should see a step-like plot (in the Grafana interface for the job-usage results) for the CPU/GPU usage where each step corresponds to one simulation. Based on these results, which of the three options in the script (for the current number of cores and GPUs) is best for running Gromacs?

You can check whether this analysis agrees with the performance of each run (measured in ns/day) as reported in the log file.

What is the percentage of the GPUs used in the simulation based on the results from job-usage?

How does the performance of the GPU version compare to that of the CPU-only version in the previous examples?

More information on Gromacs performance can be found in the documentation for performance improvement of this software.

Keypoints

  • Kebnekaise is a highly heterogeneous system. Thus, you will need to consciously decide on the hardware where your simulations will run.

  • Notice that the Intel nodes currently have more versions of some software installed than the AMD nodes.

  • It is good practice to monitor the usage of resources; for this we offer the command job-usage job_ID on Kebnekaise.