Introduction to Kebnekaise
Objectives
Learn how to load the necessary modules on Kebnekaise.
Learn how to compile C code on Kebnekaise.
Learn how to submit jobs to the batch queue.
Learn how to use the course project and reservations.
Modules and toolchains
You need to load the correct toolchain before compiling your code on Kebnekaise.
The available modules are listed using the ml avail command:
$ ml avail
------------------------- /hpc2n/eb/modules/all/Core --------------------------
Bison/3.0.5 fosscuda/2020a
Bison/3.3.2 fosscuda/2020b (D)
Bison/3.5.3 gaussian/16.C.01-AVX2
Bison/3.7.1 (D) gcccuda/2019b
CUDA/8.0.61 gcccuda/2020a
CUDA/10.1.243 (D) gcccuda/2020b (D)
...
The list shows the modules that can be loaded directly, so it may change depending on which modules you have already loaded.
In order to see all the modules, including those that require other modules to be loaded first, use the command ml spider. Many types of application software fall into this category.
You can find more information regarding a particular module using the ml spider <module> command:
$ ml spider MATLAB
---------------------------------------------------------------------------
MATLAB: MATLAB/2019b.Update2
---------------------------------------------------------------------------
Description:
MATLAB is a high-level language and interactive environment that
enables you to perform computationally intensive tasks faster than
with traditional programming languages such as C, C++, and Fortran.
This module can be loaded directly: module load MATLAB/2019b.Update2
Help:
Description
===========
MATLAB is a high-level language and interactive environment
that enables you to perform computationally intensive tasks faster than with
traditional programming languages such as C, C++, and Fortran.
More information
================
- Homepage: http://www.mathworks.com/products/matlab
You can load the module using the ml <module> command:
$ ml MATLAB/2019b.Update2
You can list loaded modules using the ml command:
$ ml
Currently Loaded Modules:
1) snicenvironment (S) 7) libevent/2.1.11 13) PMIx/3.0.2
2) systemdefault (S) 8) numactl/2.0.12 14) impi/2018.4.274
3) GCCcore/8.2.0 9) XZ/5.2.4 15) imkl/2019.1.144
4) zlib/1.2.11 10) libxml2/2.9.8 16) intel/2019a
5) binutils/2.31.1 11) libpciaccess/0.14 17) MATLAB/2019b.Update2
6) iccifort/2019.1.144 12) hwloc/1.11.11
Where:
S: Module is Sticky, requires --force to unload or purge
You can unload all modules using the ml purge command:
$ ml purge
The following modules were not unloaded:
(Use "module --force purge" to unload all):
1) systemdefault 2) snicenvironment
Note that the ml purge command will warn that two modules were not unloaded.
This is normal and you should NOT force unload them.
Exercise
Load the FOSS toolchain for source code compilation. The foss module loads the GNU compiler. Start from a clean environment:
$ ml purge
Find the latest FOSS toolchain (foss) and load it. Investigate which modules were loaded. Finally, purge all modules.
Compile C code
Once the correct toolchain (foss) has been loaded, we can compile C source files (*.c) with the GNU compiler:
$ gcc -o <binary name> <sources> -Wall
The -Wall flag causes the compiler to print additional warnings.
Exercise
Compile the following “Hello world” program:
#include <stdio.h>

int main() {
    printf("Hello world!\n");
    return 0;
}
Course project
You can request to be a member of the course project hpc2n202w-xyz, where the letters need to be substituted by the actual numerical values for the project.
Submitting jobs
The jobs are submitted using the srun command:
$ srun --account=<account> --ntasks=<task count> --time=<time> <command>
This places the command into the batch queue. The three arguments are the project number, the number of tasks, and the requested time allocation. For example, the following command prints the uptime of the allocated compute node:
$ srun --account=hpc2n202w-xyz --ntasks=1 --time=00:00:15 uptime
srun: job 12727702 queued and waiting for resources
srun: job 12727702 has been allocated resources
11:53:43 up 5 days, 1:23, 0 users, load average: 23,11, 23,20, 23,27
Note that we are using the course project, the number of tasks is set to one, and we are requesting 15 seconds.
We could submit multiple tasks using the --ntasks=<task count> argument:
$ srun --account=hpc2n202w-xyz --ntasks=4 --time=00:00:15 uname -n
b-cn0932.hpc2n.umu.se
b-cn0932.hpc2n.umu.se
b-cn0932.hpc2n.umu.se
b-cn0932.hpc2n.umu.se
Note that all tasks are running on the same node.
We could request multiple CPU cores for each task using the --cpus-per-task=<cpu count> argument:
$ srun --account=hpc2n202w-xyz --ntasks=4 --cpus-per-task=14 --time=00:00:15 uname -n
b-cn0935.hpc2n.umu.se
b-cn0935.hpc2n.umu.se
b-cn0932.hpc2n.umu.se
b-cn0932.hpc2n.umu.se
If you want to measure the performance, it is advisable to request exclusive access to the compute nodes (--exclusive):
$ srun --account=hpc2n202w-xyz --ntasks=4 --cpus-per-task=14 --exclusive --time=00:00:15 uname -n
b-cn0935.hpc2n.umu.se
b-cn0935.hpc2n.umu.se
b-cn0932.hpc2n.umu.se
b-cn0932.hpc2n.umu.se
Exercise
Run both “Hello world” programs on the compute nodes.
Aliases
In order to save time, you can create an alias for a command:
$ alias <alias>="<command>"
For example:
$ alias run_full="srun --account=hpc2n202w-xyz --ntasks=1 --cpus-per-task=28 --time=00:05:00"
$ run_full uname -n
b-cn0932.hpc2n.umu.se
Batch files
It is often more convenient to write the commands into a batch file.
For example, we could write the following to a file called batch.sh:
#!/bin/bash
#SBATCH --account=hpc2n202w-xyz
#SBATCH --ntasks=1
#SBATCH --time=00:00:15

ml purge
ml foss/2020b

uname -n
Note that the same arguments that were earlier passed to the srun command are now given as special #SBATCH comments at the beginning of the batch file.
It is highly advisable to purge all loaded modules and re-load the required modules as the job inherits the environment.
The batch file is submitted using the sbatch <batch file> command:
$ sbatch batch.sh
Submitted batch job 12728675
By default, the output is directed to the file slurm-<job_id>.out, where <job_id> is the job id returned by the sbatch command:
$ cat slurm-12728675.out
The following modules were not unloaded:
(Use "module --force purge" to unload all):
1) systemdefault 2) snicenvironment
b-cn0102.hpc2n.umu.se
Exercise
Write two batch files that run both “Hello world” programs on the compute nodes.
Job queue
You can investigate the job queue with the squeue command:
$ squeue -u $USER
If you want an estimate for when the job will start running, you can give the squeue command the argument --start.
You can cancel a job with the scancel command:
$ scancel <job_id>
What is High Performance Computing?
High Performance Computing most generally refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation in order to solve large problems in science, engineering, or business. (insideHPC.com)
- What does this mean?
- Aggregating computing power
Kebnekaise: 602 nodes in 15 racks totalling 19288 cores
Your laptop: 4 cores
- Higher performance
Kebnekaise: 728,000 billion arithmetical operations per second
Your laptop: 200 billion arithmetical operations per second
- Solve large problems
Time: The time required to form a solution to the problem is very long.
Memory: The solution of the problem requires a lot of memory and/or storage.
Memory models
When it comes to the memory layout, (super)computers can be divided into two primary categories:
- Shared memory:
A single memory space for all data:
Everyone can access the same data.
Straightforward to use.
- Distributed memory:
Multiple distinct memory spaces for the data:
Everyone has direct access only to the local data.
Requires communication and data transfers.
Computing clusters and supercomputers are generally distributed memory machines.
Programming models
The programming model changes when we aim for extra performance and/or memory:
- Single-core:
Matlab, Python, C, Fortran, …
Single stream of operations (thread).
Single pool of data.
- Multi-core:
Vectorized Matlab, pthreads, OpenMP
Multiple streams of operations (multiple threads).
Single pool of data.
Extra challenges:
Work distribution.
Coordination (synchronization, etc).
- Distributed memory:
MPI, …
Multiple streams of operations (multiple processes).
Multiple pools of data.
Extra challenges:
Work distribution.
Coordination (synchronization, etc).
Data distribution.
Communication and data transfers.
- Accelerators / GPUs:
CUDA, OpenCL, OpenACC, OpenMP, …
Single/multiple streams of operations on the host.
Many lightweight streams of operations on the accelerator.
Multiple pools of data on multiple layers.
Extra challenges:
Work distribution.
Coordination (synchronization, etc).
Data distribution across multiple memory spaces.
Communication and data transfers.
- Hybrid:
MPI + OpenMP, OpenMP + CUDA, MPI + CUDA, …
Combines the benefits and the downsides of several programming models.
- Task-based:
OpenMP tasks
Does task-based programming count as a separate programming model?
Functions and data dependencies
Imagine the following computer program:
#include <stdio.h>

void function1(int a, int b) {
    printf("The sum is %d.\n", a + b);
}

void function2(int b) {
    printf("The sum is %d.\n", 10 + b);
}

int main() {
    int a = 10, b = 7;
    function1(a, b);
    function2(b);
    return 0;
}
The program consists of two functions, function1 and function2, that are called one after another from the main function.
The first function reads the variables a and b, and the second function reads the variable b.
The program prints the line The sum is 17. twice.
The key observation is that the two function calls are independent of each other.
More importantly, the two functions can be executed in parallel.
Let us modify the program slightly:
#include <stdio.h>

void function1(int a, int *b) {
    printf("The sum is %d.\n", a + *b);
    *b += 3;
}

void function2(int b) {
    printf("The sum is %d.\n", 10 + b);
}

int main() {
    int a = 10, b = 7;
    function1(a, &b);
    function2(b);
    return 0;
}
This time the function function1 modifies the variable b.
Therefore, the two function calls are not independent of each other and changing the order would change the printed lines. Furthermore, executing the two functions in parallel would lead to an undefined result as the execution order would be arbitrary.
We could say that in this particular context, the function function2 is dependent on the function function1.
That is, the function function1 must be executed completely before the function function2 can be executed.
However, this data dependency exists only when these two functions are called in this particular sequence with these particular arguments. In a different context, this particular data dependency does not exist. We can therefore conclude that data dependencies are separate from the function definitions.