Introduction to OpenMP (part 1)

Objectives

  • Learn about the parallel construct.

  • Learn about data sharing rules.

OpenMP (Open Multi-Processing) is a programming API for shared-memory parallel programming in C, C++, and Fortran. It is based on pragmas, or directives, which augment the source code and change how a compiler processes it. In the case of OpenMP, the pragmas specify how the code is to be parallelized.

Important

The Kebnekaise login and compute nodes have the OMP_NUM_THREADS environment variable set to 1 by default.

  1. If you are using the Kebnekaise login nodes to experiment with OpenMP, then it is important to set the OMP_NUM_THREADS environment variable to some reasonable value:

    $ export OMP_NUM_THREADS=8
    

    Please note that you are not allowed to run long computations on the login nodes!

  2. If you are using the Kebnekaise compute nodes to experiment with OpenMP, then either unset the OMP_NUM_THREADS environment variable:

    $ unset OMP_NUM_THREADS
    

    or, if you specified the --cpus-per-task=<cpu count> SLURM argument, set the OMP_NUM_THREADS environment variable to the number of CPU cores available for the task:

    $ export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
    

Simple example

Consider the following “Hello world” program:

#include <stdio.h>

int main() {
    printf("Hello world!\n");
    return 0;
}

We can confirm that the code indeed behaves the way we expect:

$ gcc -o my_program my_program.c -Wall
$ ./my_program
Hello world!

Let us modify the program by adding an OpenMP pragma:

#include <stdio.h>

int main() {
    #pragma omp parallel
    printf("Hello world!\n");
    return 0;
}

This time the program behaves very differently (note the extra -fopenmp compiler option):

$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
Hello world!
Hello world!
...
Hello world!

Clearly, the omp parallel pragma caused the program to execute the printf line several times. If you run the program on a different computer, you will typically observe that, by default, the number of lines printed matches the number of processor cores in that computer. The -fopenmp compiler option tells the compiler to process OpenMP pragmas; without it, GCC ignores them and the program prints a single line.

Challenge

  1. Compile the “Hello world” program yourself and try it out.

  2. See what happens if you set the OMP_NUM_THREADS environment variable to different values:

$ OMP_NUM_THREADS=<value> ./my_program

What happens? Can you guess why?

OpenMP pragmas and constructs

In C and C++, an OpenMP pragma has the following form:

#pragma omp directive-name [clause[ [,] clause] ... ] new-line

A compiler typically supports several types of pragmas, not just OpenMP pragmas. Therefore, all OpenMP pragmas begin with the keywords #pragma omp. The directive-name placeholder specifies which OpenMP construct is used (e.g. parallel), and a pragma is always terminated by a new line. Typically, a pragma affects the user code that follows it, but some OpenMP pragmas are stand-alone. You can span a pragma across multiple lines by ending each line except the last with a backslash (\) immediately followed by a new line:

#pragma omp directive-name \
    [clause[ [,] \
    clause] ... ] new-line

Parallel construct

In the earlier example, we used the parallel pragma:

#pragma omp parallel [clause[ [,] clause] ... ] new-line
    structured-block

The pragma creates a team of OpenMP threads that executes the structured-block as a parallel region:

../_images/parallel_construct.png

The structured-block region can be a single statement, like in the earlier example, or a structured block consisting of several statements:

#pragma omp parallel ...
{
    statement1;
    statement2;
    ...
}

OpenMP guarantees that all threads in the team have executed the structured block before the execution continues outside the parallel region.

The behaviour of a parallel construct can be modified with several clauses:

if([parallel :] scalar-expression)
num_threads(integer-expression)
default(shared | none)
private(list)
firstprivate(list)
shared(list)
copyin(list)
reduction([reduction-modifier ,] reduction-identifier : list)
proc_bind(master | close | spread)
allocate([allocator :] list)

We will return to some of these clauses later but for now it is sufficient to know that a parallel construct can be selectively enabled/disabled with the if clause and the size of the team can be explicitly set with the num_threads clause.

Challenge

Modify the following program such that the printf line is executed only twice:

#include <stdio.h>

int main() {
    #pragma omp parallel
    printf("Hello world!\n");
    return 0;
}

Hint: Each thread in the team executes the structured block once.

Data sharing rules

Since the structured block that follows a parallel construct is executed in parallel by a team of threads, we must make sure that the related data accesses do not cause any conflicts. For example, the behaviour of the following program is not well defined:

#include <stdio.h>

int main() {
    int number = 1;
    #pragma omp parallel
    printf("I think the number is %d.\n", number++);
    return 0;
}

$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
I think the number is 2.
I think the number is 8.
....
I think the number is 1.
I think the number is 1.
....
I think the number is 2.
....
$ ./my_program
I think the number is 1.
I think the number is 1.
I think the number is 2.
...
I think the number is 1.
I think the number is 2.
...

We can make two observations:

  1. The order in which the printf statements are executed is arbitrary. This can be a desired behaviour.

  2. Some numbers are printed multiple times. This is usually an undesired behaviour.

The explanation is that once the team is created, the threads execute the structured block independently of each other. This explains why the numbers are printed in an arbitrary order. The threads also read and write the variable number independently of each other, which explains why some threads do not see the changes the other threads have made:

../_images/conflict.png

OpenMP implements a set of rules that define how variables behave inside OpenMP constructs. All variables are either private or shared:

Private

Each thread has its own copy of the variable.

Shared

All threads share the same variable.

These basic rules apply:

  1. All variables declared outside a parallel region are shared.

  2. All variables declared inside a parallel region are private.

  3. Loop counters are private (in parallel loops).

int a = 5;                  // shared

int main() {
    int b = 44;             // shared

    #pragma omp parallel
    {
        int c = 3;          // private
    }
}

In the earlier example with the printf statement, the variable number is declared outside the parallel region and all threads therefore share the same variable.

Challenge

Modify the following program such that the variable number is declared inside the structured block and is therefore private:

#include <stdio.h>

int main() {
    int number = 1;
    #pragma omp parallel
    printf("I think the number is %d.\n", number++);
    return 0;
}

Run the program. Can you explain the behaviour?

Hint: Remember that a structured block that consists of several statements must be enclosed in braces { }.

We can use the private clause to turn a variable that has been declared outside a parallel region into a private variable:

#include <stdio.h>

int main() {
    int number = 1;
    #pragma omp parallel private(number)
    printf("I think the number is %d.\n", number++);
    return 0;
}

However, the end result is, once again, unexpected:

$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
I think the number is 0.
I think the number is 0.
I think the number is 0.
...

This happens because each thread has its own number variable that is separate from the number variable declared outside the parallel region:

../_images/private.png

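The private variables do not inherit the value of the original variable. They start uninitialized, so the 0 printed above is not guaranteed. If we want each private copy to start with the value of the original variable, then we must use the firstprivate clause: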

#include <stdio.h>

int main() {
    int number = 1;
    #pragma omp parallel firstprivate(number)
    printf("I think the number is %d.\n", number++);
    return 0;
}

This time, the end result is as expected:

$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
I think the number is 1.
I think the number is 1.
I think the number is 1.
...

That is, each private variable inherits the value of the original variable:

../_images/firstprivate.png

Explicit data sharing rules

The default behaviour can be changed with the default clause:

#include <stdio.h>

int main() {
    int number = 1;
    #pragma omp parallel default(none)
    printf("I think the number is %d.\n", number++);
    return 0;
}

This tells the compiler that a programmer must explicitly set the data sharing rule for each variable. It is therefore not surprising that the compiler produces an error indicating that the number variable is not specified in the enclosing parallel region:

$ gcc -o my_program my_program.c -Wall -fopenmp
my_program.c: In function ‘main’:
my_program.c:6:5: error: ‘number’ not specified in enclosing ‘parallel’
    6 |     printf("I think the number is %d.\n", number++);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
my_program.c:5:13: error: enclosing ‘parallel’
    5 |     #pragma omp parallel default(none)
      |

We can now set the number variable to firstprivate:

#include <stdio.h>

int main() {
    int number = 1;
    #pragma omp parallel default(none) firstprivate(number)
    printf("I think the number is %d.\n", number++);
    return 0;
}

It is generally recommended that a programmer sets the data sharing rules explicitly as this forces them to think about the data sharing rules. It is also advisable to declare all private variables inside the structured block.

Challenge

Fix the following program:

#include <stdio.h>

char *str = "I think the number is %d.\n";

int main() {
    int initial_number = 1;

    #pragma omp parallel
    int number = initial_number;
    printf(str, number++);

    return 0;
}

Use explicit data sharing rules.