Introduction to OpenMP (part 1)
OpenMP (Open Multi-Processing) is an API for shared-memory parallel programming in C, C++, and Fortran. It is based on pragmas (directives) that augment the source code and change how the compiler processes it. In the case of OpenMP, the pragmas specify how the code is to be parallelized.
Important
The Kebnekaise login and compute nodes have the OMP_NUM_THREADS environment variable set to 1 by default.
If you are using the Kebnekaise login nodes to experiment with OpenMP, then it is important to set the OMP_NUM_THREADS environment variable to some reasonable value:
$ export OMP_NUM_THREADS=8
Please note that you are not allowed to run long computations on the login nodes!
If you are using the Kebnekaise compute nodes to experiment with OpenMP, then either unset the OMP_NUM_THREADS environment variable:
$ unset OMP_NUM_THREADS
or, if you specified the --cpus-per-task=<cpu count> SLURM argument, set the OMP_NUM_THREADS environment variable to the number of CPU cores available for the task:
$ export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
Simple example
Consider the following “Hello world” program:
#include <stdio.h>

int main() {
    printf("Hello world!\n");
    return 0;
}
We can confirm that the code indeed behaves the way we expect:
$ gcc -o my_program my_program.c -Wall
$ ./my_program
Hello world!
Let us modify the program by adding an OpenMP pragma:
#include <stdio.h>

int main() {
    #pragma omp parallel
    printf("Hello world!\n");
    return 0;
}
This time the program behaves very differently (note the extra -fopenmp compiler option):
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
Hello world!
Hello world!
...
Hello world!
Clearly, the omp parallel pragma caused the program to execute the printf line several times.
If you run the program on a different computer, you will typically observe that the number of printed lines equals the number of processor cores available to the program. Strictly speaking, the default team size is implementation defined, and it can be overridden with the OMP_NUM_THREADS environment variable.
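The -fopenmp compiler option tells the compiler to process OpenMP pragmas and to link the program against the OpenMP runtime library. Without the option, GCC ignores the pragmas (-Wall reports them as unknown pragmas) and the program runs serially.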
OpenMP pragmas and constructs
In C and C++, an OpenMP pragma has the following form:
#pragma omp directive-name [clause[ [,] clause] ... ] new-line
A compiler typically supports several types of pragmas, not just OpenMP pragmas.
Therefore, all OpenMP pragmas begin with the keywords #pragma omp.
The directive-name placeholder specifies which OpenMP construct is used (e.g. parallel), and a pragma always ends with a new line.
Typically, a pragma affects the user code that follows it, but some OpenMP pragmas are stand-alone directives that do not apply to any following code.
You can split a pragma across multiple lines by ending each line with a backslash (\) immediately followed by a new line:
#pragma omp directive-name \
[clause[ [,] \
clause] ... ] new-line
Parallel construct
In the earlier example, we used the parallel pragma:
#pragma omp parallel [clause[ [,] clause] ... ] new-line
structured-block
The pragma creates a team of OpenMP threads that executes the structured-block as a parallel region.
The structured-block can be a single statement, as in the earlier example, or a compound statement consisting of several statements enclosed in braces:
#pragma omp parallel ...
{
statement1;
statement2;
...
}
OpenMP guarantees that all threads in the team have finished executing the structured block before execution continues after the parallel region; there is an implicit barrier at the end of the region.
The behaviour of a parallel construct can be modified with several clauses:
if([parallel :] scalar-expression)
num_threads(integer-expression)
default(shared | none)
private(list)
firstprivate(list)
shared(list)
copyin(list)
reduction([reduction-modifier ,] reduction-identifier : list)
proc_bind(master | close | spread)
allocate([allocator :] list)
We will return to some of these clauses later, but for now it is sufficient to know that a parallel construct can be selectively enabled or disabled with the if clause and that the size of the team can be set explicitly with the num_threads clause.
Data sharing rules
Since the structured block that follows a parallel construct is executed in parallel by a team of threads, we must make sure that the related data accesses do not cause any conflicts. For example, the behaviour of the following program is not well defined:
#include <stdio.h>

int main() {
    int number = 1;
    #pragma omp parallel
    printf("I think the number is %d.\n", number++);
    return 0;
}
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
I think the number is 2.
I think the number is 8.
....
I think the number is 1.
I think the number is 1.
....
I think the number is 2.
....
$ ./my_program
I think the number is 1.
I think the number is 1.
I think the number is 2.
...
I think the number is 1.
I think the number is 2.
...
We can make two observations:
- The order in which the printf statements are executed is arbitrary. This can be desired behaviour.
- Some numbers are printed multiple times. This is usually undesired behaviour.
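The explanation is that once the team is created, the threads execute the structured block independently of each other.
This explains why the numbers are printed in an arbitrary order.
The threads also read and write the variable number independently of each other. Since number++ is a read-modify-write operation that is not atomic, several threads can read the same value before any of them writes the incremented value back, so some threads do not see the changes the other threads have made. Such conflicting, unsynchronized accesses are called a data race.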
OpenMP implements a set of rules that define how variables behave inside OpenMP constructs.
All variables are either private or shared:
- Private: Each thread has its own copy of the variable.
- Shared: All threads share the same variable.
These basic rules apply by default:
- Variables declared outside a parallel region are shared.
- Variables declared inside a parallel region are private.
- Loop counters of parallel loops are private.
int a = 5; // shared

int main() {
    int b = 44; // shared

    #pragma omp parallel
    {
        int c = 3; // private
    }
}
In the earlier number++ example, the variable number is declared outside the parallel region, and all threads therefore share the same variable.
We can use the private clause to turn a variable that has been declared outside a parallel region into a private variable:
#include <stdio.h>

int main() {
    int number = 1;
    #pragma omp parallel private(number)
    printf("I think the number is %d.\n", number++);
    return 0;
}
However, the end result is, once again, unexpected:
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
I think the number is 0.
I think the number is 0.
I think the number is 0.
...
This happens because each thread has its own number variable that is separate from the number variable declared outside the parallel region.
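The private variables do not inherit the value of the original variable. Each copy is uninitialized when the region starts, so the printed value is indeterminate; it merely happened to be 0 in this run. If we want the private copies to start with the value of the original variable, then we must use the firstprivate clause: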
#include <stdio.h>

int main() {
    int number = 1;
    #pragma omp parallel firstprivate(number)
    printf("I think the number is %d.\n", number++);
    return 0;
}
This time, the end result is as expected:
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
I think the number is 1.
I think the number is 1.
I think the number is 1.
...
That is, the private variables inherit the value of the original variable.
Explicit data sharing rules
The default behaviour can be changed with the default clause:
#include <stdio.h>

int main() {
    int number = 1;
    #pragma omp parallel default(none)
    printf("I think the number is %d.\n", number++);
    return 0;
}
The default(none) clause tells the compiler that the programmer must explicitly specify the data sharing rule for each variable used in the region.
It is therefore not surprising that the compiler produces an error indicating that the number variable is not specified in the enclosing parallel region:
$ gcc -o my_program my_program.c -Wall -fopenmp
my_program.c: In function ‘main’:
my_program.c:6:5: error: ‘number’ not specified in enclosing ‘parallel’
6 | printf("I think the number is %d.\n", number++);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
my_program.c:5:13: error: enclosing ‘parallel’
5 | #pragma omp parallel default(none)
|
We can now declare the number variable as firstprivate:
#include <stdio.h>

int main() {
    int number = 1;
    #pragma omp parallel default(none) firstprivate(number)
    printf("I think the number is %d.\n", number++);
    return 0;
}
It is generally recommended to set the data sharing rules explicitly, as this forces the programmer to think about how each variable is accessed. It is also advisable to declare private variables inside the structured block whenever possible.