Introduction to OpenMP (part 1)
Objectives
- Learn about the parallel construct.
- Learn about data sharing rules.
OpenMP (Open Multi-Processing) is an API for shared-memory parallel programming in C, C++, and Fortran. It is based on pragmas, or directives, that augment the source code and change how a compiler processes it. In the case of OpenMP, the pragmas specify how the code is to be parallelized.
Important
The Kebnekaise login and compute nodes have the OMP_NUM_THREADS environment variable set to 1 by default.
If you are using the Kebnekaise login nodes to experiment with OpenMP, it is important to set the OMP_NUM_THREADS environment variable to some reasonable value:
$ export OMP_NUM_THREADS=8
Please note that you are not allowed to run long computations on the login nodes!
If you are using the Kebnekaise compute nodes to experiment with OpenMP, then either unset the OMP_NUM_THREADS environment variable:
$ unset OMP_NUM_THREADS
or, if you specified the --cpus-per-task=<cpu count> SLURM argument, set the OMP_NUM_THREADS environment variable to the number of CPU cores available for the task:
$ export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
Simple example
Consider the following “Hello world” program:
#include <stdio.h>

int main() {
    printf("Hello world!\n");
    return 0;
}
We can confirm that the code indeed behaves the way we expect:
$ gcc -o my_program my_program.c -Wall
$ ./my_program
Hello world!
Let us modify the program by adding an OpenMP pragma:
#include <stdio.h>

int main() {
    #pragma omp parallel
    printf("Hello world!\n");
    return 0;
}
This time the program behaves very differently (note the extra -fopenmp compiler option):
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
Hello world!
Hello world!
...
Hello world!
Clearly, the omp parallel pragma caused the program to execute the printf line several times.
If you execute the program on a different computer, you will observe that the number of lines printed matches the number of processor cores in that computer.
The -fopenmp compiler option tells the compiler to expect OpenMP pragmas.
Challenge
Compile the “Hello world” program yourself and try it out.
See what happens if you set the OMP_NUM_THREADS environment variable to different values:
$ OMP_NUM_THREADS=<value> ./my_program
What happens? Can you guess why?
Solution
Let us try values 1, 4 and 8:
$ OMP_NUM_THREADS=1 ./my_program
Hello world!
$ OMP_NUM_THREADS=4 ./my_program
Hello world!
Hello world!
Hello world!
Hello world!
$ OMP_NUM_THREADS=8 ./my_program
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
The “Hello world!” line is printed 1, 4 and 8 times.
The OMP_NUM_THREADS environment variable sets the default team size (see below).
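To see the team size directly, we can query the OpenMP runtime. Below is a minimal sketch (our own addition, not part of the original exercise) that uses the standard runtime functions omp_get_thread_num() and omp_get_num_threads() from omp.h to make each thread report its position in the team:

#include <stdio.h>
#include <omp.h>

int main() {
    /* Each thread in the team prints its own number and the team size. */
    #pragma omp parallel
    printf("Hello from thread %d of %d!\n",
           omp_get_thread_num(), omp_get_num_threads());
    return 0;
}

$ gcc -o my_program my_program.c -Wall -fopenmp
$ OMP_NUM_THREADS=4 ./my_program
Hello from thread 0 of 4!
Hello from thread 2 of 4!
Hello from thread 3 of 4!
Hello from thread 1 of 4!

Note that the order of the lines varies between runs.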
OpenMP pragmas and constructs
In C and C++, an OpenMP pragma has the following form:
#pragma omp directive-name [clause[ [,] clause] ... ] new-line
A compiler typically supports several types of pragmas, not just OpenMP pragmas. Therefore, all OpenMP pragmas begin with the keywords #pragma omp.
The directive-name placeholder specifies the OpenMP construct being used (e.g. parallel), and a pragma is always terminated by a new line.
Typically, a pragma affects the user code that follows it, but some OpenMP pragmas are stand-alone.
You can span a pragma across multiple lines by using a backslash (\) immediately followed by a new line:
#pragma omp directive-name \
[clause[ [,] \
clause] ... ] new-line
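For instance, a parallel pragma with a clause could be split as follows (a minimal sketch of our own; the num_threads clause is introduced below):

#include <stdio.h>

int main() {
    /* The backslash at the end of the line continues the pragma. */
    #pragma omp parallel \
            num_threads(2)
    printf("Hello world!\n");
    return 0;
}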
Parallel construct
In the earlier example, we used the parallel pragma:
#pragma omp parallel [clause[ [,] clause] ... ] new-line
structured-block
The pragma creates a team of OpenMP threads that executes the structured-block as a parallel region.
The structured-block region can be a single statement, like in the earlier example, or a structured block consisting of several statements:
#pragma omp parallel ...
{
    statement1;
    statement2;
    ...
}
OpenMP guarantees that all threads in the team have executed the structured block before the execution continues outside the parallel region.
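For example, in the following sketch (our own addition), the second printf is guaranteed to run only after every thread in the team has finished the structured block:

#include <stdio.h>

int main() {
    #pragma omp parallel
    printf("Inside the parallel region.\n");   /* once per thread */

    /* Execution continues here only after all threads are done. */
    printf("After the parallel region.\n");    /* exactly once */
    return 0;
}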
The behaviour of a parallel construct can be modified with several clauses:
if([parallel :] scalar-expression)
num_threads(integer-expression)
default(shared | none)
private(list)
firstprivate(list)
shared(list)
copyin(list)
reduction([reduction-modifier ,] reduction-identifier : list)
proc_bind(master | close | spread)
allocate([allocator :] list)
We will return to some of these clauses later, but for now it is sufficient to know that a parallel construct can be selectively enabled/disabled with the if clause, and that the size of the team can be explicitly set with the num_threads clause.
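As a minimal sketch of the if clause (the variable n and the threshold are purely illustrative):

#include <stdio.h>

int main() {
    int n = 100;

    /* A team of threads is created only if the condition holds;
       otherwise the block is executed by a single thread. */
    #pragma omp parallel if(n > 1000)
    printf("Hello world!\n");

    return 0;
}

With n = 100 the condition is false, so the program prints a single line regardless of OMP_NUM_THREADS.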
Challenge
Modify the following program such that the printf line is executed only twice:
#include <stdio.h>

int main() {
    #pragma omp parallel
    printf("Hello world!\n");
    return 0;
}
Hint: Each thread in the team executes the structured block once.
Solution
Use the num_threads clause to set the team size to two:
#include <stdio.h>

int main() {
    #pragma omp parallel num_threads(2)
    printf("Hello world!\n");
    return 0;
}
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
Hello world!
Hello world!
Data sharing rules
Since the structured block that follows a parallel construct is executed in parallel by a team of threads, we must make sure that the related data accesses do not cause any conflicts. For example, the behaviour of the following program is not well defined:
#include <stdio.h>

int main() {
    int number = 1;
    #pragma omp parallel
    printf("I think the number is %d.\n", number++);
    return 0;
}
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
I think the number is 2.
I think the number is 8.
....
I think the number is 1.
I think the number is 1.
....
I think the number is 2.
....
$ ./my_program
I think the number is 1.
I think the number is 1.
I think the number is 2.
...
I think the number is 1.
I think the number is 2.
...
We can make two observations:
- The order in which the printf statements are executed is arbitrary. This can be a desired behaviour.
- Some numbers are printed multiple times. This is usually an undesired behaviour.
The explanation is that once the team is created, the threads execute the structured block independently of each other.
This explains why the numbers are printed in an arbitrary order.
The threads also read and write the variable number independently of each other, which explains why some threads do not see the changes that other threads have made.
OpenMP implements a set of rules that define how variables behave inside OpenMP constructs.
All variables are either private or shared:
- Private: Each thread has its own copy of the variable.
- Shared: All threads share the same variable.
These basic rules apply:
- All variables declared outside a parallel region are shared.
- All variables declared inside a parallel region are private.
- Loop counters are private (in parallel loops).
int a = 5;          // shared

int main() {
    int b = 44;     // shared

    #pragma omp parallel
    {
        int c = 3;  // private
    }
}
In the earlier example, the variable number was declared outside the parallel region, and all threads therefore shared the same variable.
Challenge
Modify the following program such that the variable number is declared inside the structured block and is therefore private:
#include <stdio.h>

int main() {
    int number = 1;
    #pragma omp parallel
    printf("I think the number is %d.\n", number++);
    return 0;
}
Run the program. Can you explain the behaviour?
Hint: Remember that a structured block that consists of several statements must be enclosed inside { } brackets.
Solution
#include <stdio.h>

int main() {
    #pragma omp parallel
    {
        int number = 1;
        printf("I think the number is %d.\n", number++);
    }
    return 0;
}
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
I think the number is 1.
I think the number is 1.
...
I think the number is 1.
Note that all threads print 1.
This happens because each thread has its own number variable that is initialized to 1.
The increment affects only the thread's own copy of the variable.
We can use the private clause to turn a variable that has been declared outside a parallel region into a private variable:
#include <stdio.h>

int main() {
    int number = 1;
    #pragma omp parallel private(number)
    printf("I think the number is %d.\n", number++);
    return 0;
}
However, the end result is, once again, unexpected:
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
I think the number is 0.
I think the number is 0.
I think the number is 0.
...
This happens because each thread has its own number variable that is separate from the number variable declared outside the parallel region. A private variable starts out uninitialized, so the printed value is undefined (here it happens to be 0).
The private variables do not inherit the value of the original variable. If we want this to happen, then we must use the firstprivate clause:
#include <stdio.h>

int main() {
    int number = 1;
    #pragma omp parallel firstprivate(number)
    printf("I think the number is %d.\n", number++);
    return 0;
}
This time, the end result is as expected:
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
I think the number is 1.
I think the number is 1.
I think the number is 1.
...
That is, the private variables inherit the value of the original variable.
Explicit data sharing rules
The default behaviour can be changed with the default clause:
#include <stdio.h>

int main() {
    int number = 1;
    #pragma omp parallel default(none)
    printf("I think the number is %d.\n", number++);
    return 0;
}
This tells the compiler that the programmer must explicitly set the data sharing rule for each variable.
It is therefore not surprising that the compiler produces an error indicating that the number variable is not specified in the enclosing parallel region:
$ gcc -o my_program my_program.c -Wall -fopenmp
my_program.c: In function ‘main’:
my_program.c:6:5: error: ‘number’ not specified in enclosing ‘parallel’
6 | printf("I think the number is %d.\n", number++);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
my_program.c:5:13: error: enclosing ‘parallel’
5 | #pragma omp parallel default(none)
|
We can now declare the number variable as firstprivate:
#include <stdio.h>

int main() {
    int number = 1;
    #pragma omp parallel default(none) firstprivate(number)
    printf("I think the number is %d.\n", number++);
    return 0;
}
It is generally recommended that a programmer sets the data sharing rules explicitly, as this forces them to think about how each variable is accessed. It is also advisable to declare all private variables inside the structured block.
Challenge
Fix the following program:
#include <stdio.h>

char *str = "I think the number is %d.\n";

int main() {
    int initial_number = 1;

    #pragma omp parallel
    int number = initial_number;
    printf(str, number++);

    return 0;
}
Use explicit data sharing rules.
Solution
#include <stdio.h>

char *str = "I think the number is %d.\n";

int main() {
    int initial_number = 1;

    #pragma omp parallel default(none) shared(str, initial_number)
    {
        int number = initial_number;
        printf(str, number++);
    }

    return 0;
}
First, we add the enclosing { } brackets, thus making the number variable private.
Next, we use default(none) to force explicit data sharing rules.
Finally, we declare the str and initial_number variables as shared, since none of the threads modify them.
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
I think the number is 1.
I think the number is 1.
I think the number is 1.
...
It is also possible to declare the variables str and initial_number as firstprivate.
However, the creation of private variables causes some overhead, and it is therefore generally recommended that variables that can be shared are declared as shared.
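For comparison, a minimal sketch of the firstprivate variant (our own addition, not from the original exercise); each thread gets its own copies of str and initial_number, initialized from the originals:

#include <stdio.h>

char *str = "I think the number is %d.\n";

int main() {
    int initial_number = 1;

    /* Each thread receives private copies of str and initial_number,
       initialized from the original variables; this is correct but
       adds a per-thread copying overhead compared to shared. */
    #pragma omp parallel default(none) firstprivate(str, initial_number)
    {
        int number = initial_number;
        printf(str, number++);
    }

    return 0;
}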