Introduction to OpenMP (part 2)
Objectives
Learn about the sections and section constructs.
Learn about the loop and for constructs.
Learn about the single and master constructs.
Learn about the critical construct.
Learn about the barrier construct.
Section construct
As we saw earlier, all threads within the team execute the entire structured block that follows a parallel construct. Only a very limited number of parallel algorithms can be implemented in this way. It is much more common that we have a set of mutually independent operations which we want to execute in parallel.
One way of accomplishing this is with the sections and section constructs:
#pragma omp sections [clause[ [,] clause] ... ] new-line
{
[#pragma omp section new-line]
structured-block
[#pragma omp section new-line
structured-block]
...
}
where clause is one of the following:
private(list)
firstprivate(list)
lastprivate([ lastprivate-modifier:] list)
reduction([reduction-modifier ,] reduction-identifier : list)
allocate([allocator :] list)
nowait
The structured blocks that follow the section constructs inside the sections construct are distributed among the threads within the team:
[Figure: section.png]
Each structured block is executed only once:
#include <stdio.h>

int main() {

    #pragma omp parallel
    {
        printf("Everyone!\n");

        #pragma omp sections
        {
            #pragma omp section
            printf("Only me!\n");

            #pragma omp section
            printf("No one else!\n");

            #pragma omp section
            printf("Just me!\n");
        }
    }

    return 0;
}
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
Everyone!
Only me!
No one else!
Just me!
Everyone!
Everyone!
...
Note how the Everyone! lines are printed multiple times but the other three lines are printed only once.
If we want, we can merge the parallel and sections constructs together:
#include <stdio.h>

int main() {

    #pragma omp parallel sections
    {
        #pragma omp section
        printf("Only me!\n");

        #pragma omp section
        printf("No one else!\n");

        #pragma omp section
        printf("Just me!\n");
    }

    return 0;
}
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
Just me!
No one else!
Only me!
Exercise
Parallelize the following program using the sections and section constructs:
#include <stdio.h>

int main() {
    int a, b, c, d;

    a = 5;
    b = 14;
    c = a + b;
    d = a + 44;
    printf("a = %d, b = %d, c = %d, d = %d\n", a, b, c, d);

    return 0;
}
The program should print a = 5, b = 14, c = 19, d = 49.
Pay attention to the data dependencies.
You may have to add more than one parallel construct.
Solution
The statements a = 5; and b = 14; can be executed in parallel and we therefore add one parallel sections construct for them.
The statements c = a + b; and d = a + 44; can be executed in parallel and we therefore add another parallel sections construct for them.
#include <stdio.h>

int main() {
    int a, b, c, d;

    #pragma omp parallel sections
    {
        #pragma omp section
        a = 5;
        #pragma omp section
        b = 14;
    }
    #pragma omp parallel sections
    {
        #pragma omp section
        c = a + b;
        #pragma omp section
        d = a + 44;
    }
    printf("a = %d, b = %d, c = %d, d = %d\n", a, b, c, d);

    return 0;
}
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
a = 5, b = 14, c = 19, d = 49
Parallel loop construct
Most programs contain several loops and parallelizing these loops is often a natural way to add some parallelism to a program.
The loop construct does exactly that:
#pragma omp loop [clause[ [,] clause] ... ] new-line
for-loops
The construct tells OpenMP that the loop iterations are free of data dependencies and can therefore be executed in parallel.
The loop iterator is private by default:
#include <stdio.h>

int main() {
    #pragma omp parallel
    {
        #pragma omp loop
        for (int i = 0; i < 5; i++)
            printf("The loop iterator is %d.\n", i);
    }
}
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
The loop iterator is 1.
The loop iterator is 4.
The loop iterator is 0.
The loop iterator is 2.
The loop iterator is 3.
Like many other constructs, the loop construct accepts several clauses:
bind(binding)
collapse(n)
order(concurrent)
private(list)
lastprivate(list)
reduction([default ,]reduction-identifier : list)
In particular, the collapse clause allows us to collapse n nested loops into a single parallel loop.
Otherwise, only the iterations of the outermost loop are executed in parallel.
Exercise
Collapse the two nested loops in the following program:
#include <stdio.h>

int main() {
    #pragma omp parallel
    {
        #pragma omp loop
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                printf("The loop iterators are %d and %d.\n", i, j);
    }
}
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
The loop iterators are 2 and 0.
The loop iterators are 2 and 1.
The loop iterators are 2 and 2.
The loop iterators are 0 and 0.
The loop iterators are 0 and 1.
The loop iterators are 0 and 2.
The loop iterators are 1 and 0.
The loop iterators are 1 and 1.
The loop iterators are 1 and 2.
Note how the innermost loop is always executed sequentially: for each value of i, the values of j appear in increasing order. What changes when the two loops are collapsed?
Solution
#include <stdio.h>

int main() {
    #pragma omp parallel
    {
        #pragma omp loop collapse(2)
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                printf("The loop iterators are %d and %d.\n", i, j);
    }
}
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
The loop iterators are 2 and 2.
The loop iterators are 0 and 0.
The loop iterators are 2 and 1.
The loop iterators are 0 and 1.
The loop iterators are 2 and 0.
The loop iterators are 1 and 2.
The loop iterators are 0 and 2.
The loop iterators are 1 and 0.
The loop iterators are 1 and 1.
Note that the iterations from both loops are now executed in an arbitrary order.
If we want, we can merge the parallel and loop constructs together:
#include <stdio.h>

int main() {
    #pragma omp parallel loop
    for (int i = 0; i < 5; i++)
        printf("The loop iterator is %d.\n", i);
}
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
The loop iterator is 4.
The loop iterator is 0.
The loop iterator is 2.
The loop iterator is 3.
The loop iterator is 1.
Alternatively, we can use the older for construct, which predates the loop construct introduced in OpenMP 5.0:
#include <stdio.h>

int main() {
    #pragma omp parallel for
    for (int i = 0; i < 5; i++)
        printf("The loop iterator is %d.\n", i);
}
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
The loop iterator is 3.
The loop iterator is 1.
The loop iterator is 0.
The loop iterator is 2.
The loop iterator is 4.
Single and master constructs
It is sometimes necessary to execute a structured block only once inside a parallel region.
The single construct does exactly this:
#pragma omp single [clause[ [,] clause] ... ] new-line
structured-block
The structured block is executed only once by one of the threads in the team:
#include <stdio.h>

int main() {
    #pragma omp parallel
    {
        printf("In parallel.\n");
        #pragma omp single
        printf("Only once.\n");
        printf("More in parallel.\n");
    }
}
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
In parallel.
Only once.
In parallel.
In parallel.
...
In parallel.
More in parallel.
More in parallel.
...
More in parallel.
Note that all In parallel lines and the Only once line are printed before any More in parallel lines are printed.
This happens because the single construct introduces an implicit barrier at the exit of the single region.
That is, all threads in the team must wait until one of the threads has executed the structured block that is associated with the single construct:
[Figure: barrier.png]
We can disable this behaviour using the nowait clause. The single construct accepts the following clauses:
private(list)
firstprivate(list)
copyprivate(list)
allocate([allocator :] list)
nowait
The single construct is closely connected to the master construct:
#pragma omp master new-line
structured-block
However, there are two primary differences:
Only the master thread of the current team can execute the associated structured block.
There is no implied barrier either on entry to, or exit from, the master region.
Critical construct
It is sometimes necessary to allow only one thread at a time to execute a structured block. The critical construct does this:
#pragma omp critical [(name) [[,] hint(hint-expression)] ] new-line
structured-block
Several critical constructs can be joined together by giving them the same name:
#pragma omp critical (protect_x)
x++;
...
#pragma omp critical (protect_x)
x = x - 15;
Exercise
Modify the following program such that the printf and number++ statements are protected:
#include <stdio.h>

int main() {
    int number = 1;
    #pragma omp parallel
    printf("I think the number is %d.\n", number++);
    return 0;
}
Solution
#include <stdio.h>

int main() {
    int number = 1;
    #pragma omp parallel
    #pragma omp critical
    printf("I think the number is %d.\n", number++);
    return 0;
}
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
I think the number is 1.
I think the number is 2.
I think the number is 3.
I think the number is 4.
...
Barrier construct
Finally, we can add an explicit barrier:
#pragma omp barrier new-line
That is, all threads in the team must wait until all other threads in the team have encountered the barrier construct:
[Figure: barrier.png]