OpenMP task basics (part 1)
Important
The Kebnekaise login and compute nodes have the OMP_NUM_THREADS
environmental variable set to 1
by default.
If you are using the Kebnekaise login nodes to experiment with OpenMP, then it is important to set the
OMP_NUM_THREADS
environmental variable to some reasonable value:$ export OMP_NUM_THREADS=8
Please note that you are not allowed to run long computations on the login nodes!
If you are using the Kebnekaise compute nodes to experiment with OpenMP, then either unset the
OMP_NUM_THREADS
environmental variable:$ unset OMP_NUM_THREADS
or, if you specified the
--cpus-per-task=<cpu count>
SLURM argument, set theOMP_NUM_THREADS
environmental variable to the number of CPU cores available for the task:$ export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
Terminology
Let us go through some basic terminology:
- Task:
A specific instance of executable code and its data environment that the OpenMP implementation can schedule for execution by threads.
- Task region:
A region consisting of all code encountered during the execution of a task.
- Implicit task:
A task generated by an implicit parallel region or generated when a
parallel
construct is encountered during execution.- Initial task:
An implicit task associated with an implicit parallel region.
- Explicit task:
A task that is not an implicit task.
We can see that each implicit parallel region or parallel
construct creates a set of implicit tasks.
Each implicit task is assigned to a different thread in the created team.
That is, you have already done task-based programming!
- Binding thread set:
The set of threads that are affected by, or provide the context for, the execution of a region.
- Binding implicit task:
The implicit task of the current thread team assigned to the encountering thread.
Most OpenMP constructs have a binding thread set that defines the threads involved in the construct. We can also see that the current team is bound to a binding implicit task. These terms will help us to understand where the tasks are executed.
We also need to understand how tasks are related to each others:
- Current task:
For a given thread, the task corresponding to the task region in which it is executing.
- Child task:
A task is a child task of its generating task region.
- Sibling tasks:
Tasks that are child tasks of the same task region.
- Descendent task:
A task that is the child task of a task region or of one of its descendent task regions.
We can see that tasks can generate child tasks and these child tasks can generate more child tasks.
Finally, we can see that OpenMP tasks can have dependences between them:
- Task dependence:
An ordering relation between two sibling tasks: the dependent task and a previously generated predecessor task. The task dependence is fulfilled when the predecessor task has completed.
- Dependent task:
A task that because of a task dependence cannot be executed until its predecessor tasks have completed.
Task construct
The simplest way to create an explicit task in OpenMP is the task
construct:
#pragma omp task [clause[ [,] clause] ... ] new-line
structured-block
The thread that encounters the task
construct creates an explicit task from the structured block.
The encountering thread
may execute the task immediately or
defer its execution to one of the other threads in the team.
A task is always bound to the innermost parallel region. If a task construct is encountered outside a parallel region, then the structured block is executed immediately by the encountering thread.
The task
construct accepts a set of clauses:
if([ task :] scalar-expression)
final(scalar-expression)
untied
default(shared | none)
mergeable
private(list)
firstprivate(list)
shared(list)
in_reduction(reduction-identifier : list)
depend([depend-modifier,] dependence-type : locator-list)
priority(priority-value)
allocate([allocator :] list)
affinity([aff-modifier :] locator-list)
detach(event-handle)
We can already recognise some of the clauses.
For example, the if
clause can be used to enable/disable the creation of the corresponding task, and the default
, private
, firstprivate
, and shared
clauses are used to control the data sharing rules.
It should be noted that some of these clauses behave slightly differently when compared to the traditional OpenMP constructs.
Let us return to the earlier “Hello world” program:
1#include <stdio.h>
2
3int main() {
4 #pragma omp parallel
5 {
6 #pragma omp task
7 printf("Hello world!\n");
8 }
9 return 0;
10}
Note that the task
pragma is inside a parallel construct.
Each thread in the team
encounters the task construct,
creates the corresponding task and
either executes the task immediately or defer its execution to one of the other threads in the team:
Therefore, the number of tasks, and lines printed, are the same as the number of threads in the team:
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
Hello world!
Hello world!
...
Hello world!
Data sharing rules
We must again begin by discussing the data sharing rules. Let us reconsider an earlier example:
1#include <stdio.h>
2
3int main() {
4 int number = 1;
5 #pragma omp parallel
6 {
7 #pragma omp task
8 {
9 printf("I think the number is %d.\n", number);
10 number++;
11 }
12 }
13 return 0;
14}
The output of the program is not that surprising:
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
I think the number is 1.
I think the number is 2.
I think the number is 2.
I think the number is 3.
...
That is, variables declared outside the parallel construct are still shared
by default.
This is consistent with the three basic data sharing rules.
If we move the variable number
inside the parallel construct, then the variable becomes firstprivate
by default:
1#include <stdio.h>
2
3int main() {
4 #pragma omp parallel
5 {
6 int number = 1;
7 #pragma omp task
8 {
9 printf("I think the number is %d.\n", number);
10 number++;
11 }
12 }
13 return 0;
14}
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
I think the number is 1.
I think the number is 1.
I think the number is 1.
I think the number is 1.
...
The value of the variable is copied when the task is created.
Single construct
In the earlier examples, each thread in the team created a task.
This is sometimes very convenient as the need for new tasks might arise gradually.
However, it is more likely that we want to generate the task graph in a centralized manner, i.e. only one thread should generate the task.
This can be accomplished by combining the parallel
and single
constructs:
1#include <stdio.h>
2
3int main() {
4 #pragma omp parallel
5 #pragma omp single nowait
6 {
7 #pragma omp task
8 printf("Hello world!\n");
9 }
10 return 0;
11}
The nowait
clause removes the redundant barrier from the end of the single
construct.
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
Hello world!
Note that the binding thread set for a single
region is the current team.
That is, any tasks in the current team can execute the task.
Barrier construct
It is sometimes necessary to wait until all earlier tasks have been executed.
This can be accomplished with the barrier
construct:
#pragma omp barrier new-line
All threads in the team must reach the barrier and complete all explicit tasks bound to the parallel region before they are allowed to continue execution beyond the barrier:
1#include <stdio.h>
2
3int main() {
4 #pragma omp parallel
5 {
6 #pragma omp task
7 printf("Hello.\n");
8
9 #pragma omp barrier
10
11 #pragma omp task
12 printf("Goodbye.\n");
13 }
14
15 return 0;
16}
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
Hello.
Hello.
...
Hello.
Goodbye.
Goodbye.
Goodbye.
...
Child tasks and taskwait construct
A task can create new child tasks:
1#include <stdio.h>
2
3int main() {
4 #pragma omp parallel
5 #pragma omp single
6 {
7 #pragma omp task
8 {
9 #pragma omp task
10 printf("Hello.\n");
11
12 printf("Hi.\n");
13 }
14
15 printf("Hej.\n");
16 }
17
18 printf("Goodbye.\n");
19
20 return 0;
21}
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
Hej
Hi.
Hello.
Goodbye.
$ ./my_program
Hej.
Hello.
Hi.
Goodbye.
Note that child tasks are executed separately from the generating tasks.
In particular, it is possible that a child task gets executed after the generating task has finished.
We can use the taskwait
construct to wait on the completion of child tasks of the generating task:
#pragma omp taskwait [clause[ [,] clause] ... ] new-line
This allows us to enforce an execution order:
1#include <stdio.h>
2
3int main() {
4 #pragma omp parallel
5 #pragma omp single
6 {
7 #pragma omp task
8 {
9 #pragma omp task
10 printf("Hello.\n");
11
12 #pragma omp taskwait
13
14 printf("Hi.\n");
15 }
16
17 printf("Hej.\n");
18 }
19
20 printf("Goodbye.\n");
21
22 return 0;
23}
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
Hej.
Hello.
Hi.
Goodbye.