OpenMP task basics (part 1)¶
Objectives

- Learn the basic terminology.
- Learn about the task construct.
- Learn more about data sharing rules.
- Learn how to construct a task graph in a tree-based manner.
- Learn how to construct a task graph in a centralised manner.
- Learn how to wait for tasks to complete their execution.
Important

The Kebnekaise login and compute nodes have the OMP_NUM_THREADS environment variable set to 1 by default.

If you are using the Kebnekaise login nodes to experiment with OpenMP, then it is important to set the OMP_NUM_THREADS environment variable to some reasonable value:

$ export OMP_NUM_THREADS=8

Please note that you are not allowed to run long computations on the login nodes!

If you are using the Kebnekaise compute nodes to experiment with OpenMP, then either unset the OMP_NUM_THREADS environment variable:

$ unset OMP_NUM_THREADS

or, if you specified the --cpus-per-task=<cpu count> SLURM argument, set the OMP_NUM_THREADS environment variable to the number of CPU cores available for the task:

$ export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
Terminology¶
Let us go through some basic terminology:
- Task
A specific instance of executable code and its data environment that the OpenMP implementation can schedule for execution by threads.
- Task region
A region consisting of all code encountered during the execution of a task.
- Implicit task
A task generated by an implicit parallel region or generated when a parallel construct is encountered during execution.
- Initial task
An implicit task associated with an implicit parallel region.
- Explicit task
A task that is not an implicit task.
We can see that each implicit parallel region or parallel
construct creates a set of implicit tasks.
Each implicit task is assigned to a different thread in the created team.
That is, you have already done task-based programming!
- Binding thread set
The set of threads that are affected by, or provide the context for, the execution of a region.
- Binding implicit task
The implicit task of the current thread team assigned to the encountering thread.
Most OpenMP constructs have a binding thread set that defines the threads involved in the construct. We can also see that the current team is bound to a binding implicit task. These terms will help us to understand where the tasks are executed.
We also need to understand how tasks are related to each other:
- Current task
For a given thread, the task corresponding to the task region in which it is executing.
- Child task
A task is a child task of its generating task region.
- Sibling tasks
Tasks that are child tasks of the same task region.
- Descendent task
A task that is the child task of a task region or of one of its descendent task regions.
We can see that tasks can generate child tasks and these child tasks can generate more child tasks.
Finally, we can see that OpenMP tasks can have dependences between them:
- Task dependence
An ordering relation between two sibling tasks: the dependent task and a previously generated predecessor task. The task dependence is fulfilled when the predecessor task has completed.
- Dependent task
A task that because of a task dependence cannot be executed until its predecessor tasks have completed.
Task construct¶
The simplest way to create an explicit task in OpenMP is the task
construct:
#pragma omp task [clause[ [,] clause] ... ] new-line
structured-block
The thread that encounters the task
construct creates an explicit task from the structured block.
The encountering thread
may execute the task immediately or
defer its execution to one of the other threads in the team.
A task is always bound to the innermost parallel region. If a task construct is encountered outside a parallel region, then the structured block is executed immediately by the encountering thread.
The task
construct accepts a set of clauses:
if([ task :] scalar-expression)
final(scalar-expression)
untied
default(shared | none)
mergeable
private(list)
firstprivate(list)
shared(list)
in_reduction(reduction-identifier : list)
depend([depend-modifier,] dependence-type : locator-list)
priority(priority-value)
allocate([allocator :] list)
affinity([aff-modifier :] locator-list)
detach(event-handle)
We can already recognise some of the clauses. For example, the if clause can be used to enable or disable the creation of the corresponding task, and the default, private, firstprivate, and shared clauses are used to control the data sharing rules. It should be noted that some of these clauses behave slightly differently when compared to the traditional OpenMP constructs.
Let us return to the earlier “Hello world” program:
#include <stdio.h>

int main() {
    #pragma omp parallel
    {
        #pragma omp task
        printf("Hello world!\n");
    }
    return 0;
}
Note that the task pragma is inside a parallel construct. Each thread in the team encounters the task construct, creates the corresponding task and either executes the task immediately or defers its execution to one of the other threads in the team. Therefore, the number of tasks, and thus the number of printed lines, is the same as the number of threads in the team:
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
Hello world!
Hello world!
...
Hello world!
Data sharing rules¶
We must again begin by discussing the data sharing rules. Let us reconsider an earlier example:
#include <stdio.h>

int main() {
    int number = 1;
    #pragma omp parallel
    {
        #pragma omp task
        {
            printf("I think the number is %d.\n", number);
            number++;
        }
    }
    return 0;
}
The output of the program is not that surprising:
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
I think the number is 1.
I think the number is 2.
I think the number is 2.
I think the number is 3.
...
That is, variables declared outside the parallel construct are still shared
by default.
This is consistent with the three basic data sharing rules.
If we move the variable number
inside the parallel construct, then the variable becomes firstprivate
by default:
#include <stdio.h>

int main() {
    #pragma omp parallel
    {
        int number = 1;
        #pragma omp task
        {
            printf("I think the number is %d.\n", number);
            number++;
        }
    }
    return 0;
}
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
I think the number is 1.
I think the number is 1.
I think the number is 1.
I think the number is 1.
...
The value of the variable is copied when the task is created.
Challenge
Modify the following program so that it uses explicit data sharing rules and the increment (number++) is protected with a critical construct:
#include <stdio.h>

int main() {
    int number = 1;
    #pragma omp parallel
    {
        #pragma omp task
        {
            printf("I think the number is %d.\n", number);
            number++;
        }
    }
    return 0;
}
Note that the atomic
construct is usually a better approach when protecting a scalar update operation.
Hint: You may want to write the value of the number
variable to a private variable.
Solution
#include <stdio.h>

int main() {
    int number = 1;
    #pragma omp parallel
    {
        #pragma omp task default(none) shared(number)
        {
            int my_number;
            #pragma omp critical
            my_number = number++;
            printf("I think the number is %d.\n", my_number);
        }
    }
    return 0;
}
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
I think the number is 1.
I think the number is 2.
I think the number is 5.
I think the number is 10.
...
Or
#include <stdio.h>

int main() {
    int number = 1;
    #pragma omp parallel
    {
        #pragma omp task default(none) shared(number)
        {
            int my_number;
            #pragma omp atomic capture
            my_number = number++;
            printf("I think the number is %d.\n", my_number);
        }
    }
    return 0;
}
Single construct¶
In the earlier examples, each thread in the team created a task.
This is sometimes very convenient as the need for new tasks might arise gradually.
However, it is more likely that we want to generate the task graph in a centralised manner, i.e. only one thread should generate the tasks.
This can be accomplished by combining the parallel
and single
constructs:
#include <stdio.h>

int main() {
    #pragma omp parallel
    #pragma omp single nowait
    {
        #pragma omp task
        printf("Hello world!\n");
    }
    return 0;
}
The nowait
clause removes the redundant barrier from the end of the single
construct.
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
Hello world!
Note that the binding thread set for a single
region is the current team.
That is, any thread in the current team can execute the task.
Challenge
Write a program that creates 10 tasks. Each task should print the thread number of the calling thread.
Hint: From the omp.h
header file:
int omp_get_thread_num (void);
Solution
#include <stdio.h>
#include <omp.h>

int main() {
    #pragma omp parallel
    #pragma omp single nowait
    {
        for (int i = 0; i < 10; i++) {
            #pragma omp task
            printf("I am thread no. %d.\n", omp_get_thread_num());
        }
    }
    return 0;
}
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
I am thread no. 5.
I am thread no. 13.
I am thread no. 6.
I am thread no. 8.
I am thread no. 4.
I am thread no. 7.
I am thread no. 15.
I am thread no. 12.
I am thread no. 0.
I am thread no. 9.
Barrier construct¶
It is sometimes necessary to wait until all earlier tasks have been executed.
This can be accomplished with the barrier
construct:
#pragma omp barrier new-line
All threads in the team must reach the barrier and complete all explicit tasks bound to the parallel region before they are allowed to continue execution beyond the barrier:
#include <stdio.h>

int main() {
    #pragma omp parallel
    {
        #pragma omp task
        printf("Hello.\n");
        #pragma omp barrier
        #pragma omp task
        printf("Goodbye.\n");
    }
    return 0;
}
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
Hello.
Hello.
...
Hello.
Goodbye.
Goodbye.
Goodbye.
...
Child tasks and taskwait construct¶
A task can create new child tasks:
#include <stdio.h>

int main() {
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task
        {
            #pragma omp task
            printf("Hello.\n");
            printf("Hi.\n");
        }
        printf("Hej.\n");
    }
    printf("Goodbye.\n");
    return 0;
}
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
Hej.
Hi.
Hello.
Goodbye.
$ ./my_program
Hej.
Hello.
Hi.
Goodbye.
Note that child tasks are executed separately from the generating tasks.
In particular, it is possible that a child task gets executed after the generating task has finished.
We can use the taskwait
construct to wait on the completion of child tasks of the generating task:
#pragma omp taskwait [clause[ [,] clause] ... ] new-line
This allows us to enforce an execution order:
#include <stdio.h>

int main() {
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task
        {
            #pragma omp task
            printf("Hello.\n");
            #pragma omp taskwait
            printf("Hi.\n");
        }
        printf("Hej.\n");
    }
    printf("Goodbye.\n");
    return 0;
}
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
Hej.
Hello.
Hi.
Goodbye.
Challenge
Parallelize the following code using child tasks:
#include <stdio.h>

int fib(int n)
{
    if (n < 2)
        return n;
    int i, j;
    i = fib(n-1);
    j = fib(n-2);
    return i + j;
}

int main() {
    #pragma omp parallel
    #pragma omp single
    printf("fib(10) = %d\n", fib(10));
    return 0;
}
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
fib(10) = 55
Hint: Remember, variables defined inside a parallel construct are firstprivate
for a task region by default. Also, you must wait for certain tasks.
Solution
#include <stdio.h>

int fib(int n)
{
    if (n < 2)
        return n;
    int i, j;
    #pragma omp task default(none) shared(i) firstprivate(n)
    i = fib(n-1);
    #pragma omp task default(none) shared(j) firstprivate(n)
    j = fib(n-2);
    #pragma omp taskwait
    return i + j;
}

int main() {
    #pragma omp parallel
    #pragma omp single
    printf("fib(10) = %d\n", fib(10));
    return 0;
}
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
fib(10) = 55