OpenMP task basics (part 2)
Task loop
It is very common that a set of tasks is created inside a loop:
#pragma omp parallel
#pragma omp single
{
    for (int i = 0; i < n; i++) {
        #pragma omp task
        task_implementation(data[i]);
    }
}
This is not always convenient. For example, the resulting task granularity can be too fine, and we may want to combine several loop iterations into a single task:
#pragma omp parallel
#pragma omp single
{
    int per_task = 10;
    for (int i = 0; i < n; i += per_task) {
        #pragma omp task
        {
            for (int j = 0; j < per_task && i+j < n; j++)
                task_implementation(data[i+j]);
        }
    }
}
As you can see, the resulting code is quite complicated and easy to get wrong.
Fortunately, the taskloop construct can be used to solve this issue:
#pragma omp taskloop [clause[[,] clause] ...] new-line
for-loops
The construct specifies that the iterations of the loops will be executed in parallel using tasks. We can thus write the earlier example in a much shorter form:
#pragma omp parallel
#pragma omp single
{
    #pragma omp taskloop
    for (int i = 0; i < n; i++)
        task_implementation(data[i]);
}
Unless otherwise specified, the number of iterations assigned to each task is decided by the OpenMP implementation. We can change this behaviour with clauses:
if([ taskloop :] scalar-expression)
shared(list)
private(list)
firstprivate(list)
lastprivate(list)
reduction([default ,]reduction-identifier : list)
in_reduction(reduction-identifier : list)
default(shared | none)
grainsize(grain-size)
num_tasks(num-tasks)
collapse(n)
final(scalar-expr)
priority(priority-value)
untied
mergeable
nogroup
allocate([allocator :] list)
In particular, the grainsize clause sets the approximate number of iterations assigned to each task and the num_tasks clause sets the number of tasks generated.
Task group
The taskwait construct specifies that the current task region is suspended until the completion of child tasks of the current task.
However, the construct does not specify that the current task region is suspended until the completion of descendants of the child tasks:
#include <stdio.h>

int main() {
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task
        {
            #pragma omp task
            printf("Hello.\n");

            printf("Hi.\n");
        }

        #pragma omp taskwait

        printf("Goodbye.\n");
    }

    return 0;
}
In the above example, the task that prints the Hello line is a descendant of the child task that prints the Hi line.
As we can see, it is possible that the descendant gets executed after the taskwait region:
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
Hi.
Goodbye.
Hello.
Tasks and their descendant tasks can be synchronized by containing them in a taskgroup region.
The taskgroup construct specifies a wait on completion of child tasks of the current task and their descendant tasks:
#pragma omp taskgroup [clause[[,] clause] ...] new-line
structured-block
Note that the taskgroup construct is not a standalone construct. Instead, we must enclose the task-generating region with it. All tasks generated inside a taskgroup region are waited for at the end of the region.
Depend clause
Up to this point, we have only discussed tasks that are either mutually independent or are related to each other due to the fact that they are generated in a nested manner.
In the earlier lecture, we talked about task dependencies.
Since OpenMP 4.5, most task-related OpenMP constructs have accepted the depend clause:
depend([depend-modifier,]dependence-type : locator-list)
where dependence-type is one of the following:
in
out
inout
mutexinoutset
depobj
The most relevant ones of these are the following:
- in:
Input variable(s).
- out:
Output variable(s).
- inout:
Input and output variable(s).
The locator-list argument lists all involved variables: var1, var2, ..., varN.
A construct can have multiple depend clauses, one for each dependence-type.
The depend clause is much more powerful than this, but during this course we are going to use only the basic functionality.
As an example, consider the following ill-defined program:
#include <stdio.h>

int main() {
    int number;

    #pragma omp parallel
    #pragma omp single nowait
    {
        #pragma omp task
        number = 1;

        #pragma omp task
        {
            printf("I think the number is %d\n", number);
            number++;
        }

        #pragma omp task
        printf("I think the final number is %d\n", number);
    }

    return 0;
}
As expected, the result is not well-defined:
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
I think the number is 1
I think the final number is 1
$ ./my_program
I think the final number is 1
I think the number is 1
We can fix the issue by defining input and output variables for each task:
#include <stdio.h>

int main() {
    int number;

    #pragma omp parallel
    #pragma omp single nowait
    {
        #pragma omp task depend(out: number)
        number = 1;

        #pragma omp task depend(inout: number)
        {
            printf("I think the number is %d\n", number);
            number++;
        }

        #pragma omp task depend(in: number)
        printf("I think the final number is %d\n", number);
    }

    return 0;
}
That is,
- the first task is going to write into the variable number,
- the second task is going to read and write from/into the variable number, and
- the third task is going to read from the variable number.
These clauses force the OpenMP implementation to execute the tasks in an order that respects the induced task dependencies:
$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
I think the number is 1
I think the final number is 2
Priority clause
As discussed during an earlier lecture, we can give each task a priority in an attempt to help the runtime system to schedule the tasks in a better order.
In OpenMP, the priority is given using the priority(priority-value) clause.
The priority-value is a non-negative integer expression; a higher value implies a higher priority.
You can query the maximum priority with the omp_get_max_task_priority() function.
Untied clause and taskyield construct
Threads are allowed to suspend the current task region at a task scheduling point in order to execute a different task. A task scheduling point can occur
- during the generation of an explicit task,
- at the point immediately following the generation of an explicit task,
- after the point of completion of the structured block associated with a task,
- in a taskyield region,
- in a taskwait region,
- at the end of a taskgroup region,
- in an implicit barrier region, and
- in an explicit barrier region.
In particular, we can use the taskyield construct to force a task scheduling point:
#pragma omp taskyield new-line
Note that the above list is not complete.
By default, task regions are tied to the initially assigned thread, and only that thread can later resume the execution of a suspended task region.
This behaviour can be changed with the untied clause, in which case any thread is allowed to resume the execution.
Mergeable and final clauses
The mergeable and final clauses can be used to reduce the number of generated tasks in deeply nested task generation trees.
First, we need more terminology:
- Undeferred task:
A task for which execution is not deferred with respect to its generating task region. That is, its generating task region is suspended until execution of the structured block associated with the undeferred task is completed.
- Included task:
A task for which execution is sequentially included in the generating task region. That is, an included task is undeferred and executed by the encountering thread.
- Merged task:
A task for which the data environment, inclusive of internal control variables (ICVs), is the same as that of its generating task region.
- Mergeable task:
A task that may be a merged task if it is an undeferred task or an included task.
- Final task:
A task that forces all of its child tasks to become final and included tasks.
The mergeable clause indicates that the task is mergeable.
The final(scalar-expression) clause indicates that the task is final if scalar-expression evaluates to true.