Introduction to OpenMP (part 2)

Section construct

As we saw earlier, all threads within the team execute the entire structured block that follows a parallel construct. Only a very limited number of parallel algorithms can be implemented in this way. It is much more common that we have a set of mutually independent operations which we want to execute in parallel.

One way of accomplishing this is with the sections and section constructs:

#pragma omp sections [clause[ [,] clause] ... ] new-line
{
    [#pragma omp section new-line]
        structured-block
    [#pragma omp section new-line
        structured-block]
    ...
}

where clause is one of the following:

private(list)
firstprivate(list)
lastprivate([ lastprivate-modifier:] list)
reduction([reduction-modifier ,] reduction-identifier : list)
allocate([allocator :] list)
nowait

The structured blocks that follow the section constructs inside the sections construct are distributed among the threads in the team. Each structured block is executed exactly once, by one of the threads:

#include <stdio.h>

int main() {

    #pragma omp parallel
    {
        printf("Everyone!\n");

        #pragma omp sections
        {
            #pragma omp section
            printf("Only me!\n");

            #pragma omp section
            printf("No one else!\n");

            #pragma omp section
            printf("Just me!\n");
        }
    }

    return 0;
}

$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
Everyone!
Only me!
No one else!
Just me!
Everyone!
Everyone!
...

Note how the Everyone! line is printed once per thread, whereas the other three lines are printed only once. The order of the lines may differ from run to run.

If we want, we can merge the parallel and sections constructs together:

#include <stdio.h>

int main() {

    #pragma omp parallel sections
    {
        #pragma omp section
        printf("Only me!\n");

        #pragma omp section
        printf("No one else!\n");

        #pragma omp section
        printf("Just me!\n");
    }

    return 0;
}

$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
Just me!
No one else!
Only me!

Parallel loop construct

Most programs contain several loops and parallelizing these loops is often a natural way to add some parallelism to a program. The loop construct does exactly that:

#pragma omp loop [clause[ [,] clause] ... ] new-line
    for-loops

The construct tells OpenMP that the loop iterations are free of data dependencies and can therefore be executed in parallel. The loop iterator is private by default:

#include <stdio.h>

int main() {
    #pragma omp parallel
    {
        #pragma omp loop
        for (int i = 0; i < 5; i++)
            printf("The loop iterator is %d.\n", i);
    }
}

$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
The loop iterator is 1.
The loop iterator is 4.
The loop iterator is 0.
The loop iterator is 2.
The loop iterator is 3.

Like many other constructs, the loop construct accepts several clauses:

bind(binding)
collapse(n)
order(concurrent)
private(list)
lastprivate(list)
reduction([default ,]reduction-identifier : list)

In particular, the collapse clause allows us to collapse n nested loops into a single parallel loop. Otherwise, only the iterations of the outermost loop are executed in parallel.

If we want, we can merge the parallel and loop constructs together:

#include <stdio.h>

int main() {
    #pragma omp parallel loop
    for (int i = 0; i < 5; i++)
        printf("The loop iterator is %d.\n", i);
}

$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
The loop iterator is 4.
The loop iterator is 0.
The loop iterator is 2.
The loop iterator is 3.
The loop iterator is 1.

Alternatively, we can use the older for construct:

#include <stdio.h>

int main() {
    #pragma omp parallel for
    for (int i = 0; i < 5; i++)
        printf("The loop iterator is %d.\n", i);
}

$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
The loop iterator is 3.
The loop iterator is 1.
The loop iterator is 0.
The loop iterator is 2.
The loop iterator is 4.

Single and master constructs

It is sometimes necessary to execute a structured block only once inside a parallel region. The single construct does exactly this:

#pragma omp single [clause[ [,] clause] ... ] new-line
    structured-block

The structured block is executed only once by one of the threads in the team:

#include <stdio.h>

int main() {
    #pragma omp parallel
    {
        printf("In parallel.\n");
        #pragma omp single
        printf("Only once.\n");
        printf("More in parallel.\n");
    }
}

$ gcc -o my_program my_program.c -Wall -fopenmp
$ ./my_program
In parallel.
Only once.
In parallel.
In parallel.
...
In parallel.
More in parallel.
More in parallel.
...
More in parallel.

Note that all In parallel lines and the Only once line are printed before any More in parallel lines are printed. This happens because the single construct introduces an implicit barrier at the exit of the single region. That is, all threads in the team must wait until one of the threads has executed the structured block that is associated with the single construct.

We can disable this behaviour with the nowait clause. The single construct accepts the following clauses:

private(list)
firstprivate(list)
copyprivate(list)
allocate([allocator :] list)
nowait

The single construct is closely connected to the master construct:

#pragma omp master new-line
    structured-block

However, there are two primary differences:

Only the master thread of the current team can execute the associated structured block.

There is no implied barrier either on entry to, or exit from, the master region.

Critical construct

It is sometimes necessary to allow only one thread at a time to execute a structured block:

#pragma omp critical [(name) [[,] hint(hint-expression)] ] new-line
    structured-block

Critical constructs that share the same name exclude each other, so several constructs can be joined together to protect the same data by giving them the same name:

#pragma omp critical (protect_x)
    x++;

...

#pragma omp critical (protect_x)
    x = x - 15;

Barrier construct

Finally, we can add an explicit barrier:

#pragma omp barrier new-line

That is, each thread in the team must wait at the barrier until all other threads in the team have encountered the barrier construct.