Data for Parallel Regions
This guide covers:
Data:
What is private data
What is shared data
How to control which is which
Race conditions:
Basic constructs to avoid data races
Private Data
Private data is typically used for control variables, including:
Thread identification
Loop indices
Variables internal to the algorithm
Default Private Variables
Most variables declared inside a parallel region are private by default:
Variables declared inside the block (C/C++)
Local variables of subroutines/functions called from inside the parallel region
Exceptions
The following are NOT private by default:
static (C/C++) or save (Fortran) variables
File scope variables (C/C++) or COMMON blocks (Fortran)
Variables passed by reference inherit their data-sharing attribute
Warning
In Fortran, special care is needed with COMMON and EQUIVALENCE statements.
Example: Memory Movements for Private Data (Fortran)
integer :: b
b = 5
!$OMP parallel &
!$OMP private(b)
b = omp_get_thread_num()
b = b + 3
!$OMP end parallel
b = 7
Memory Layout
Main Memory: b = 5
Thread 0: b = 0 → b = 3
Thread 1: b = 1 → b = 4
Thread 2: b = 2 → b = 5
Thread 3: b = 3 → b = 6
Main Memory: b = 7
Note
Each thread has its own copy of b, and changes do not affect the original value in main memory.
Example: Memory Movements for Private Data (C)
int b;
b = 5;
#pragma omp parallel \
private(b)
{
b = omp_get_thread_num();
b += 3;
}
b = 7;
Memory Layout
Main Memory: b = 5
Thread 0: b = 0 → b = 3
Thread 1: b = 1 → b = 4
Thread 2: b = 2 → b = 5
Thread 3: b = 3 → b = 6
Main Memory: b = 7
Note
Each thread has its own copy of b, and changes do not affect the original value in main memory.
Shared Data
Majority of the data in parallel programs
Typically large data structures (e.g., arrays)
Properties
Keeps its value on entry to parallel region
Keeps its value on exit from parallel region
Every thread can access (read and/or write) the data
Safety Considerations
Safe scenario:
Multiple threads only read the data
Dangerous scenario:
Multiple threads access the same memory location
At least one of these is a write access
This easily results in a race condition
Example: Vector Initialization (Fortran)
integer, parameter :: vleng = 120
integer :: vect(vleng), myNum, start, fin, i
!$omp parallel shared(vect) &
!$omp private(myNum, start, fin, i)
myNum = vleng / omp_get_num_threads()
start = 1 + omp_get_thread_num() * myNum
fin = (omp_get_thread_num() + 1) * myNum
do i = start, fin
vect(i) = 4 * i ! threads write different elements
enddo
!$omp end parallel
Mathematical notation: \(v_i = 4i\)
Note
This is safe because each thread writes to different elements of the shared array.
Example: Vector Initialization (C)
const int vleng = 120;
int vect[vleng], myNum, start, fin, i;
#pragma omp parallel shared(vect) \
private(myNum, start, fin, i)
{
myNum = vleng / omp_get_num_threads();
start = omp_get_thread_num() * myNum;
fin = start + myNum;
for (i = start; i < fin; i++)
vect[i] = 4 * i; // threads write different elements
}
Mathematical notation: \(v_i = 4i\)
Note
This is safe because each thread writes to different elements of the shared array.
Example: Write Conflict for Shared Data (Fortran)
integer :: a, b
a = 5
!$OMP parallel &
!$OMP shared(a, b)
b = a + omp_get_thread_num()
print *, "updated"
print *, "my b:", b
!$OMP end parallel
Memory Behavior
Main Memory: a = 5
All threads read: a = 5
Thread 0: b = 5
Thread 1: b = 6
Thread 2: b = 7
Thread 3: b = 8
Final b value: RANDOM (could be 5, 6, 7, or 8)
Warning
Final b value is random/unpredictable
Individual threads might print b before it has its final value
This is a race condition
Example: Write Conflict for Shared Data (C)
int a, b;
a = 5;
#pragma omp parallel \
shared(a, b)
{
b = a + omp_get_thread_num();
printf("updated\n");
printf("my b: %i\n", b);
}
Memory Behavior
Main Memory: a = 5
All threads read: a = 5
Thread 0: b = 5
Thread 1: b = 6
Thread 2: b = 7
Thread 3: b = 8
Final b value: RANDOM (could be 5, 6, 7, or 8)
Warning
Final b value is random/unpredictable
Individual threads might print b before it has its final value
This is a race condition
Default Clause
The default clause can be used on a parallel or task construct to determine the data sharing of implicitly determined variables.
In C:
default(shared | none)
In Fortran:
default(shared | none | private | firstprivate)
For parallel constructs, if no default clause is supplied, default(shared) applies.
Recommendation
Important
Using default(none) is typically a good idea!
With default(none), all variables accessed in the parallel region must be explicitly declared as shared, private, etc.
Fixing Data Races
OpenMP provides several constructs to avoid data races:
barrier - synchronization point
critical - mutual exclusion region
atomic - lightweight protection for simple operations
Note
These constructs impact code performance, but we have no interest in “fast garbage”!
Barrier and Synchronization
The Barrier Construct
Fortran:
!$omp barrier
C:
#pragma omp barrier
Behavior
All threads wait for the last one to arrive at the barrier
Registers are flushed to the memory system
All threads must have the barrier in their line of execution
Warning
If not all threads reach the barrier, a deadlock will occur!
Visual Representation
Thread 0: A ──────┐
Thread 1: A ──────┤
Thread 2: A ──────┼── BARRIER ──┬── B
Thread 3: A ──────┘ ├── B
├── B
└── B
Time ─────────────────────────>
Example: Avoiding a Data Race in Matrix Transpose (Fortran)
!$omp parallel default(none) &
!$omp private(mysize, tid, i, j) shared(matrix, mtrans, nsize)
tid = omp_get_thread_num()
mysize = nsize / omp_get_num_threads()
do j = 1 + tid*mysize, (tid+1)*mysize
do i = 1, nsize
matrix(i,j) = 1000.0 * j + i
enddo
enddo
!$omp barrier
do j = 1 + tid*mysize, (tid+1)*mysize
do i = 1, nsize
mtrans(i,j) = matrix(j,i)
enddo
enddo
!$omp end parallel
Note
The barrier ensures that all threads complete writing to matrix
before any thread begins reading from it for the transpose operation.
Example: Avoiding a Data Race in Matrix Transpose (C)
#pragma omp parallel default(none) \
private(mysize, tid, i, j) shared(matrix, mtrans, nsize)
{
tid = omp_get_thread_num();
mysize = nsize / omp_get_num_threads();
for (i = tid*mysize; i < (tid+1)*mysize; i++)
for (j = 0; j < nsize; j++)
matrix[i][j] = 1000.0 * j + i;
#pragma omp barrier
for (i = tid*mysize; i < (tid+1)*mysize; i++)
for (j = 0; j < nsize; j++)
mtrans[i][j] = matrix[j][i];
}
Note
The barrier ensures that all threads complete writing to matrix
before any thread begins reading from it for the transpose operation.
Critical Regions
Critical regions protect updates of shared memory locations by ensuring only one thread executes the critical region at a time.
Syntax in C
#pragma omp critical (name)
{
code-block
}
Syntax in Fortran
!$omp critical (name)
code-block
!$omp end critical (name)
Name is optional:
If named: only one thread in all regions with the same name
If unnamed: only one thread in all unnamed regions
Implies a register flush at entrance and exit
Useful to execute non-thread-safe functions
Performance penalty due to serialization
Example: Use of Critical Region (Fortran)
Computing a sum with critical section:
sum = 0.0_dpr
!$omp parallel default(none) &
!$omp shared(sum) private(tid, cont)
tid = omp_get_thread_num()
cont = func(tid)
!$omp critical (exp_up)
sum = sum + cont
print *, tid, ": c=", cont, " s=", sum
!$omp end critical (exp_up)
!$omp end parallel
Mathematical notation: \(\sum_{k=0}^{n-1} e^k\)
Note
The critical region ensures that only one thread updates sum
at a time, preventing race conditions.
Example: Use of Critical Region (C)
Computing a sum with critical section:
sum = 0.0;
#pragma omp parallel default(none) \
shared(sum) private(tid, cont)
{
tid = omp_get_thread_num();
cont = func(tid);
#pragma omp critical (exp_up)
{
sum += cont;
printf("%i: c=%f s=%f\n", tid, cont, sum);
}
}
Mathematical notation: \(\sum_{k=0}^{n-1} e^k\)
Note
The critical region ensures that only one thread updates sum
at a time, preventing race conditions.
Atomic Operations
atomic is a lightweight alternative to critical for simple cases.
Works with simple statements only
Can use special hardware instructions if they exist
Flushes the “protected” variable on entry and exit
Much more efficient than critical for simple operations
Versions (from OpenMP 3.1)
Four different versions:
read - atomic read operation
write - atomic write operation
update - atomic update operation
capture - atomic update with capture of old/new value
OpenMP 4.0 Enhancement
Adding seq_cst to atomic flushes all variables:
Important for controlling instruction reordering
Example use case: implementing a lock
Atomic Read
Protects only the reading of a scalar intrinsic variable.
Fortran Syntax
!$omp atomic read
v = x
C Syntax
#pragma omp atomic read
v = x;
Protects only the reading of scalar variable x
Flushes x on entry and exit
Atomic Write
Protects only the writing of a scalar intrinsic variable.
Fortran Syntax
!$omp atomic write
x = expr
C Syntax
#pragma omp atomic write
x = expr;
Example Expressions
x = 5;
x = v;
x = func(a);
Warning
Protects only the writing of x
No protection for evaluation of expr on the right-hand side
Flushes x on entry and exit
Atomic Update
Protects the update of a variable in simple arithmetic operations.
Note
atomic update was the only atomic operation prior to OpenMP 3.1. The update keyword is optional for backward compatibility.
Only protects the update of the variable, not function calls on the right-hand side
Works with simple statements only
Can use special hardware instructions if available
Flushes the updated variable on entry and exit
Example
x += func(a);
x = x + func(a);
Warning
The evaluation of func(a) is NOT protected. Use critical if protection is needed!
Atomic Update: Fortran Statements
Examples
!$omp atomic update
x = x + 1
!$omp atomic update
x = x + f(a)
Warning
The evaluation of f(a) is NOT protected. Use critical if needed!
Allowed Operations
x = x operator expr
x = expr operator x
x = intr_proc(x, expr_list)
x = intr_proc(expr_list, x)
Where:
x is a scalar variable of intrinsic type
operator is one of: +, *, -, /, .AND., .OR., .EQV., .NEQV.
intr_proc is one of: MAX, MIN, IAND, IOR, IEOR
Note
The update keyword is optional for consistency with older OpenMP standards.
Example: Vector Norm (Fortran)
norm = 0.0D0
!$omp parallel default(none) &
!$omp shared(vect, norm) private(myNum, i, lNorm)
lNorm = 0.0D0
myNum = vleng / omp_get_num_threads() ! local size
do i = 1 + myNum * omp_get_thread_num(), &
myNum * (1 + omp_get_thread_num())
lNorm = lNorm + vect(i) * vect(i)
enddo
!$omp atomic update
norm = norm + lNorm
!$omp end parallel
norm = sqrt(norm)
Mathematical notation: \(\sqrt{\sum_i v(i) \cdot v(i)}\)
Note
Each thread computes a local sum (lNorm), then atomically adds it to the global norm.
Atomic Update: C Statements
Examples
#pragma omp atomic update
x++;
#pragma omp atomic update
x += f(a);
Warning
The evaluation of f(a) is NOT protected. Use critical if needed!
Allowed Operations
x binop= expr;
x++;
++x;
x--;
--x;
x = x binop expr;
Where:
x is an lvalue expression of scalar type
binop is one of: +, *, -, /, &, ^, |, <<, >>
Note
The update keyword is optional for consistency with older OpenMP standards.
Example: Vector Norm (C)
norm = 0.0;
#pragma omp parallel default(none) \
shared(vect, norm, vleng) private(myNum, i, lNorm)
{
lNorm = 0.0;
myNum = vleng / omp_get_num_threads(); // local size
for (i = myNum * omp_get_thread_num();
i < myNum * (1 + omp_get_thread_num()); i++)
lNorm += vect[i] * vect[i];
#pragma omp atomic update
norm += lNorm;
} // synchronize at end parallel
norm = sqrt(norm);
Mathematical notation: \(\sqrt{\sum_i v(i) \cdot v(i)}\)
Note
Each thread computes a local sum (lNorm), then atomically adds it to the global norm.
Atomic Capture
Atomic capture allows you to:
Update a shared variable atomically
Keep a thread-private copy of either (but not both):
The old value before update
The new value after update
Restrictions apply to the allowed statement forms.
Atomic Capture: C Statements
#pragma omp atomic capture
statement_or_structured_block
Allowed Statements (OpenMP 4.0)
v = x++;
v = x--;
v = ++x;
v = --x;
v = x binop= expr;
v = x = x binop expr;
v = x = expr binop x;
Allowed Structured Blocks
{v = x; x binop= expr;}
{x binop= expr; v = x;}
{v = x; x = x binop expr;}
{v = x; x = expr binop x;}
{x = x binop expr; v = x;}
{x = expr binop x; v = x;}
{v = x; x = expr;}
{v = x; x++;}
{v = x; ++x;}
{++x; v = x;}
{x++; v = x;}
{v = x; x--;}
{v = x; --x;}
{--x; v = x;}
{x--; v = x;}
Atomic Capture: Fortran Statements
Syntax Form 1
!$omp atomic capture
update-statement
capture-statement
!$omp end atomic
Syntax Form 2
!$omp atomic capture
capture-statement
update-statement
!$omp end atomic
Allowed Update Statements
x = x operator expr
x = expr operator x
x = intr_proc(x, expr_list)
x = intr_proc(expr_list, x)
Allowed Capture Statements
v = x
Summary
This guide covered the following OpenMP concepts:
Data Management:
Private data: each thread has its own copy
Shared data: accessible by all threads
Controlling data attributes with clauses
Preventing Race Conditions:
barrier: synchronization point for all threads
critical: mutual exclusion for code regions
atomic: lightweight protection for simple operations (read, write, update, capture)
Parallelization Strategies:
Examples demonstrated various approaches to parallel data management
Techniques for avoiding data races while maintaining performance