Terminology: Concurrency and Parallelism
Give an example of a problem that can be solved concurrently but not
in parallel, and one that can be solved in parallel but not concurrently.
If you think one of these doesn't exist, explain why.
Suggested Answer
Concurrency is a semantic property, while parallelism is an
implementation choice.
Recall that coroutines (or, in their weaker form, generators) are a canonical
example of non-parallel concurrency. They can be implemented using
parallelism, or by suspending the execution of one computation and
handing control over to a different coroutine.
In the worst case, this is a restriction, as with old, single-core
operating systems that lacked the means to implement a preemptive
scheduler and couldn't physically execute multiple processes at once. On the
other hand, coroutines can serve as a means of modularization that
makes complex programs easier to maintain. The related notion
of lazy evaluation in languages like Haskell also makes use of this.
Under the definition given in the lecture, parallelism without
concurrency doesn't make sense, since any units of work that are
done at the same time must also be able to be done at the same time.
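To make the non-parallel flavor of this concrete, here is a minimal sketch of a generator in plain C (the name next_fib and the static-state encoding are our own illustration, not from the lecture). Caller and generator interleave cooperatively on a single thread; control is handed back and forth, but nothing ever runs in parallel:

```c
#include <stdio.h>

/* A toy generator: each call "resumes" where the previous call left
 * off (the state lives in static variables) and "yields" the next
 * Fibonacci number. */
static unsigned next_fib(void)
{
    static unsigned a = 0, b = 1;
    unsigned out = a;
    unsigned next = a + b;
    a = b;
    b = next;
    return out;
}

int main(void)
{
    /* Control alternates between main and the generator:
     * concurrency without parallelism. */
    for (int i = 0; i < 10; ++i)
        printf("%u\n", next_fib());
    return 0;
}
```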
Speedup Arithmetic
Consider two programs, one sequential and one parallel. Executing them one after another on a shared-memory multicore system yields the following results:
| # Cores | Seq. | Par. |
|---|---|---|
| 1 | 27s | 34s |
| 2 | 27s | 18s |
| 4 | 27s | 10s |
| 8 | 27s | 7s |
Compute the speedup and efficiency for 2, 4 and 8 cores and explain
the results. Briefly comment on why on a single-core system the
sequential version is faster than the parallel version.
Suggested Answer
The speedups are 27/18 = 1.5, 27/10 = 2.7, and 27/7 ≈ 3.86,
and the efficiencies are 75%, 67.5%, and 48.2%, respectively.
The efficiency decreases with the number of cores, as predicted by
Amdahl's law: the sequential fraction of the program limits how much
additional cores can contribute.
The sequential program is faster on a single-core system because the
parallel version pays the overhead of (futile) parallelization,
e.g. for creating and synchronizing threads, without any additional
cores to exploit.
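For reference, these numbers follow the standard definitions of speedup and efficiency (assuming the lecture states them the same way):

$$S(p) = \frac{T_\text{seq}}{T_\text{par}(p)}, \qquad E(p) = \frac{S(p)}{p}, \qquad \text{e.g. } E(8) = \frac{27/7}{8} = \frac{27}{56} \approx 48.2\%$$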
Parallel Execution and Data Races
Consider the following tasks and explain whether or not data races could occur. Assume multiple actors are doing the same operation in parallel. If they can occur, suggest the most appropriate means of preventing them.
- Adding a value to a binary tree.
- Finding a value in a binary tree.
- Incrementing an integer that fits in a machine word.
- Creating a new file on a multi-user system.
- Replacing an existing file on a multi-user system.
Suggested Answer
- There is a data race, since two threads could attempt to mutate the same node, leading to a race condition. You can avoid this using some sort of locking discipline, or by finding an adequate lock-free algorithm.
- There is no data race, since all actors only read values in the data structure.
- There is a data race, unless you use an atomic read-modify-write operation: primitives like "fetch and add" (e.g. C11's atomic_fetch_add) do exactly that.
- It depends on the operating system and file system, but on UNIX, `open` (with the `O_CREAT | O_EXCL` flags) promises to create the file atomically: if two processes attempt to create the same file, only one should succeed.
- Again it depends on the operating system, but replacing a regular file is not an atomic operation. You have to protect it with some sort of lock (e.g. file locks or POSIX semaphores). You can replace a file atomically by `rename`-ing it, but it remains a race which process gets to execute the system call last.
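A minimal POSIX sketch of the last two points (the file names are hypothetical). The exclusive create fails for every racing process but one, and the rename publishes the new content atomically:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* Atomic creation: O_CREAT | O_EXCL fails with EEXIST if the
     * file already exists, so exactly one of several racing
     * processes wins. */
    int fd = open("target.tmp", O_CREAT | O_EXCL | O_WRONLY, 0644);
    if (fd < 0) {
        perror("open");  /* another process created it first */
        return 1;
    }

    const char msg[] = "new content\n";
    write(fd, msg, strlen(msg));
    close(fd);

    /* Atomic replacement: rename() swaps the file in one step, so
     * readers see either the old or the new "target", never a
     * half-written file. Which racing process wins remains a race. */
    if (rename("target.tmp", "target") != 0) {
        perror("rename");
        return 1;
    }
    return 0;
}
```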
OpenMP: Parallelization
Judge whether or not you can correctly parallelize these loops using
OpenMP just by adding a #pragma omp, without
adjusting the program. Briefly explain your answer, and if
you think it is possible, give the pragma you would use.
1. `for (int i = 0; i < N; ++i) { a[i] = a[i] * a[i+1]; }`
2. `for (int i = 0; i < N-1; i += 2) { a[i] = a[i] * a[i+1]; }`
3. `bool any = false; for (int i = 0; i < N; ++i) { any = any || c[i]; }`
Suggested Answer
1. No, you cannot, since neighboring elements of the array `a` form data dependencies that prevent parallelization.
2. Yes: the even-indexed elements depend on their own value and that of their odd-indexed successors, while the odd-indexed elements are never modified. `#pragma omp parallel for`
3. Yes, we can even use `reduction`, since `||` over truth values forms a monoid, with `false` being the neutral element: `#pragma omp parallel for reduction(||: any)`
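As a compilable sketch of the third case (the array contents are made up for illustration; compile with -fopenmp):

```c
#include <stdbool.h>
#include <stdio.h>

#define N 1000

int main(void)
{
    bool c[N];
    for (int i = 0; i < N; ++i)
        c[i] = (i == 421);  /* hypothetical data: a single true entry */

    bool any = false;
    /* Each thread accumulates into a private copy of `any`, starting
     * from the neutral element false; OpenMP combines the copies
     * with || when the loop ends. */
    #pragma omp parallel for reduction(||: any)
    for (int i = 0; i < N; ++i)
        any = any || c[i];

    printf("%s\n", any ? "found" : "not found");
    return 0;
}
```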
OpenMP: Reduction
The lectures introduced reduction. Explain in your own words
what the advantage is of reducing data instead of manually iterating.
Specifically, go into the dangers and difficulties that you
can avoid dealing with if your problem matches the reduction
pattern.
To illustrate this, come up with a (simple) problem where
you can use reduction, and contrast this with one where
you cannot.
Hint
What algebraic properties does the reduction operation need to exhibit?
Suggested Answer
As it is quasi-declarative, OpenMP has more freedom to optimize the
execution and use a more efficient approach to accumulate the result
(recall the tree strategy mentioned in the lectures). Furthermore, it
avoids having to use a shared accumulator that requires some kind of
manual synchronization.
Problem 3 from the previous exercise is a positive example. You
cannot use reduction to divide the elements of a list in order of
occurrence, since division is not associative: (8 / 4) / 2 = 1, but
8 / (4 / 2) = 4.
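To illustrate the difficulty that reduction hides, here is a minimal sketch of the manual alternative (our own example, summing hypothetical data): every update to the shared accumulator has to be synchronized by hand, which is both easy to get wrong and serializes the updates.

```c
#include <stdio.h>

#define N 1000

int main(void)
{
    int a[N];
    for (int i = 0; i < N; ++i)
        a[i] = i;  /* hypothetical data */

    long sum = 0;
    /* Without reduction(+: sum), the shared accumulator must be
     * protected explicitly; dropping the atomic below would be a
     * data race, and keeping it serializes the additions. */
    #pragma omp parallel for
    for (int i = 0; i < N; ++i) {
        #pragma omp atomic
        sum += a[i];
    }

    printf("%ld\n", sum);  /* prints 499500 */
    return 0;
}
```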
OpenMP: Schedulers
As you might know, is_prime(n) is computationally expensive,
while is_coprime(n, m) is cheap: to determine whether two positive
integers n and m are coprime, you just have to check that
gcd(n, m) = 1.
In the following program, both functions are used:
```c
#define N 128

void
foo(int in[N], bool out[N])
{
    for (int i = 0; i < N; ++i) {
        if (i & 1) { /* i is odd */
            out[i] = is_prime(in[i]);
        } else { /* i is even */
            out[i] = is_coprime(in[i], in[(i+1)%N]);
        }
    }
}
```
Explain what the general efficiency issue is when you attempt
to execute the program in parallel using OpenMP. Which of the
three schedulers
mentioned in the lecture (static, dynamic
and guided) should you use and why? Write out the OpenMP
pragma you would use.
Hint
What do the respective schedulers do when a thread runs out of work?
Suggested Answer
The work load is not evenly balanced, so some kind of non-static
scheduler is necessary to prevent some threads from sitting idle
while others are still working through expensive iterations. Whether
to use dynamic or guided shouldn't matter much, assuming that the
effort to check for primality is much more expensive than the
scheduling overhead.
`#pragma omp parallel for schedule(dynamic)`
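Applied to the loop from the exercise, the body of foo becomes (a sketch, with is_prime and is_coprime as declared above):

```c
void
foo(int in[N], bool out[N])
{
    /* dynamic: each thread grabs the next iteration (or chunk) as
     * soon as it finishes its current one, so threads that drew
     * cheap is_coprime iterations don't idle while others are stuck
     * in expensive is_prime calls. */
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < N; ++i) {
        if (i & 1) {
            out[i] = is_prime(in[i]);                 /* expensive */
        } else {
            out[i] = is_coprime(in[i], in[(i+1)%N]);  /* cheap */
        }
    }
}
```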