
Parallel application has random behavior
I am writing a C program using pthreads to do a wavefront computation on a two-dimensional matrix. To get good performance, I distribute several rows to each thread in an interleaved manner, like so:

thread 0 ------------------
thread 1 ------------------
thread 2 ------------------
thread 3 ------------------
thread 0 ------------------
thread 1 ------------------
thread 2 ------------------
thread 3 ------------------
etc.

I think this is the only viable split for this computation, since each thread needs the newly computed values on each row and cannot move on until they are available. The catch is that thread 1 needs the values computed by thread 0 on its row, so it has to trail behind thread 0 and never get ahead of it. For this purpose I split each row into chunks and protect each chunk with a critical section. The computation looks like:

thread 0 -----------
thread 1 ------
thread 2 ---

such that row i always trails behind row i - 1. I hope you understand the idea. I have implemented it and I am testing it on a dual quad-core machine, and I am seeing strange behavior. The results are computed correctly, but the running time varies from about 8x faster than the sequential version to slower than it. Concretely, the sequential time for a 12000 x 12000 matrix is 16 seconds, while the parallel running time is anywhere between 2 and 17 seconds and often differs between two consecutive runs.

My first thought was that this problem is very locality-sensitive, so I could be getting bad performance if, say, thread 0 and thread 1 are scheduled on different physical processors. From /proc/cpuinfo I deduced that cores 0, 2, 4, 6 are on processor 0 and cores 1, 3, 5, 7 are on processor 1. Then, upon thread creation, I used pthread_setaffinity_np to pin the threads to the right cores. However, nothing changed. I have also tried pthread_attr_setaffinity_np and sched_setaffinity, but I get the same random running times.

Either the kernel is ignoring my affinity calls or this is not the actual problem. I really hope someone can help, as I have run out of ideas. Thank you.