Performance loss from parallelization
    <p>I've modified a raytracer I wrote a while ago for educational purposes to take advantage of multiprocessing using OpenMP. However, I'm not seeing any benefit from the parallelization.</p> <p>I've tried 3 different approaches: a task-pooled environment (the <code>draw_pooled()</code> function), a standard OMP parallel nested <code>for</code> loop with image row-level parallelism (<code>draw_parallel_for()</code>), and another OMP parallel <code>for</code> with pixel-level parallelism (<code>draw_parallel_for2()</code>). The original, serial drawing routine is also included for reference (<code>draw_serial()</code>).</p> <p>I'm running a 2560x1920 render on an Intel Core 2 Duo E6750 (2 cores @ 2.67 GHz each w/Hyper-Threading) with 4 GB of RAM under Linux; the binary is compiled by gcc with libgomp. On average, the scene takes:</p> <ul> <li>120 seconds to render serially,</li> <li>but 196 seconds (<strong>sic!</strong>) to render in parallel with 2 threads (the default, i.e. the number of CPU cores), regardless of which of the three methods above I choose,</li> <li>177 seconds in parallel if I override OMP's default thread count with 4 to take HT into account.</li> </ul> <p>Why is this happening? I can't see any obvious bottlenecks in the parallel code.</p> <p><strong>EDIT:</strong> Just to clarify - the task pool is <strong>only one of the implementations</strong>, please do read the question - scroll down to see the parallel <code>for</code>s. 
Thing is, they are just as slow as the task pool!</p>
<pre><code>void draw_parallel_for(int w, int h, const char *fname)
{
    unsigned char *buf;
    buf = new unsigned char[w * h * 3];

    Scene::GetInstance().PrepareRender(w, h);

    for (int y = 0; y &lt; h; ++y)
    {
        #pragma omp parallel for num_threads(4)
        for (int x = 0; x &lt; w; ++x)
            Scene::GetInstance().RenderPixel(x, y, buf + (y * w + x) * 3);
    }

    write_png(buf, w, h, fname);
    delete [] buf;
}

void draw_parallel_for2(int w, int h, const char *fname)
{
    unsigned char *buf;
    buf = new unsigned char[w * h * 3];

    Scene::GetInstance().PrepareRender(w, h);

    int x, y;
    #pragma omp parallel for private(x, y) num_threads(4)
    for (int xy = 0; xy &lt; w * h; ++xy)
    {
        x = xy % w;
        y = xy / w;
        Scene::GetInstance().RenderPixel(x, y, buf + (y * w + x) * 3);
    }

    write_png(buf, w, h, fname);
    delete [] buf;
}

void draw_parallel_for3(int w, int h, const char *fname)
{
    unsigned char *buf;
    buf = new unsigned char[w * h * 3];

    Scene::GetInstance().PrepareRender(w, h);

    #pragma omp parallel for num_threads(4)
    for (int y = 0; y &lt; h; ++y)
    {
        for (int x = 0; x &lt; w; ++x)
            Scene::GetInstance().RenderPixel(x, y, buf + (y * w + x) * 3);
    }

    write_png(buf, w, h, fname);
    delete [] buf;
}

void draw_serial(int w, int h, const char *fname)
{
    unsigned char *buf;
    buf = new unsigned char[w * h * 3];

    Scene::GetInstance().PrepareRender(w, h);

    for (int y = 0; y &lt; h; ++y)
    {
        for (int x = 0; x &lt; w; ++x)
            Scene::GetInstance().RenderPixel(x, y, buf + (y * w + x) * 3);
    }

    write_png(buf, w, h, fname);
    delete [] buf;
}

std::queue&lt; std::pair&lt;int, int&gt; * &gt; task_queue;

void draw_pooled(int w, int h, const char *fname)
{
    unsigned char *buf;
    buf = new unsigned char[w * h * 3];

    Scene::GetInstance().PrepareRender(w, h);

    bool tasks_issued = false;

    #pragma omp parallel shared(buf, tasks_issued, w, h) num_threads(4)
    {
        #pragma omp master
        {
            for (int y = 0; y &lt; h; ++y)
            {
                for (int x = 0; x &lt; w; ++x)
                    task_queue.push(new std::pair&lt;int, int&gt;(x, y));
            }
            tasks_issued = true;
        }

        while (true)
        {
            std::pair&lt;int, int&gt; *coords;

            #pragma omp critical(task_fetch)
            {
                if (task_queue.size() &gt; 0)
                {
                    coords = task_queue.front();
                    task_queue.pop();
                }
                else
                    coords = NULL;
            }

            if (coords != NULL)
            {
                Scene::GetInstance().RenderPixel(coords-&gt;first, coords-&gt;second,
                    buf + (coords-&gt;second * w + coords-&gt;first) * 3);
                delete coords;
            }
            else
            {
                #pragma omp flush(tasks_issued)
                if (tasks_issued)
                    break;
            }
        }
    }

    write_png(buf, w, h, fname);
    delete [] buf;
}
</code></pre>
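<p>For anyone who wants to reproduce the comparison without the full raytracer, here is a minimal sketch of the serial and row-level variants. Everything in it is an assumption made for illustration: <code>render_pixel_stub()</code> is a hypothetical stand-in for <code>Scene::RenderPixel()</code> that touches no shared state, and <code>draw_serial_stub()</code>, <code>draw_rows_parallel_stub()</code> and <code>wall_seconds()</code> are made-up helper names, not part of the original program. It times with <code>std::chrono::steady_clock</code>, which measures wall-clock time; <code>std::clock()</code> would instead report process CPU time, which on Linux is summed over all threads. Built without <code>-fopenmp</code> the pragma is ignored and both variants run serially.</p>

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Scene::RenderPixel(): a pure function of (x, y)
// that reads no shared state and writes 3 bytes (RGB) to out.
inline void render_pixel_stub(int x, int y, unsigned char *out) {
    unsigned v = (static_cast<unsigned>(x) * 73856093u) ^
                 (static_cast<unsigned>(y) * 19349663u);
    for (int i = 0; i < 64; ++i)           // a little arithmetic to simulate work
        v = v * 1664525u + 1013904223u;
    out[0] = static_cast<unsigned char>(v >> 24);
    out[1] = static_cast<unsigned char>(v >> 16);
    out[2] = static_cast<unsigned char>(v >> 8);
}

// Serial reference: same loop structure as draw_serial() above.
std::vector<unsigned char> draw_serial_stub(int w, int h) {
    std::vector<unsigned char> buf(static_cast<std::size_t>(w) * h * 3);
    unsigned char *p = buf.data();
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            render_pixel_stub(x, y, p + (static_cast<std::size_t>(y) * w + x) * 3);
    return buf;
}

// Row-level parallelism: one parallel region around the outer loop, so the
// thread team is started once rather than once per row.
std::vector<unsigned char> draw_rows_parallel_stub(int w, int h) {
    std::vector<unsigned char> buf(static_cast<std::size_t>(w) * h * 3);
    unsigned char *p = buf.data();
    #pragma omp parallel for schedule(dynamic, 16)
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            render_pixel_stub(x, y, p + (static_cast<std::size_t>(y) * w + x) * 3);
    return buf;
}

// Wall-clock duration of a callable, in seconds.
template <typename F>
double wall_seconds(F &&f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

<p>Calling <code>wall_seconds([] { draw_rows_parallel_stub(2560, 1920); })</code> and the same for <code>draw_serial_stub</code> gives two directly comparable numbers. If the stub scales with the thread count but the real renderer does not, that would suggest looking at what <code>RenderPixel()</code> shares between threads rather than at the loop structure.</p>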