<p><strong>EDIT:</strong> Now that some code has been added.</p>
<p>In that particular example, there is very little computation and a lot of memory access. So the performance will depend heavily on:</p>
<ul> <li>The size of the vector.</li> <li>How you are timing it (for example, whether you have an outer loop for timing purposes).</li> <li>Whether the data is already in cache.</li> </ul>
<p>For larger vector sizes, you will likely find that the performance is limited by your memory bandwidth, in which case parallelism is not going to help much. For smaller sizes, the overhead of threading will dominate. If you're getting the "expected" speedup, you're probably somewhere in between, where the result is optimal.</p>
<p>I refuse to give hard numbers because, in general, "guessing" performance, especially in multi-threaded applications, is a lost cause unless you have prior testing knowledge or intimate knowledge of both the program and the system that it's running on.</p>
<p>Just as a simple example taken from my answer here: <a href="https://stackoverflow.com/q/9244481/922184">How to get 100% CPU usage from a C program</a></p>
<p>On a Core i7 920 @ 3.5 GHz (4 cores, 8 threads):</p>
<p>If I run with <strong>4 threads</strong>, the result is:</p>
<pre><code>This machine calculated all 78498 prime numbers under 1000000 in 39.3498 seconds </code></pre>
<p>If I run with <strong>4 threads</strong> and explicitly (using Task Manager) <strong>pin the threads on 4 distinct physical cores</strong>, the result is:</p>
<pre><code>This machine calculated all 78498 prime numbers under 1000000 in 30.4429 seconds </code></pre>
<hr>
<p>So this shows how unpredictable performance is, even for a very simple and embarrassingly parallel application. Applications involving heavy memory usage and synchronization get a lot uglier...</p>
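<p>To see how vector size, timing method, and cache state interact on your own machine, a small harness along these lines may help. This is only a sketch, not the code from the question: <code>scale_range</code>, <code>scale_parallel</code>, and <code>seconds_per_pass</code> are hypothetical names, the loop body (one multiply per element) is an assumed stand-in for a memory-bound kernel, and plain <code>std::thread</code> is used instead of whatever threading library the original code relies on.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical kernel: one multiply per load/store, so memory traffic
// dominates the cost once the vector no longer fits in cache.
void scale_range(std::vector<double>& v, std::size_t begin, std::size_t end,
                 double factor) {
    for (std::size_t i = begin; i < end; ++i) {
        v[i] *= factor;
    }
}

// Split the vector into at most `threads` contiguous chunks and scale
// each chunk on its own thread.
void scale_parallel(std::vector<double>& v, double factor, unsigned threads) {
    assert(threads > 0);
    const std::size_t chunk = (v.size() + threads - 1) / threads;
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        const std::size_t begin = std::min(v.size(), t * chunk);
        const std::size_t end = std::min(v.size(), begin + chunk);
        if (begin == end) {
            break;  // fewer elements than threads: nothing left to hand out
        }
        pool.emplace_back(scale_range, std::ref(v), begin, end, factor);
    }
    for (auto& th : pool) {
        th.join();
    }
}

// Average wall-clock seconds for one parallel pass over `n` elements.
// The untimed warm-up pass touches every page first, so the timed passes
// do not include first-touch page faults.
double seconds_per_pass(std::size_t n, unsigned threads, int repeats) {
    std::vector<double> v(n, 1.0);
    scale_parallel(v, 1.0, threads);  // warm-up, not timed
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        scale_parallel(v, 1.0000001, threads);
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / repeats;
}
```

<p>Calling <code>seconds_per_pass</code> for a range of sizes (say, from a few thousand elements up to well beyond your last-level cache) and for 1, 2, 4, ... threads should give a rough picture of where thread start-up overhead dominates and where the curve flattens out. Note that this sketch spawns fresh threads on every pass, which exaggerates the overhead compared to a thread pool. On some toolchains you may need to compile with <code>-pthread</code>.</p>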
 
