
<p>On my laptop with 2 physical cores and 4 logical cores, I get:</p>
<pre><code>Starting 1 threads... Elapsed time: 11.638 seconds
Starting 2 threads... Elapsed time: 12.418 seconds
Starting 3 threads... Elapsed time: 13.556 seconds
Starting 4 threads... Elapsed time: 14.929 seconds
Starting 5 threads... Elapsed time: 20.811 seconds
Starting 6 threads... Elapsed time: 22.776 seconds
Starting 7 threads... Elapsed time: 27.160 seconds
Starting 8 threads... Elapsed time: 30.249 seconds
</code></pre>
<p>This shows degradation as soon as we have more than 1 thread.</p>
<p>I suspect the reason is that the function makework() is doing memory accesses. You can see this in Visual Studio 2010 by setting a breakpoint on the first line of _tmain(). When you hit the breakpoint, press Ctrl-Alt-D to open the disassembly window. Anywhere you see a register name in brackets (e.g. [esp]), it is a memory access. If I am right, the on-CPU level 1 memory cache bandwidth is saturating. You can test this theory with a modified makework():</p>
<pre><code>void makework(void *jnk)
{
    double tmp = 0;
    volatile double *p;   /* volatile: each write through p is a real store */
    int i;
    int j;

    p = (double *)jnk;
    for (j = 0; j &lt; 100000000; j++) {
        for (i = 0; i &lt; 100; i++) {
            tmp = tmp + (double)i * (double)i;
        }
        *p = tmp;         /* extra memory write after every 100 inner iterations */
    }
    *p = tmp;
    _endthread();
}
</code></pre>
<p>It does the same number of computations, but with an extra memory write thrown in every 100 iterations. On my laptop, the results are:</p>
<pre><code>Starting 1 threads... Elapsed time: 11.684 seconds
Starting 2 threads... Elapsed time: 13.760 seconds
Starting 3 threads... Elapsed time: 14.445 seconds
Starting 4 threads... Elapsed time: 17.519 seconds
Starting 5 threads... Elapsed time: 23.369 seconds
Starting 6 threads... Elapsed time: 25.491 seconds
Starting 7 threads... Elapsed time: 30.155 seconds
Starting 8 threads... Elapsed time: 34.460 seconds
</code></pre>
<p>This shows the impact memory access can have on the results. I tried various VS2010 compiler settings to see if I could get makework() to have no memory accesses, but had no luck. To truly study raw CPU core performance vs. the number of active threads, I suspect we'd have to code makework() in assembler.</p>
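One way to compare the rows of either timing table is to convert elapsed time into aggregate throughput relative to the single-thread run. The sketch below assumes that every thread executes the same fixed makework() loop, so N threads do N times the work of one; that is an assumption about the test harness, not something the timings themselves show. The function names `aggregate_speedup` and `print_speedups` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Aggregate speedup relative to one thread, ASSUMING every thread
 * performs the same fixed amount of work (hypothetical helper).
 * t1: elapsed seconds with 1 thread; tn: elapsed seconds with n threads. */
double aggregate_speedup(double t1, double tn, int n)
{
    assert(t1 > 0.0 && tn > 0.0 && n > 0);
    return (double)n * t1 / tn;
}

/* Print one line per thread count; times[k] is the elapsed time
 * measured with k + 1 threads. */
void print_speedups(const double *times, int count)
{
    int k;
    for (k = 0; k < count; k++) {
        printf("%d threads: %.2fx the single-thread throughput\n",
               k + 1, aggregate_speedup(times[0], times[k], k + 1));
    }
}
```

Passing the eight elapsed times from the first run (11.638 through 30.249 seconds) to `print_speedups` gives a figure a little above 3 for both 4 and 8 threads. Under the stated assumption, that would be consistent with total throughput levelling off once all 4 logical cores are busy, although the numbers alone do not establish the cause.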
 
