<p>I have two important points as to why your results are not linear. The first is about Intel hyper-threading and AMD modules. The second is about the turbo frequency modes of Intel and AMD processors.</p>
<p><strong>1.) Hyper-threading and AMD modules/cores</strong></p>
<p>Too many people mistake Intel hyper-threads and the AMD cores inside a module for real cores and expect a linear speed-up. An Intel processor with hyper-threading can run twice as many hyper-threads (hardware threads) as it has cores. AMD has its own technology in which the fundamental unit is called a module, and each module contains two of what AMD disingenuously calls cores (see <a href="http://www.bit-tech.net/hardware/cpus/2011/10/12/amd-fx-8150-review/2" rel="nofollow">What's a module, what's a core</a>). One reason this is easily confused is that Task Manager in Windows, with hyper-threading enabled, shows the number of hardware threads but labels them CPUs. This is misleading because it is not the number of CPU cores.</p>
<p>I don't know AMD well enough to go into detail, but as far as I understand, each module has one floating-point unit (but two integer units). Therefore, for floating-point operations you can't really expect a linear speed-up beyond the number of Intel cores or AMD modules.</p>
<p>In your case, the Opteron 6348 has 2 dies per processor, each with 3 modules, and each module has 2 "cores". Although this gives 12 cores, there are really only 6 floating-point units.</p>
<p>I ran your code on my single-socket Intel Xeon E5-1620 @ 3.6 GHz. This has 4 cores and hyper-threading (so eight hardware threads). I get:</p>
<pre><code>1 thread:  156s
4 threads:  37s (156/4 = 39s)
8 threads:  30s (156/8 = 19.5s)
</code></pre>
<p>Notice that for 4 threads the scaling is almost linear, but for 8 threads hyper-threading only helps a little (at least it helps). Another strange observation is that my single-threaded results are much lower than yours (MSVC2013, 64-bit release mode). I would expect a faster single-threaded Ivy Bridge core to easily beat a slower AMD Piledriver core. This does not make sense to me.</p>
<p><strong>2.) Intel Turbo Boost and AMD Turbo Core</strong></p>
<p>Intel has a technology called Turbo Boost that changes the clock frequency based on the number of active cores. When all cores are busy, the turbo frequency is at its lowest. On Linux, the only application I know of that can measure this while a workload is running is powertop. The real operating frequency is not easy to measure (for one thing, it needs root access). On Windows you can use CPU-Z. In any case, the result is that you can't expect linear scaling when comparing one thread against the maximum number of real cores, because the single thread runs at a higher clock frequency.</p>
<p>Once again, I have little experience with AMD processors, but as far as I can tell their technology is called Turbo Core, and I expect the effect to be similar. This is why a good benchmark disables turbo frequency modes (in the BIOS if you can) when comparing threaded code.</p>
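<p>To make the scaling arithmetic concrete, here is a small Python sketch (my own illustration, not the code from the question). It computes speed-up (T1/Tn) and parallel efficiency (speed-up divided by thread count) from the E5-1620 timings, and a deliberately simplified best-case model. The model assumes that only physical cores (Intel) or FPU-bearing modules (AMD) add floating-point throughput, and that the all-core clock is lower than the single-thread turbo clock. The helper names and the 3.0 GHz / 2.5 GHz clocks are hypothetical, not measured values for any real CPU.</p>

```python
"""Speed-up and parallel efficiency from measured wall-clock times.

Sketch only: the FPU-limited ceiling and the turbo correction in ideal_time
are simplified assumptions, and the clock values used below are hypothetical.
"""


def speedup(t1: float, tn: float) -> float:
    """Speed-up of an n-thread run relative to the single-thread run."""
    return t1 / tn


def efficiency(t1: float, tn: float, n: int) -> float:
    """Parallel efficiency: 1.0 means perfectly linear scaling."""
    return speedup(t1, tn) / n


def ideal_time(t1: float, n_threads: int, n_units: int,
               f_single_ghz: float, f_all_ghz: float) -> float:
    """Best-case n-thread time under a simple (assumed) model.

    * Only n_units independent FP resources exist (physical cores on Intel,
      FPU-bearing modules on AMD), so extra hardware threads add nothing.
    * Each unit runs at the all-core clock f_all_ghz instead of the
      single-thread turbo clock f_single_ghz, so it does less work per second.
    """
    effective = min(n_threads, n_units)
    return t1 * f_single_ghz / (f_all_ghz * effective)


# Measured times from the Xeon E5-1620 run (4 cores, 8 hardware threads).
t1 = 156.0
for n, tn in [(4, 37.0), (8, 30.0)]:
    print(f"{n} threads: speed-up {speedup(t1, tn):.2f}, "
          f"efficiency {efficiency(t1, tn, n):.2f}")

# Opteron 6348 topology as described above: 2 dies x 3 modules x 2 "cores".
dies, modules_per_die, cores_per_module = 2, 3, 2
cores = dies * modules_per_die * cores_per_module  # 12 integer "cores"
fpus = dies * modules_per_die                      # 6 shared FPUs
print(f"Opteron 6348: {cores} cores but only {fpus} FPUs")

# Hypothetical clocks (NOT the Opteron's real values): 3.0 GHz single-thread
# turbo versus 2.5 GHz when every core is busy.
model = ideal_time(100.0, 12, fpus, 3.0, 2.5)
print(f"12 FP-bound threads on {fpus} FPUs: {model:.1f} s per 100 s "
      f"single-threaded (naive linear: {100.0 / 12:.1f} s)")
```

<p>With the E5-1620 timings, the 4-thread efficiency comes out at about 1.05 and the 8-thread efficiency at 0.65. For 12 FP-bound threads on 6 FPUs with the hypothetical clocks, the model predicts 20.0 s per 100 s of single-threaded time instead of the naive 100/12 ≈ 8.3 s, which is the kind of gap the two points above explain.</p>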