<p>Very loosely speaking, it is not unreasonable to say that a Haswell core is roughly equivalent to 16 CUDA cores, but you definitely don't want to take that comparison too far. I would be cautious about making that statement directly in a presentation, but I've found it useful to think of a CUDA core as roughly analogous to a scalar FP unit.</p>
<p>It may help if I explain why Haswell can perform 32 single-precision operations per cycle.</p>
<ul>
<li><p>Each AVX/AVX2 instruction operates on 8 single-precision values. When writing code that will run on a Haswell CPU, you can use AVX and AVX2 instructions, which operate on 256-bit vectors. A 256-bit vector can hold 8 single-precision FP numbers, 8 32-bit integers, or 4 double-precision FP numbers.</p></li>
<li><p>Each core can execute 2 AVX/AVX2 instructions per cycle, although there are some restrictions on which instructions can be paired.</p></li>
<li><p>A fused multiply-add (FMA) instruction counts as 2 single-precision operations per element. FMA instructions perform "fused" operations such as A = A * B + C, so there are arguably two operations per vector element: a multiplication and an addition.</p></li>
</ul>
<p>This article explains the above points in more detail: <a href="http://www.realworldtech.com/haswell-cpu/4/" rel="noreferrer">http://www.realworldtech.com/haswell-cpu/4/</a></p>
<p>In total, a Haswell core can perform 8 * 2 * 2 = 32 single-precision operations per cycle. 
Since CUDA cores support FMA operations as well, you cannot count that factor of 2 when comparing CUDA cores to Haswell cores, which leaves 8 * 2 = 16.</p>
<p>A Kepler CUDA core has one single-precision floating-point unit, so it can perform one floating-point operation per cycle (counting an FMA as a single operation): <a href="http://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf" rel="noreferrer">http://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf</a>, <a href="http://www.realworldtech.com/kepler-brief/" rel="noreferrer">http://www.realworldtech.com/kepler-brief/</a></p>
<p>If I were putting together slides on this, I would have one section explaining how many FP operations Haswell can do per cycle: the three points above, plus the fact that you have multiple cores and possibly multiple processors. I'd have another section explaining how many FP operations a Kepler GPU can do per cycle: 192 per SMX, with multiple SMX units on the GPU.</p>
<p>P.S. I may be stating the obvious, but just to avoid confusion: the Haswell architecture also includes an integrated GPU, which has an altogether different architecture from the Haswell CPU.</p>
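<p>For what it's worth, the per-cycle accounting can be written out as a few lines of Python. This is only a sketch of the arithmetic in this answer, not a benchmark: the constant and function names are made up for illustration, and the <code>count_fma_as_two</code> flag is just my way of making the "factor of 2" explicit.</p>

```python
# Peak single-precision accounting for the comparison in this answer.
# All names are illustrative; none of them come from a library.

AVX_LANES_SP = 8            # 256-bit vector / 32-bit float
AVX_ISSUE_PER_CYCLE = 2     # AVX/AVX2 instructions per core per cycle
KEPLER_CORES_PER_SMX = 192  # CUDA cores in one Kepler SMX


def sp_ops_per_cycle(scalar_fp_lanes, count_fma_as_two):
    """Operations per cycle for a unit with the given number of scalar FP lanes."""
    return scalar_fp_lanes * (2 if count_fma_as_two else 1)


# Scalar FP lanes in one Haswell core: the "about 16 CUDA cores" figure.
haswell_lanes = AVX_LANES_SP * AVX_ISSUE_PER_CYCLE

print(haswell_lanes)                                  # 16
print(sp_ops_per_cycle(haswell_lanes, True))          # 32
print(sp_ops_per_cycle(KEPLER_CORES_PER_SMX, False))  # 192
print(sp_ops_per_cycle(KEPLER_CORES_PER_SMX, True))   # 384
```

<p>The point of the flag is that you have to apply it to both sides or to neither: 32 vs. 384 with FMA counted as two operations, or 16 vs. 192 without. To get a whole-chip figure you would still multiply by the number of cores (or SMX units), which I have left out because it depends on the specific part.</p>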
 
