StackOverflow2013

<p><strong>Welcome to the world of <a href="http://en.wikipedia.org/wiki/Denormal_number" rel="noreferrer">denormalized floating-point</a>!</strong> They can wreak havoc on performance!</p>
<p>Denormal (or subnormal) numbers are kind of a hack to get some extra values very close to zero out of the floating-point representation. Operations on denormalized floating-point values can be <strong><em>tens to hundreds of times slower</em></strong> than on normalized floating-point values. This is because many processors can't handle them directly and must trap and resolve them using microcode.</p>
<p>If you print out the numbers after 10,000 iterations, you will see that they have converged to different values depending on whether <code>0</code> or <code>0.1</code> is used.</p>
<p>Here's the test code compiled on x64:</p>
<pre><code>#include &lt;cstdlib&gt;   // system
#include &lt;iostream&gt;  // cout, endl
#include &lt;omp.h&gt;     // omp_get_wtime
using namespace std;

int main() {

    double start = omp_get_wtime();

    const float x[16]={1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2.0,2.1,2.2,2.3,2.4,2.5,2.6};
    const float z[16]={1.123,1.234,1.345,156.467,1.578,1.689,1.790,1.812,1.923,2.034,2.145,2.256,2.367,2.478,2.589,2.690};
    float y[16];
    for(int i=0;i&lt;16;i++)
    {
        y[i]=x[i];
    }
    for(int j=0;j&lt;9000000;j++)
    {
        for(int i=0;i&lt;16;i++)
        {
            // x[i]/z[i] is below 1, so y[i] shrinks toward zero on every pass.
            y[i]*=x[i];
            y[i]/=z[i];
#ifdef FLOATING
            // Adding 0.1f lifts y[i] back into the normal range before subtracting.
            y[i]=y[i]+0.1f;
            y[i]=y[i]-0.1f;
#else
            // Adding 0 leaves a denormal y[i] unchanged.
            y[i]=y[i]+0;
            y[i]=y[i]-0;
#endif

            if (j &gt; 10000)
                cout &lt;&lt; y[i] &lt;&lt; "  ";
        }
        if (j &gt; 10000)
            cout &lt;&lt; endl;
    }

    double end = omp_get_wtime();
    cout &lt;&lt; end - start &lt;&lt; endl;

    system("pause");  // Windows-only: keeps the console window open
    return 0;
}
</code></pre>
<p><strong>Output:</strong></p>
<pre><code>#define FLOATING
1.78814e-007  1.3411e-007  1.04308e-007  0  7.45058e-008  6.70552e-008  6.70552e-008  5.58794e-007  3.05474e-007  2.16067e-007  1.71363e-007  1.49012e-007  1.2666e-007  1.11759e-007  1.04308e-007  1.04308e-007
1.78814e-007  1.3411e-007  1.04308e-007  0  7.45058e-008  6.70552e-008  6.70552e-008  5.58794e-007  3.05474e-007  2.16067e-007  1.71363e-007  1.49012e-007  1.2666e-007  1.11759e-007  1.04308e-007  1.04308e-007

//#define FLOATING
6.30584e-044  3.92364e-044  3.08286e-044  0  1.82169e-044  1.54143e-044  2.10195e-044  2.46842e-029  7.56701e-044  4.06377e-044  3.92364e-044  3.22299e-044  3.08286e-044  2.66247e-044  2.66247e-044  2.24208e-044
6.30584e-044  3.92364e-044  3.08286e-044  0  1.82169e-044  1.54143e-044  2.10195e-044  2.45208e-029  7.56701e-044  4.06377e-044  3.92364e-044  3.22299e-044  3.08286e-044  2.66247e-044  2.66247e-044  2.24208e-044
</code></pre>
<p>Note how in the second run the numbers are very close to zero: most are around 1e-44, well below the smallest normalized <code>float</code> (about 1.18e-38), so they are denormal.</p>
<p>Denormalized numbers are generally rare and thus most processors don't try to handle them efficiently.</p>
<hr>
<p>To demonstrate that this has everything to do with denormalized numbers, we can <strong>flush denormals to zero</strong> by adding this to the start of the code:</p>
<pre><code>_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
</code></pre>
<p>Then the version with <code>0</code> is no longer 10x slower and actually becomes faster. (This requires that the code be compiled with SSE enabled; the macro is declared in <code>xmmintrin.h</code>.)</p>
<p>This means that rather than using these weird lower-precision almost-zero values, we just round to zero instead.</p>
<p><strong>Timings: Core i7 920 @ 3.5 GHz:</strong></p>
<pre><code>//  Don't flush denormals to zero.
0.1f: 0.564067
0   : 26.7669

//  Flush denormals to zero.
0.1f: 0.587117
0   : 0.341406
</code></pre>
<p>In the end, this really has nothing to do with whether it's an integer or floating-point. The <code>0</code> or <code>0.1f</code> is converted and stored into a register outside of both loops, so the conversion itself has no effect on performance.</p>
