Code runs faster using placement new
<p>I have this class.</p>
<p><strong>Approach 1:</strong></p>
<pre><code>typedef float v4sf __attribute__ ((vector_size(16)));

class Unit
{
public:
    Unit(int num) : num(num)
    {
        u = new float[num];
        v = new float[num];
        //and so on for other variables
    }
    void update()
    {
        for (int i = 0; i &lt; num; i += 4)
        {
            *(v4sf*)&amp;u[i] = *(v4sf*)&amp;v[i] + *(v4sf*)&amp;t[i];
            //many other equations
        }
    }
    int num;
    float *u, *v, *t; //and many other variables
};
</code></pre>
<p><strong>Approach 2:</strong></p>
<p>Same as approach 1, except that <code>v</code>, <code>u</code>, and all other variables are placed in one big chunk pre-allocated on the heap, using placement <code>new</code>.</p>
<pre><code>typedef float v4sf __attribute__ ((vector_size(16)));

class Unit
{
public:
    Unit(int num) : num(num)
    {
        buffer = new char[num * (sizeof(*u) + sizeof(*v) /*..and so on for other variables..*/)];
        u = new(buffer) float[num];
        v = new(buffer + sizeof(float) * num) float[num];
        //And so on for other variables
    }
    void update()
    {
        for (int i = 0; i &lt; num; i += 4)
        {
            *(v4sf*)&amp;u[i] = *(v4sf*)&amp;v[i] + *(v4sf*)&amp;t[i];
            //many other equations
        }
    }
    int num;
    char* buffer;
    float *u, *v, *t; //and many other variables
};
</code></pre>
<p>However, approach 2 is 2x faster. Why is that?</p>
<p>There are around 12 float variables and <code>num</code> is 500K. <code>update()</code> is called <code>1k</code> times. The timing does not include the memory allocation. I measure the speed like this:</p>
<pre><code>double start = getTime();
for (int i = 0; i &lt; 1000; i++)
{
    unit-&gt;update();
}
double end = getTime();
cout &lt;&lt; end - start;
</code></pre>
<p>This is around 2x faster in approach 2.</p>
<p>Compiler options: <code>gcc -msse4 -o3 -ftree-vectorize</code>.</p>
<p>L1 cache is 256K, RAM is 8GB, page size is 4K.</p>
<p>Edit: Corrected the mistake in allocating the variables in approach 2. All variables are now allocated in different sections, correctly. The processor is an Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz.</p>
<p>Edit: added the source here - <a href="https://gist.github.com/11e94d3fba7e7f9ea2a4" rel="nofollow">Source</a>. Approach 1 gives 69.58s, approach 2 gives 46.74s. Though not 2x, approach 2 is still about 1.5x faster.</p>
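<p>For anyone who wants to reproduce the comparison without the full source, the two layouts can be sketched in a reduced form. This is only a sketch under stated assumptions, not the code from the gist: <code>SeparateArrays</code>, <code>PackedArrays</code>, <code>pageOffset</code> and <code>timeUpdates</code> are hypothetical names, <code>getTime()</code> is an assumed implementation on top of <code>std::chrono::steady_clock</code>, only three of the roughly twelve arrays are modelled, a plain scalar loop stands in for the <code>v4sf</code> casts, and a single <code>std::vector</code> stands in for the placement-<code>new</code> buffer.</p>

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed stand-in for the question's getTime(): seconds on a monotonic clock.
inline double getTime() {
    using clock = std::chrono::steady_clock;
    return std::chrono::duration<double>(clock::now().time_since_epoch()).count();
}

// Byte offset of p within a 4 KiB page (the page size quoted in the question).
inline std::size_t pageOffset(const void* p) {
    return static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(p) % 4096);
}

// Approach 1 layout (hypothetical name): every array is its own heap allocation.
struct SeparateArrays {
    explicit SeparateArrays(std::size_t n)
        : num(n), uStore(n), vStore(n), tStore(n),
          u(uStore.data()), v(vStore.data()), t(tStore.data()) {}
    SeparateArrays(const SeparateArrays&) = delete;
    SeparateArrays& operator=(const SeparateArrays&) = delete;

    std::size_t num;
    std::vector<float> uStore, vStore, tStore;
    float *u, *v, *t;
};

// Approach 2 layout (hypothetical name): all arrays are carved out of one
// contiguous allocation, back to back, using pointer arithmetic.
struct PackedArrays {
    explicit PackedArrays(std::size_t n)
        : num(n), storage(3 * n),
          u(storage.data()), v(u + n), t(v + n) {
        // The last array must end exactly where the shared storage ends.
        assert(t + n == storage.data() + storage.size());
    }
    PackedArrays(const PackedArrays&) = delete;
    PackedArrays& operator=(const PackedArrays&) = delete;

    std::size_t num;
    std::vector<float> storage;
    float *u, *v, *t;
};

// Scalar version of the question's update(): u = v + t, element by element.
template <class Layout>
void update(Layout& a) {
    for (std::size_t i = 0; i < a.num; ++i) {
        a.u[i] = a.v[i] + a.t[i];
    }
}

// Mirrors the question's measurement loop; allocation is not timed.
template <class Layout>
double timeUpdates(Layout& a, int iterations) {
    const double start = getTime();
    for (int i = 0; i < iterations; ++i) {
        update(a);
    }
    return getTime() - start;
}
```

<p>Calling <code>timeUpdates</code> on both layouts with <code>num</code> set to 500000 and 1000 iterations mirrors the measurement in the question, and printing <code>pageOffset</code> for <code>u</code>, <code>v</code> and <code>t</code> under each layout is one way to check whether the arrays sit at different positions relative to page boundaries in the two approaches.</p>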