Optimizing Bitwise Logic
<p>In my code the following lines are currently the hotspot:</p>
<pre><code>int table1[256] = /*...*/;
int table2[512] = /*...*/;
int table3[512] = /*...*/;

int* result = /*...*/;

for(int r = 0; r &lt; r_end; ++r)
{
    std::uint64_t bits = bit_reader.value(); // 64 bits, no assumption regarding bits.

    // The get_ functions are table lookups from the highest word of the bits variable.

    struct entry
    {
        int sign_offset : 5;
        int r_offset    : 4;
        int x           : 7;
    };

    // NOTE: We are only interested in the highest word in the bits variable.

    entry e;
    if(is_in_table1(bits)) // branch prediction should work well here since table1 will be hit more often than 2 or 3, and 2 more often than 3.
        e = reinterpret_cast&lt;const entry&amp;&gt;(table1[get_table1_index(bits)]);
    else if(is_in_table2(bits))
        e = reinterpret_cast&lt;const entry&amp;&gt;(table2[get_table2_index(bits)]);
    else
        e = reinterpret_cast&lt;const entry&amp;&gt;(table3[get_table3_index(bits)]);

    r += e.r_offset; // r is 18 bits, top 14 bits are always 0.
    int x = e.x; // x is 14 bits, top 18 bits are always 0.
    int sign_offset = e.sign_offset;

    assert(sign_offset &lt;= 16 &amp;&amp; sign_offset &gt; 0);

    // The following is the hotspot.

    int sign = 1 - (bits &gt;&gt; (63 - sign_offset) &amp; 0x2);
    (*result++) = ((x &lt;&lt; 18) * sign) | r; // 32 bits

    // End of hotspot

    bit_reader.skip(sign_offset); // sign_offset is the last bit used.
}</code></pre>
<p>I haven't figured out how to optimize this further. Could something from the <a href="http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/2011Update/cpp/lin/intref_cls/common/intref_bk_avx2_manipulate.htm" rel="nofollow noreferrer">intrinsics for Operations at Bit-Granularity</a>, <a href="http://msdn.microsoft.com/en-US/library/szzkhewe%28v=vs.80%29" rel="nofollow noreferrer"><code>__shiftleft128</code></a> or <a href="http://msdn.microsoft.com/en-us/library/t5e2f3sc%28v=vs.80%29.aspx" rel="nofollow noreferrer"><code>_rot</code></a> be useful?</p>
<p>Note that I also process the resulting data on the GPU, so what matters is getting something into <code>result</code> that the GPU can then use to calculate the correct values.</p>
<p>Suggestions?</p>
<p>EDIT:</p>
<p>Added table look-up.</p>
<p>EDIT:</p>
<pre><code> int sign = 1 - (bits &gt;&gt; (63 - e.sign_offset) &amp; 0x2);
000000013FD6B893  and         ecx,1Fh
000000013FD6B896  mov         eax,3Fh
000000013FD6B89B  sub         eax,ecx
000000013FD6B89D  movzx       ecx,al
000000013FD6B8A0  shr         r8,cl
000000013FD6B8A3  and         r8d,2
000000013FD6B8A7  mov         r14d,1
000000013FD6B8AD  sub         r14d,r8d
</code></pre>
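To make the arithmetic of the hotspot concrete, here is a minimal C++ sketch (not from the original post) of the two statements marked as the hotspot, next to a branchless variant that replaces the multiply by +1/-1 with a conditional negate. The function names `pack_original` and `pack_branchless` are hypothetical; `x` and `r` are taken as `std::uint32_t` so that shifting `x` into bit 31 is well defined, and `sign_offset` is assumed to lie in [1, 16] as the assert in the question states. Whether the variant is actually faster on a given compiler and CPU is untested and would have to be measured.

```cpp
#include <cassert>
#include <cstdint>

// The hotspot as written in the question, but with unsigned arithmetic so the
// shift and the multiply are well defined (x << 18 can reach bit 31).
// Assumes 1 <= sign_offset <= 16, matching the assert in the question.
std::uint32_t pack_original(std::uint64_t bits, int sign_offset,
                            std::uint32_t x, std::uint32_t r) {
    // (bits >> (63 - k)) & 2 is either 0 or 2, so sign is +1 or -1.
    int sign = 1 - static_cast<int>((bits >> (63 - sign_offset)) & 0x2);
    // Multiplying by (uint32_t)-1 == 0xFFFFFFFF negates modulo 2^32.
    return ((x << 18) * static_cast<std::uint32_t>(sign)) | r;
}

// Hypothetical branchless variant: read the same sign bit directly and
// apply a conditional negate instead of a multiply.
std::uint32_t pack_branchless(std::uint64_t bits, int sign_offset,
                              std::uint32_t x, std::uint32_t r) {
    // Same bit as above: bit (64 - sign_offset) of bits, a shift of 48..63.
    std::uint32_t neg =
        static_cast<std::uint32_t>((bits >> (64 - sign_offset)) & 1u);
    std::uint32_t mask = 0u - neg;  // 0x00000000 or 0xFFFFFFFF
    std::uint32_t v = x << 18;
    // (v ^ mask) - mask is v when mask == 0 and -v (mod 2^32) when mask is all ones.
    return ((v ^ mask) - mask) | r;
}
```

Both functions read the same bit, bit `64 - sign_offset` of `bits`, because shifting right by `63 - sign_offset` and then masking with `0x2` selects the bit one position above the shift amount. Since `sign_offset` comes from the table entry, storing the shift amount (or a ready-made mask) in the table could, in principle, remove the `sub` visible in the disassembly; that change to the table layout is a hypothetical suggestion, not something the question does.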