Bitwise permutation of multiple 64bit values in parallel / combined
<p><strong>This question is NOT about "How do I do a bitwise permutation". We know how to do that. What we are looking for is a faster way that needs fewer CPU instructions, inspired by the bitslice implementation of S-boxes in DES.</strong></p> <p>To speed up some cipher code we want to reduce the number of permutation calls. The main cipher functions do multiple bitwise permutations based on lookup arrays.</p> <p>As the permutation operations are only bit shifts, our basic idea is to take multiple input values that need the same permutation and shift them in parallel. For example, if input bit 1 must be moved to output bit 6, that move would be done for all input values at once.</p> <p>Is there any way to do this? We have no example code right now, because we have absolutely no idea how to accomplish this in a performant way.</p> <p>The largest value size on our platforms is 128 bits; the longest input value is 64 bits. Therefore the code must be faster than doing the whole permutation 128 times.</p> <p><strong>EDIT</strong></p> <p>Here is a simple 8-bit example of a permutation:</p> <pre><code>+---+---+---+---+---+---+---+---+
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | &lt;= Bits
+---+---+---+---+---+---+---+---+
+---+---+---+---+---+---+---+---+
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | &lt;= Input
+---+---+---+---+---+---+---+---+
| 3 | 8 | 6 | 2 | 5 | 1 | 4 | 7 | &lt;= Output
+---+---+---+---+---+---+---+---+
</code></pre> <p>The cipher makes use of multiple input keys. It's a block cipher, so the same pattern must be applied to all 64-bit blocks of the input.</p> <p>As the permutations are the same for each input block, we want to process multiple input blocks in one <em>step</em>, that is, combine the operations for multiple input sequences. Instead of moving one bit per call 128 times, we want to move 128 bits at once in a single call.</p> <p><strong>EDIT2</strong></p> <p>We can NOT use threads, as the code has to run on embedded systems without threading support. 
Therefore we also have no access to external libraries and have to keep it plain C.</p> <p><strong>SOLUTION</strong></p> <p>After testing and playing with the given answers, we have done it the following way:</p> <ul> <li>We put the single bits of 128 64-bit values into a uint128_t[64]* array.</li> <li>For the permutation we just have to copy pointers.</li> <li>After all is done, we revert the first operation and get 128 permuted values back.</li> </ul> <p>Yeah, it is really that simple. We tested this approach early in the project, but it was too slow. It seems we had a bug in the test code.</p> <p>Thank you all for the hints and the patience.</p>
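<p>The three steps of the solution above can be sketched in plain C roughly as follows. This is only a minimal illustration, not our production code, and it rests on a few assumptions: it uses 64 lanes held in <code>uint64_t</code> words instead of 128 lanes in a 128-bit type, so that it compiles with nothing but <code>&lt;stdint.h&gt;</code>; it copies plane words rather than pointers; and it reads the table so that output bit <code>j</code> takes input bit <code>src[j]</code> (0-based). All function and macro names are made up for this sketch.</p>

```c
#include <stdint.h>

#define NBITS  64   /* bits per block */
#define NLANES 64   /* blocks handled together (assumption: the post uses 128) */

/* Scalar reference: output bit j takes input bit src[j]. */
static uint64_t permute_scalar(uint64_t x, const uint8_t src[NBITS])
{
    uint64_t y = 0;
    for (int j = 0; j < NBITS; j++)
        y |= ((x >> src[j]) & 1u) << j;
    return y;
}

/* Step 1: plane[b] collects bit b of every block; block i lives in bit i. */
static void slice(const uint64_t in[NLANES], uint64_t plane[NBITS])
{
    for (int b = 0; b < NBITS; b++) {
        uint64_t p = 0;
        for (int i = 0; i < NLANES; i++)
            p |= ((in[i] >> b) & 1u) << i;
        plane[b] = p;
    }
}

/* Step 2: the permutation is one word copy per bit position, all lanes at once. */
static void permute_planes(const uint64_t in[NBITS], uint64_t out[NBITS],
                           const uint8_t src[NBITS])
{
    for (int j = 0; j < NBITS; j++)
        out[j] = in[src[j]];
}

/* Step 3: revert step 1 to get ordinary 64-bit blocks back. */
static void unslice(const uint64_t plane[NBITS], uint64_t out[NLANES])
{
    for (int i = 0; i < NLANES; i++) {
        uint64_t v = 0;
        for (int b = 0; b < NBITS; b++)
            v |= ((plane[b] >> i) & 1u) << b;
        out[i] = v;
    }
}

static void permute_sliced(const uint64_t in[NLANES], uint64_t out[NLANES],
                           const uint8_t src[NBITS])
{
    uint64_t a[NBITS], b[NBITS];
    slice(in, a);
    permute_planes(a, b, src);
    unslice(b, out);
}

/* Usage sketch: returns 1 if the sliced path agrees with the scalar one. */
static int sliced_matches_scalar(void)
{
    uint8_t src[NBITS];
    uint64_t in[NLANES], out[NLANES];

    for (int j = 0; j < NBITS; j++)
        src[j] = (uint8_t)((j * 5 + 3) % NBITS);            /* arbitrary test permutation */
    for (int i = 0; i < NLANES; i++)
        in[i] = (uint64_t)(i + 1) * 0x9E3779B97F4A7C15ULL;  /* arbitrary test data */

    permute_sliced(in, out, src);
    for (int i = 0; i < NLANES; i++)
        if (out[i] != permute_scalar(in[i], src))
            return 0;
    return 1;
}
```

<p>The naive <code>slice</code> and <code>unslice</code> loops are themselves expensive, so a layout like this is likely to pay off only when several permutation rounds are applied while the data stays in sliced form.</p>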
 
