Note that there are some explanatory texts on larger screens.

plurals
  1. POEmulating variable bit-shift using only constant shifts?
    primarykey
    data
    text
    <p>I'm trying to find a way to perform an indirect shift-left/right operation without actually using the variable shift op or any branches. </p> <p>The particular PowerPC processor I'm working on has the quirk that a shift-by-constant-immediate, like </p> <pre><code>int ShiftByConstant( int x ) { return x &lt;&lt; 3 ; } </code></pre> <p>is fast, single-op, and superscalar, whereas a shift-by-variable, like</p> <pre><code>int ShiftByVar( int x, int y ) { return x &lt;&lt; y ; } </code></pre> <p>is a <a href="http://www.cellperformance.com/articles/2006/04/avoiding_microcoded_instructio.html" rel="noreferrer">microcoded operation that takes 7-11 cycles to execute while the entire rest of the pipeline stops dead</a>.</p> <p>What I'd like to do is figure out which non-microcoded integer PPC ops the <a href="http://publib.boulder.ibm.com/infocenter/systems/index.jsp?topic=/com.ibm.aix.aixassem/doc/alangref/sraw.htm" rel="noreferrer">sraw</a> decodes into and then issue them individually. This won't help with the latency of the <a href="http://publib.boulder.ibm.com/infocenter/systems/index.jsp?topic=/com.ibm.aix.aixassem/doc/alangref/sraw.htm" rel="noreferrer">sraw</a> itself &mdash; it'll replace one op with six &mdash; but in between those six ops I can dual-dispatch some work to the other execution units and get a net gain. </p> <p>I can't seem to find anywhere what &mu;ops sraw decodes into &mdash; does anyone know how I can replace a variable bit-shift with a sequence of constant shifts and basic integer operations? (A for loop or a switch or anything with a branch in it won't work because the branch penalty is even bigger than the microcode penalty.)</p> <p>This needn't be answered in assembly; I'm hoping to learn the algorithm rather than the particular code, so an answer in C or a highlevel language or even pseudocode would be perfectly helpful.</p> <p><b>edit:</b> A couple of clarifications that I should add:</p> <ol> <li>I'm not even a little bit worried about portability </li> <li><p>PPC has a conditional-move, so we can assume the existence of a branchless intrinsic function </p> <p>int isel(a, b, c) { return a >= 0 ? b : c; }</p> <p>(if you write out a ternary that does the same thing I'll get what you mean) </p></li> <li>integer multiply is also microcoded and even slower than sraw. :-(</li> </ol>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload