<blockquote> <p>I'm smart enough to trust the compiler, so can anyone shed some light on which one is better?</p> </blockquote>
<p>Yes, a little. Firstly, I'm taking this from the following message: <a href="https://groups.google.com/group/bsdnt-devel/msg/23a48bb18571b9a6">https://groups.google.com/group/bsdnt-devel/msg/23a48bb18571b9a6</a></p>
<p>In this message, a developer takes some assembly I had written very badly and optimises it to run crazily fast on Intel Core 2 processors. As background, the project is a BSD-licensed bignum library that I and a few other developers have been involved in.</p>
<p>In this case, all that's being optimised is the addition of two arrays that look like this: <code>uint64_t* x, uint64_t* y</code>. Each "limb", or member of the array, represents part of the bignum. The basic process is to iterate over the arrays starting from the least significant limb, add each pair of limbs and continue upwards, passing the carry (any overflow) up each time. <code>adc</code> does this for you on the processor (as far as I know, standard C gives you no direct access to the carry flag).</p>
<p>In that piece of code, a combination of <code>lea something, [something+1]</code> and <code>jrcxz</code> is used, which is apparently more efficient than the <code>jnz</code>/<code>add something, size</code> pair we might previously have used. I'm not sure whether this was discovered simply by testing different instructions, however. You'd have to ask.</p>
<p>However, in a later message, the same code is measured on an AMD chip and does not perform so well.</p>
<p>I'm also given to understand that the same operations perform differently on different processors. I know, for example, that the GMP project detects the processor using <code>cpuid</code> and selects different assembly routines for different architectures, e.g. <code>core2</code>, <code>nehalem</code>.</p>
<p>The question you have to ask yourself is: does your compiler produce optimised output for your CPU architecture? The Intel compiler, for example, is known to do this, so it might be worth measuring performance and looking at the output it produces.</p>
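<p>For reference, the limb-by-limb addition described above can be sketched in portable C. This is only an illustration: <code>add_limbs</code> and its parameter names are hypothetical and are not taken from bsdnt, and the carry is recovered by comparing wrapped sums rather than by reading the carry flag. Whether a given compiler turns this into an <code>adc</code> chain is something you would have to check in its output.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from bsdnt): r = x + y over n limbs,
 * least significant limb first. Returns the carry out (0 or 1). */
uint64_t add_limbs(uint64_t *r, const uint64_t *x,
                   const uint64_t *y, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s  = x[i] + carry;  /* add the incoming carry first   */
        uint64_t c1 = s < carry;     /* 1 if that addition wrapped     */
        uint64_t t  = s + y[i];      /* then add the other limb        */
        uint64_t c2 = t < s;         /* 1 if that addition wrapped     */
        r[i]  = t;
        carry = c1 | c2;             /* at most one of c1, c2 is set   */
    }
    return carry;
}
```

<p>Comparing the generated assembly for a loop like this against a hand-written <code>adc</code> loop, and timing both on the processors you care about, is one way to answer the "which one is better" question for your own case.</p>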
 
