    <p>For the record, gcc, when compiling with optimization explicitly disabled (<code>-O0</code>), produces different code for the two inputs (in my case, the body of <code>foo</code> was <code>return rand();</code> so that the result could not be determined at compile time).</p>
    <p>Without temporary variable <code>t</code>:</p>
    <pre><code>    movl    $0, %eax
    call    foo
    testl   %eax, %eax
    je      .L4
    /* inside of if block */
.L4:
    /* rest of main() */
</code></pre>
    <p>Here, the return value of <code>foo</code> is left in the EAX register, and the register is tested against itself to see whether it is 0; if so, execution jumps over the body of the if block.</p>
    <p>With temporary variable <code>t</code>:</p>
    <pre><code>    movl    $0, %eax
    call    foo
    movl    %eax, -4(%rbp)
    cmpl    $0, -4(%rbp)
    je      .L4
    /* inside of if block */
.L4:
    /* rest of main() */
</code></pre>
    <p>Here, the return value of <code>foo</code> is left in the EAX register and then stored to a local slot on the stack (<code>-4(%rbp)</code>). The contents of that stack location are then compared to the literal 0, and if they are equal, execution jumps over the body of the if block.</p>
    <p>So if we further assume that the processor is not doing any "optimizations" of its own when it turns these instructions into microcode, the version without the temporary should be a few clock cycles faster. It's not going to be substantially faster: even though the version with a temporary involves a store to the stack, the stored value is almost certainly still in the processor's L1 cache when the comparison instruction executes immediately afterwards, so there is no round trip to RAM.</p>
    <p>Of course, the code becomes identical as soon as you turn on any optimization level, even <code>-O1</code>, and who compiles anything so critical that they care about a handful of clock cycles with all optimizations off?</p>
    <p><strong>Edit:</strong> With regard to your further information about your hardware engineer friend, I can't see how accessing a value in the L1 cache would ever be <em>faster</em> than accessing a register directly. I could see it being just about <em>as fast</em> if the value never even leaves the pipeline, but I can't see it being <em>faster</em>, especially since the processor still has to execute the <code>movl</code> instruction in addition to the comparison. But show him the assembly code above and ask what he thinks; it will be more productive than trying to discuss the problem in terms of C.</p>
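For reference, here is a minimal sketch of the kind of C source the two listings above would correspond to. The original question's code is not shown here, so this is an assumed reconstruction: only `foo`, `t`, and the `return rand();` body come from the answer, while the wrapper names `branch_without_temp` and `branch_with_temp` are hypothetical and the branches are placed in separate functions rather than in `main()` so they can be compared side by side.

```c
#include <stdlib.h>

/* Body taken from the answer: rand() keeps the result from being
   known at compile time. */
int foo(void) {
    return rand();
}

/* Hypothetical wrapper: tests the return value of foo() directly,
   with no named temporary. */
int branch_without_temp(void) {
    if (foo()) {
        return 1;   /* inside of if block */
    }
    return 0;       /* rest of the function */
}

/* Hypothetical wrapper: stores the return value in a temporary t first,
   then tests t. */
int branch_with_temp(void) {
    int t = foo();
    if (t) {
        return 1;   /* inside of if block */
    }
    return 0;       /* rest of the function */
}
```

Compiling a file like this with `gcc -O0 -S` and again with `gcc -O1 -S` and comparing the generated assembly for the two functions is one way to check the claim on your own compiler; the exact instructions will vary by gcc version and target.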