Note that there are some explanatory texts on larger screens.

plurals
  1. POOptimizations by compiler in a recursive program
    text
    copied!<p>I got motivated from tail call optimization question <a href="https://stackoverflow.com/questions/310974/what-is-tail-call-optimization">What Is Tail Call Optimization?</a></p> <p>So, I decided to see how can I do it in plain C.</p> <p>So, I wrote the 2 factorial programs, 1st where tail call optimization can be applied. I call this fact function as fact(n, 1).</p> <pre><code>unsigned long long int fact(int n, int cont) { if(n == 0) return cont; else return fact(n-1, n * cont); } </code></pre> <p>2nd is the normal recursion where multiple stack frames are required.</p> <pre><code>unsigned long long int fact(int n) { if(n == 0) return 1; else return n * fact(n-1); } </code></pre> <p>This is the assembly generated by a 32 bit compiler for the former with -O2</p> <pre><code>0x8048470 &lt;fact&gt;: push %ebp 0x8048471 &lt;fact+1&gt;: mov %esp,%ebp 0x8048473 &lt;fact+3&gt;: mov 0x8(%ebp),%edx 0x8048476 &lt;fact+6&gt;: mov 0xc(%ebp),%eax 0x8048479 &lt;fact+9&gt;: test %edx,%edx 0x804847b &lt;fact+11&gt;: je 0x8048488 &lt;fact+24&gt; 0x804847d &lt;fact+13&gt;: lea 0x0(%esi),%esi 0x8048480 &lt;fact+16&gt;: imul %edx,%eax 0x8048483 &lt;fact+19&gt;: sub $0x1,%edx 0x8048486 &lt;fact+22&gt;: jne 0x8048480 &lt;fact+16&gt; 0x8048488 &lt;fact+24&gt;: mov %eax,%edx 0x804848a &lt;fact+26&gt;: sar $0x1f,%edx 0x804848d &lt;fact+29&gt;: pop %ebp 0x804848e &lt;fact+30&gt;: ret </code></pre> <p>This is the assembly generated by a 32 bit compiler for latter with -O2.</p> <pre><code>0x8048470 &lt;fact&gt;: push %ebp 0x8048471 &lt;fact+1&gt;: mov %esp,%ebp 0x8048473 &lt;fact+3&gt;: push %edi 0x8048474 &lt;fact+4&gt;: push %esi 0x8048475 &lt;fact+5&gt;: push %ebx 0x8048476 &lt;fact+6&gt;: sub $0x14,%esp 0x8048479 &lt;fact+9&gt;: mov 0x8(%ebp),%eax 0x804847c &lt;fact+12&gt;: movl $0x1,-0x18(%ebp) 0x8048483 &lt;fact+19&gt;: movl $0x0,-0x14(%ebp) 0x804848a &lt;fact+26&gt;: test %eax,%eax 0x804848c &lt;fact+28&gt;: je 0x80484fc &lt;fact+140&gt; 0x804848e &lt;fact+30&gt;: mov %eax,%ecx 0x8048490 &lt;fact+32&gt;: mov %eax,%esi 0x8048492 &lt;fact+34&gt;: sar $0x1f,%ecx 0x8048495 &lt;fact+37&gt;: add $0xffffffff,%esi 0x8048498 &lt;fact+40&gt;: mov %ecx,%edi 0x804849a &lt;fact+42&gt;: mov %eax,%edx 0x804849c &lt;fact+44&gt;: adc $0xffffffff,%edi 0x804849f &lt;fact+47&gt;: sub $0x1,%eax 0x80484a2 &lt;fact+50&gt;: mov %eax,-0x18(%ebp) 0x80484a5 &lt;fact+53&gt;: movl $0x0,-0x14(%ebp) 0x80484ac &lt;fact+60&gt;: sub -0x18(%ebp),%esi 0x80484af &lt;fact+63&gt;: mov %edx,-0x20(%ebp) 0x80484b2 &lt;fact+66&gt;: sbb -0x14(%ebp),%edi 0x80484b5 &lt;fact+69&gt;: movl $0x1,-0x18(%ebp) 0x80484bc &lt;fact+76&gt;: movl $0x0,-0x14(%ebp) 0x80484c3 &lt;fact+83&gt;: mov %ecx,-0x1c(%ebp) 0x80484c6 &lt;fact+86&gt;: xchg %ax,%ax 0x80484c8 &lt;fact+88&gt;: mov -0x14(%ebp),%ecx 0x80484cb &lt;fact+91&gt;: mov -0x18(%ebp),%ebx 0x80484ce &lt;fact+94&gt;: imul -0x1c(%ebp),%ebx 0x80484d2 &lt;fact+98&gt;: imul -0x20(%ebp),%ecx 0x80484d6 &lt;fact+102&gt;: mov -0x18(%ebp),%eax 0x80484d9 &lt;fact+105&gt;: mull -0x20(%ebp) 0x80484dc &lt;fact+108&gt;: add %ebx,%ecx 0x80484de &lt;fact+110&gt;: add %ecx,%edx 0x80484e0 &lt;fact+112&gt;: addl $0xffffffff,-0x20(%ebp) 0x80484e4 &lt;fact+116&gt;: adcl $0xffffffff,-0x1c(%ebp) 0x80484e8 &lt;fact+120&gt;: mov -0x1c(%ebp),%ebx 0x80484eb &lt;fact+123&gt;: mov %eax,-0x18(%ebp) 0x80484ee &lt;fact+126&gt;: mov -0x20(%ebp),%eax 0x80484f1 &lt;fact+129&gt;: mov %edx,-0x14(%ebp) 0x80484f4 &lt;fact+132&gt;: xor %edi,%ebx 0x80484f6 &lt;fact+134&gt;: xor %esi,%eax 0x80484f8 &lt;fact+136&gt;: or %eax,%ebx 0x80484fa &lt;fact+138&gt;: jne 0x80484c8 &lt;fact+88&gt; 0x80484fc &lt;fact+140&gt;: mov -0x18(%ebp),%eax 0x80484ff &lt;fact+143&gt;: mov -0x14(%ebp),%edx 0x8048502 &lt;fact+146&gt;: add $0x14,%esp 0x8048505 &lt;fact+149&gt;: pop %ebx 0x8048506 &lt;fact+150&gt;: pop %esi 0x8048507 &lt;fact+151&gt;: pop %edi 0x8048508 &lt;fact+152&gt;: pop %ebp 0x8048509 &lt;fact+153&gt;: ret </code></pre> <p>Compiling both the programs and looking at the assembly generated, both the programs still have recursive calls. But, when I compile with -O2 option(assembly posted above) in the former, I see no recursive calls at all and so I think gcc does tail call optimization stuff.</p> <p>But when I compile the latter with -O2 option, it also removes the recursive calls and instead puts quite a large number of assembly instructions as compared to what happens to former on -O2.</p> <p>I wanted to understand precisely what is the compiler doing in the latter, and why couldnt it transform to the assembly generated by former even with O4.</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload