Yes, you can count on compilers to do a good job of performing common subexpression elimination, even through loops. This can result in a slight increase in memory usage, but all of that will be considered by any decent compiler, and it's pretty much always a win to perform common subexpression elimination (since the memory we're talking about is registers and L1 cache).

Here are some quick tests to "prove" it to yourself too. The results indicate that you should basically not try to outsmart the compiler by doing manual common subexpression elimination. Just code naturally and let the compiler do what it's good at (which is stuff like figuring out which expressions should really be eliminated and which shouldn't, given the target architecture and the surrounding code).

Later, if you are unsatisfied with the performance of your code, you should take a profiler to it and see which statements and expressions are eating up the most time, then try to figure out whether you can reorganize the code to help the compiler out. I'd say the vast majority of the time it won't be simple things like this; it'll be things like reducing cache stalls (i.e. organizing your data better), eliminating redundant inter-procedural calculations, and so on.

(For the record, the use of randoms in the following code just ensures the compiler can't get too zealous about variable elimination and loop unrolling.)

prog1:

```c
#include <stdlib.h>
#include <time.h>

int main() {
    srandom(time(NULL));
    int i, ret = 0, a = random(), b = random(), values[10];
    int loop_end = random() % 5 + 1000000000;
    for (i = 0; i < 10; ++i) {
        values[i] = random();
    }
    for (i = 0; i < loop_end; ++i) {
        ret += a * b * values[i % 10];
    }
    return ret;
}
```

prog2 (identical, except the common subexpression `a * b` is hoisted out of the loop by hand):

```c
#include <stdlib.h>
#include <time.h>

int main() {
    srandom(time(NULL));
    int i, ret = 0, a = random(), b = random(), values[10];
    int loop_end = random() % 5 + 1000000000;
    for (i = 0; i < 10; ++i) {
        values[i] = random();
    }
    int c = a * b;
    for (i = 0; i < loop_end; ++i) {
        ret += c * values[i % 10];
    }
    return ret;
}
```

And here are the results:

```
> gcc -O2 prog1.c -o prog1; time ./prog1
./prog1  1.62s user 0.00s system 99% cpu 1.630 total
> gcc -O2 prog2.c -o prog2; time ./prog2
./prog2  1.63s user 0.00s system 99% cpu 1.636 total
```

(This is measuring wall time, so don't pay attention to the 0.01-second difference; running them a few times, both land in the 1.62-1.63 second range, so they're the same speed.)

Interestingly enough, prog1 was faster when compiled without optimization:

```
> gcc -O0 prog1.c -o prog1; time ./prog1
./prog1  2.83s user 0.00s system 99% cpu 2.846 total
> gcc -O0 prog2.c -o prog2; time ./prog2
./prog2  2.93s user 0.00s system 99% cpu 2.946 total
```

Also interesting: compiling with `-O1` provided the best performance.

```
> gcc -O1 prog1.c -o prog1; time ./prog1
./prog1  1.57s user 0.00s system 99% cpu 1.579 total
> gcc -O1 prog2.c -o prog2; time ./prog2
./prog2  1.56s user 0.00s system 99% cpu 1.563 total
```

---

GCC and Intel are great compilers and are pretty smart about handling stuff like this. I don't have any experience with the Portland compiler, but these are pretty basic things for a compiler to do, so I'd be very surprised if it didn't handle situations like this well.
 
