When can I confidently compile a program with -O3?
<p>I've seen a lot of people complaining about the -O3 option:</p>
<p><a href="https://stackoverflow.com/questions/280069">GCC: program doesn't work with compilation option -O3</a></p>
<p><a href="https://stackoverflow.com/a/14853616/1365960">Floating Point Problem provided by David Hammen</a></p>
<p>I checked the GCC manual:</p>
<blockquote>
<pre><code>-O3
    Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions and -frename-registers options.
</code></pre>
</blockquote>
<p>I've also checked the source code to confirm that these two options are the only optimizations that -O3 adds:</p>
<pre><code>if (optimize &gt;= 3) {
    flag_inline_functions = 1;
    flag_rename_registers = 1;
}
</code></pre>
<p>For those two optimizations:</p>
<p><strong>-finline-functions</strong> is useful in some cases (mainly with C++) because it lets us define the size of inlined functions (600 by default) with -finline-limit. The compiler may report an out-of-memory error when a high inline limit is set.</p>
<p><strong>-frename-registers</strong> attempts to avoid false dependencies in scheduled code by making use of registers left over after register allocation. This optimization will most benefit processors with lots of registers.</p>
<p>Although -finline-functions can reduce the number of function calls, it may also produce a larger binary, so it may introduce severe cache penalties and make the program even slower than with -O2. I think the cache penalties depend on more than the program itself.</p>
<p>As for -frename-registers, I don't think it will have any positive impact on a CISC architecture like x86.</p>
<p>My question has 2.5 parts:</p>
<p>[Answered] 1. Am I right to claim that whether a program runs faster with the -O3 option depends on the underlying platform/architecture?</p>
<p>EDIT: The 1st part has been confirmed as true. David Hammen also claims that we should be very careful about how optimization and floating-point operations interact on machines with extended-precision floating-point registers, such as Intel and AMD.</p>
<p>2. <strong>When can I confidently use the -O3 option?</strong> I suppose these two optimizations, especially -frename-registers, may lead to behavior that differs from -O0/-O2. I have seen some programs compiled with -O3 crash during execution. Is this deterministic? If I run an executable once without any crash, does that mean it is safe to use -O3?</p>
<p>EDIT: Determinism has nothing to do with the optimization; it is a multithreading problem. However, for a multithreaded program, running the executable once without errors does not show that -O3 is safe. David Hammen shows that -O3 optimization of floating-point operations may violate the strict weak ordering criterion for a comparison. <strong>Are there any other concerns we need to watch for when using the -O3 option?</strong></p>
<p>[Answered] 3. If the answer to the 1st question is "yes", then when I change the target platform, or work in a distributed system with different machines, I may need to switch between -O3 and -O2. Are there any general ways to decide whether I can get a performance improvement with -O3? For example, more registers, short inline functions, etc.</p>
<p>EDIT: The 3rd part has been answered by Louen: "the variety of platforms make general reasoning about this problem impossible". To evaluate the performance gain from -O3, we have to build the code both ways and benchmark it to see which is faster.</p>
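<p>The "build both ways and benchmark" advice could be sketched roughly as follows. This is only a minimal illustration, not a rigorous benchmark: <code>mix</code>, <code>workload</code>, <code>best_time_ms</code>, and <code>measure_workload</code> are hypothetical names, and the loop is a stand-in for whatever hot path your real program has. The idea is to compile the same file once with <code>g++ -O2</code> and once with <code>g++ -O3</code>, run both on the target machine, and compare the reported times.</p>

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical small helper called in a hot loop, the kind of code where
// inlining decisions can matter.
std::uint64_t mix(std::uint64_t x) {
    x ^= x >> 13;
    x *= 0x9E3779B97F4A7C15ULL;
    return x ^ (x >> 7);
}

// Hypothetical workload standing in for the program's real hot path.
std::uint64_t workload(std::size_t n) {
    std::uint64_t acc = 0;
    for (std::size_t i = 0; i < n; ++i) {
        acc += mix(i);
    }
    return acc;
}

// Runs f() `reps` times and returns the fastest wall-clock time in
// milliseconds. Taking the minimum reduces noise from other processes.
template <typename F>
double best_time_ms(F f, int reps) {
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < reps; ++r) {
        auto t0 = std::chrono::steady_clock::now();
        f();
        auto t1 = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (ms < best) {
            best = ms;
        }
    }
    return best;
}

// Times the workload. The result is stored in a volatile variable so the
// optimizer cannot discard the computation as unused.
double measure_workload(std::size_t n, int reps) {
    volatile std::uint64_t sink = 0;
    return best_time_ms([&] { sink = workload(n); }, reps);
}
```

<p>Printing <code>measure_workload(n, reps)</code> from <code>main</code> in each build gives one number per optimization level. The result only says something about this workload on this machine, which is consistent with the point that general reasoning across platforms is not possible.</p>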