StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
text
Body
copied!<p>gprof is better than nothing. But not much. It estimates time spent not only in a function, but also in all of the functions called by the function - but beware it is an estimate, not a direct measurement. It does not make the distinction that some two callers of the same subroutine may have widely differing times spent inside it, per call. To do better than that you need a real call graph profiler, one that looks at several levels of the stack on a timer tick.</p> <p>dtrace is good for what it does.</p> <p>If you are doing low level performance optimization on x86, you should consider Intel's Vtune tool. Not only does it provide the best access I am aware of to low level performance measurement hardware on the chip, the so-called EMON (Event Monitoring) system (some of which I designed), but Vtune also has some pretty good high level tools. Such as call graph profiling that, I believe, is better than gprof. On the low level, I like doing things like generating profiles of the leading suspects, like branch mispredictions and cache misses, and looking at the code to see if there is something that can be done. Sometimes simple stuff, such as making an array size 255 rather than 256, helps a lot.</p> <p>Generic Linux oprofile, <a href="http://oprofile.sourceforge.net/about/" rel="nofollow">http://oprofile.sourceforge.net/about/</a>, is almost as good as Vtune, and better in some ways. And available for x86 and ARM. I haven't used it much, but I particularly like that you can use it in an almost completely non-intrusive manner, with no need to create the special -pg binary that gprof needs.</p>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload