StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

POData structure for efficient function call matching
text
Body
copied!<p>I'm building a tool that among other things has to measure performance-related impact of changes in our product. </p> <p>To get that done, I've implemented a profiler that traces whenever a function is called or returns and notifies me about that. First, I've dumped the output to a file to get a sense of the data I'll be working with and here is more-or-less how they look like:</p> <pre><code>FuncCall1 FuncCall2 FuncCall3 FuncRet3 FuncCall4 FuncRet4 FuncCall5 FuncCall6 FuncRet6 FuncRet5 FuncRet2 FuncRet1 </code></pre> <p>To have a better visual understanding of how this data looks like, here is a graph of the first 10000 function calls: (x-axis: time, y-axis: depth/nesting): <a href="http://img444.imageshack.us/img444/4710/proflog.gif" rel="nofollow noreferrer">Function Call/Return graph http://img444.imageshack.us/img444/4710/proflog.gif</a> (<a href="http://img444.imageshack.us/img444/4710/proflog.gif" rel="nofollow noreferrer">http://img444.imageshack.us/img444/4710/proflog.gif</a>)</p> <p>When a function begins execution, I will log it's name/identifier and the current high-precision timestamp and when it returns I will need to lookup the entry where I stored the start time and add a new timestamp that marks it return.</p> <p>To sum things up, the operations that I'm going to perform on these data are:</p> <ol> <li>Insert a new function call mark with current timestamp.</li> <li>Lookup the most recent function call of a certain ID and store the return timestamp.</li> <li>See which other functions were called from within a certain function (and see where its spending its time) - for example If I'm looking at Function#2 in the previous example, I want to know that it calls Function#3, Function#4, Function#5 and Function#5 calls Function#6 then it returns (with all call/return timestamps marked).</li> </ol> <p>Now, I have several ideas for data-structures that might be good for this scenario:</p> <ol> <li><p>An auto-balanced tree (i.e AVL) where the key for each node will be the function identifier and the value in each node would be a stack of timestamp pairs. This approach will give me fast insertion and lookup when marking function timestamps and the fact that each node is a stack, it will also take care of matching the right return timestamp to the start timestamp - Always (I assume) the latest return timestamp of a certain function should match the most nested/recent function call. In this approach, maintaining nested function calls with different identifiers would be a bit troublesome, because I will have to traverse the tree and match them basing on their timestamp to figure out their nesting - not ideal.</p></li> <li><p>Maintain a list of functions that did not return yet (that will preserve the call-stack info) and use skip-list where each level would be equal to function-call-nesting level. This approach would make operation #3 easier, but lookups will be slower and I might have to maintain very long lists of not returned functions - such as main(), that will have to be maintained throughout the lifetime of my application. Here I could also use a hashtable, to improve the lookup speed sacrificing some more memory usage. Memory usage is critical - this profiler easily generates about 20 MB / s.</p></li> </ol> <p>The reason why I'm not using a plain-simple stack to track these data, is because I will need to periodically sync partial results to a different machines and have at least, partial-results available before everything returns.</p> <p>I've looked over interval-trees, range-trees and other kind of data-structures that I'm aware of, but I can't find any that would meet all my 3 requirements efficiently.</p> <p>Maybe there is a data structure that would meet them all that I'm not aware of? Any Ideas?</p> <h2>Update:</h2> <p>What about this:</p> <p>Having a tree that would have function calls along with their nested calls and a separate stack for the functions that did not return. </p> <p>Now every element on the stack will have a pointer to it's copy in the tree and when a new function call arrives, I will look at the top element on the stack, trace it's pointer to it's representation in the tree, add the new function call as a child to that call and push it's copy on the stack with a pointer to the newly created tree node.</p> <p>For function returns, it's similar, for every function return, the latest entry on the stack will always be it's call - trace the call pointer, save the return time in the tree and pop the call.</p> <p>Do you see any major flaws in my thinking?</p> <h2>Update 2:</h2> <p>My Approach worked perfectly. I will wait 2 days and answer my question.</p>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload