StackOverflow2013

<p>Rolling your own very <strong><em>simple</em></strong> profiler is not that hard. Insert into main():</p>
<pre><code>int main()
{
    profileCpuUsage(1);     // start timer #1
    well_written_function();
    profileCpuUsage(2);     // stop timer #1, and start timer #2
    badly_written_function();
    profileCpuUsage(0);     // stop timer #2 (slice 0 is the don't care slice)
    profileCpuUsage(-1);    // print stats for timers #1 and #2
    return 0;
}
</code></pre>
<p>where:</p>
<pre><code>#include <assert.h>
#include <stdio.h>

#define NUMBER(a) ((int)(sizeof(a) / sizeof(a)[0]))

double realElapsedTime(void);                       // see below for definition

void profileCpuUsage(int slice)
{
    static struct {
        int iterations;
        double elapsedTime;
    } slices[30];                                   // 0 is a don't care slice

    if (slice < 0) {                                // -1 = print (does not stop the running slice)
        if (slices[0].iterations)
            for (slice = 1; slice < NUMBER(slices); slice++)
                printf("Slice %2d Iterations %7d Seconds %7.3f\n", slice,
                       slices[slice].iterations, slices[slice].elapsedTime);
    }
    else {
        static int i;                               // = previous slice
        static double t;                            // = previous t1
        const double t1 = realElapsedTime();
        assert (slice < NUMBER(slices));
        slices[i].iterations += 1;
        slices[i].elapsedTime += t1 - t;            // i = 0 first time through
        i = slice;
        t = t1;
    }
}
</code></pre>
<p>Admittedly, in your simple example profileCpuUsage() doesn't add much benefit, and it has the disadvantage of requiring you to <em>manually</em> instrument your code by calling profileCpuUsage() at suitable locations.</p>
<p>But advantages include:</p>
<ul>
<li>You can time <em>any</em> code fragment, not just procedures.</li>
<li>It is quick to add and delete, as you do a binary search to find and/or remove code hotspots.</li>
<li>It focuses only on the code you are interested in.</li>
<li>Portable!</li>
<li>KISS</li>
</ul>
<p>One tricky non-portable thing is to define the function realElapsedTime() so that it provides enough granularity to get valid times. 
This generally works for me (using the Windows API under CYGWIN):</p>
<pre><code>#include <assert.h>
#include <windows.h>

double realElapsedTime(void)    // <-- granularity about 50 microsec on test machines
{
    static LARGE_INTEGER freq, start;
    LARGE_INTEGER count;
    if (!QueryPerformanceCounter(&count))
        assert(0 && "QueryPerformanceCounter");
    if (!freq.QuadPart) {       // one time initialization
        if (!QueryPerformanceFrequency(&freq))
            assert(0 && "QueryPerformanceFrequency");
        start = count;
    }
    return (double)(count.QuadPart - start.QuadPart) / freq.QuadPart;
}
</code></pre>
<p>For straight Unix there is the common:</p>
<pre><code>#include <sys/time.h>

double realElapsedTime(void)    // returns 0 first time called
{
    static struct timeval t0;
    struct timeval tv;
    gettimeofday(&tv, 0);
    if (!t0.tv_sec)
        t0 = tv;
    return tv.tv_sec - t0.tv_sec + (tv.tv_usec - t0.tv_usec) / 1000000.;
}
</code></pre>
<p>realElapsedTime() gives wall-clock time, not process time, which is usually what I want.</p>
<p>There are also other less-portable methods to achieve finer granularity using RDTSC; see for example <a href="http://en.wikipedia.org/wiki/Time_Stamp_Counter" rel="nofollow noreferrer">http://en.wikipedia.org/wiki/Time_Stamp_Counter</a>, and its links, but I haven't tried these.</p>
<p><strong><em>Edit:</em></strong> ravenspoint's very nice answer is quite similar to mine. <strong><em>And</em></strong> his answer uses nice descriptive strings rather than just ugly numbers, which I was often frustrated with. This can be fixed with only about a dozen extra lines (although that almost <em>doubles</em> the line count!).</p>
<p>Note that we want to avoid any usage of malloc(), and I'm even a bit dubious about strcmp(). So the number of slices is never increased. And hash collisions are simply flagged rather than being resolved: the human profiler can fix this by manually bumping up the number of slices from 30, or by changing the description. 
<strong><em>Untested</em></strong></p>
<pre><code>#include <assert.h>
#include <string.h>

static unsigned gethash(const char *str)    // "djb2", for example
{
    unsigned c, hash = 5381;
    while ((c = *str++))
        hash = ((hash << 5) + hash) + c;    // hash * 33 + c
    return hash;
}

void profileCpuUsage(const char *description)
{
    static struct {
        int iterations;
        double elapsedTime;
        char description[20];               // added!
    } slices[30];

    if (!description) {
        // print stats, but using description, mostly unchanged...
    }
    else {
        const int slice = gethash(description) % NUMBER(slices);
        if (!slices[slice].description[0]) {    // if new slice
            assert(strlen(description) < sizeof slices[slice].description);
            strcpy(slices[slice].description, description);
        }
        else if (strcmp(slices[slice].description, description) != 0) {
            strcpy(slices[slice].description, "!!hash conflict!!");
        }
        // remainder unchanged...
    }
}
</code></pre>
<p>Another point is that typically you'll want to disable this profiling for release versions; this also applies to ravenspoint's answer. This can be done by the trick of using an evil macro to define it away:</p>
<pre><code>#define profileCpuUsage(foo)    // = nothing
</code></pre>
<p>If this is done, you will of course need to add parentheses around the name in the function definition to disable the disabling macro there. A function-like macro is only expanded when its name is immediately followed by an opening parenthesis, so the parenthesized name is left alone:</p>
<pre><code>void (profileCpuUsage)(const char *description)...
</code></pre>
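<p>For reference, here is one way the description-based version and the disabling macro might fit together in a single compilable file. This is only a sketch under some assumptions of mine, not a tested drop-in: C11 timespec_get() stands in for realElapsedTime(), the slice table is moved to file scope, NULL both stops the running slice and prints, and the names NO_PROFILING, sliceFor(), current, startedAt and profileIterations() are hypothetical and exist only for this illustration.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define NUMBER(a) ((int)(sizeof(a) / sizeof(a)[0]))

#ifdef NO_PROFILING                       /* hypothetical build flag */
#define profileCpuUsage(description)      /* calls expand to nothing */
#endif

static struct {
    int iterations;
    double elapsedTime;
    char description[20];
} slices[30];

static int current = -1;                  /* slice being timed, -1 = none */
static double startedAt;                  /* time at which `current` started */

/* Assumption: C11 timespec_get() stands in for realElapsedTime(). */
static double realElapsedTime(void)
{
    static struct timespec t0;
    static int initialized;
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);          /* wall-clock time */
    if (!initialized) {                   /* one time initialization */
        t0 = ts;
        initialized = 1;
    }
    return (double)(ts.tv_sec - t0.tv_sec) + (ts.tv_nsec - t0.tv_nsec) / 1e9;
}

static unsigned gethash(const char *str)  /* "djb2" */
{
    unsigned c, hash = 5381;
    while ((c = (unsigned char)*str++))
        hash = ((hash << 5) + hash) + c;  /* hash * 33 + c */
    return hash;
}

/* Map a description to its slot; collisions are flagged, not resolved. */
static int sliceFor(const char *description)
{
    const int slice = (int)(gethash(description) % NUMBER(slices));
    if (!slices[slice].description[0]) {  /* new slice */
        assert(strlen(description) < sizeof slices[slice].description);
        strcpy(slices[slice].description, description);
    }
    else if (strcmp(slices[slice].description, description) != 0) {
        strcpy(slices[slice].description, "!!hash conflict!!");
    }
    return slice;
}

/* Parenthesized name, so the NO_PROFILING macro cannot expand it here. */
void (profileCpuUsage)(const char *description)
{
    const double now = realElapsedTime();
    if (current >= 0) {                   /* close the running slice */
        slices[current].iterations += 1;
        slices[current].elapsedTime += now - startedAt;
        current = -1;
    }
    if (!description) {                   /* NULL = print stats */
        for (int s = 0; s < NUMBER(slices); s++)
            if (slices[s].iterations)
                printf("%-20s Iterations %7d Seconds %7.3f\n",
                       slices[s].description, slices[s].iterations,
                       slices[s].elapsedTime);
        return;
    }
    current = sliceFor(description);      /* start timing the new slice */
    startedAt = now;
}

/* Hypothetical accessor, only here so the counters can be checked. */
int profileIterations(const char *description)
{
    const int slice = (int)(gethash(description) % NUMBER(slices));
    return strcmp(slices[slice].description, description) == 0
               ? slices[slice].iterations : 0;
}
```

<p>Typical use would be profileCpuUsage("parse"); before one fragment, profileCpuUsage("render"); before the next, and profileCpuUsage(NULL); at the end to stop the last slice and print. Building with -DNO_PROFILING turns every such call into an empty statement while the function itself still compiles. Note that TIME_UTC is wall-clock time and can jump if the system clock is adjusted.</p>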
