Note that there are some explanatory texts on larger screens.

plurals
  1. POBranch Prediction: Writing Code to Understand it; Getting Weird Results
    text
    copied!<p>I'm trying to get a good understanding of branch prediction by measuring the time to run loops with predictable branches vs. loops with random branches.</p> <p>So I wrote a program that takes large arrays of 0's and 1's arranged in different orders (i.e. all 0's, repeating 0-1, all rand), and iterates through the array branching based on if the current index is 0 or 1, doing time-wasting work.</p> <p>I expected that harder-to-guess arrays would take longer to run on, since the branch predictor would guess wrong more often, and that the time-delta between runs on two sets of arrays would remain the same regardless of the amount of time-wasting work.</p> <p>However, as amount of time-wasting work increased, the difference in time-to-run between arrays increased, A LOT.</p> <p><img src="https://i.imgur.com/Ph1jD.png" alt="Yo this graph makes no sense"> </p> <p>(X-axis is amount of time-wasting work, Y-axis is time-to-run)</p> <p>Does anyone understand this behavior? You can see the code I'm running at the following code:</p> <pre><code>#include &lt;stdlib.h&gt; #include &lt;time.h&gt; #include &lt;chrono&gt; #include &lt;stdio.h&gt; #include &lt;iostream&gt; #include &lt;vector&gt; using namespace std; static const int s_iArrayLen = 999999; static const int s_iMaxPipelineLen = 60; static const int s_iNumTrials = 10; int doWorkAndReturnMicrosecondsElapsed(int* vals, int pipelineLen){ int* zeroNums = new int[pipelineLen]; int* oneNums = new int[pipelineLen]; for(int i = 0; i &lt; pipelineLen; ++i) zeroNums[i] = oneNums[i] = 0; chrono::time_point&lt;chrono::system_clock&gt; start, end; start = chrono::system_clock::now(); for(int i = 0; i &lt; s_iArrayLen; ++i){ if(vals[i] == 0){ for(int i = 0; i &lt; pipelineLen; ++i) ++zeroNums[i]; } else{ for(int i = 0; i &lt; pipelineLen; ++i) ++oneNums[i]; } } end = chrono::system_clock::now(); int elapsedMicroseconds = (int)chrono::duration_cast&lt;chrono::microseconds&gt;(end-start).count(); //This should never fire, it just exists to guarantee the compiler doesn't compile out our zeroNums/oneNums for(int i = 0; i &lt; pipelineLen - 1; ++i) if(zeroNums[i] != zeroNums[i+1] || oneNums[i] != oneNums[i+1]) return -1; delete[] zeroNums; delete[] oneNums; return elapsedMicroseconds; } struct TestMethod{ string name; void (*func)(int, int&amp;); int* results; TestMethod(string _name, void (*_func)(int, int&amp;)) { name = _name; func = _func; results = new int[s_iMaxPipelineLen]; } }; int main(){ srand( (unsigned int)time(nullptr) ); vector&lt;TestMethod&gt; testMethods; testMethods.push_back(TestMethod("all-zero", [](int index, int&amp; out) { out = 0; } )); testMethods.push_back(TestMethod("repeat-0-1", [](int index, int&amp; out) { out = index % 2; } )); testMethods.push_back(TestMethod("repeat-0-0-0-1", [](int index, int&amp; out) { out = (index % 4 == 0) ? 0 : 1; } )); testMethods.push_back(TestMethod("rand", [](int index, int&amp; out) { out = rand() % 2; } )); int* vals = new int[s_iArrayLen]; for(int currentPipelineLen = 0; currentPipelineLen &lt; s_iMaxPipelineLen; ++currentPipelineLen){ for(int currentMethod = 0; currentMethod &lt; (int)testMethods.size(); ++currentMethod){ int resultsSum = 0; for(int trialNum = 0; trialNum &lt; s_iNumTrials; ++trialNum){ //Generate a new array... for(int i = 0; i &lt; s_iArrayLen; ++i) testMethods[currentMethod].func(i, vals[i]); //And record how long it takes resultsSum += doWorkAndReturnMicrosecondsElapsed(vals, currentPipelineLen); } testMethods[currentMethod].results[currentPipelineLen] = (resultsSum / s_iNumTrials); } } cout &lt;&lt; "\t"; for(int i = 0; i &lt; s_iMaxPipelineLen; ++i){ cout &lt;&lt; i &lt;&lt; "\t"; } cout &lt;&lt; "\n"; for (int i = 0; i &lt; (int)testMethods.size(); ++i){ cout &lt;&lt; testMethods[i].name.c_str() &lt;&lt; "\t"; for(int j = 0; j &lt; s_iMaxPipelineLen; ++j){ cout &lt;&lt; testMethods[i].results[j] &lt;&lt; "\t"; } cout &lt;&lt; "\n"; } int end; cin &gt;&gt; end; delete[] vals; } </code></pre> <p>Pastebin link: <a href="http://pastebin.com/F0JAu3uw" rel="nofollow noreferrer">http://pastebin.com/F0JAu3uw</a></p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload