StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

POWhy does Java switch on contiguous ints appear to run faster with added cases?
text
Body
copied!<p>I am working on some Java code which needs to be highly optimized as it will run in hot functions that are invoked at many points in my main program logic. Part of this code involves multiplying <code>double</code> variables by <code>10</code> raised to arbitrary non-negative <code>int</code> <code>exponent</code>s. One fast way (edit: but not the fastest possible, see Update 2 below) to get the multiplied value is to <code>switch</code> on the <code>exponent</code>:</p> <pre><code>double multiplyByPowerOfTen(final double d, final int exponent) { switch (exponent) { case 0: return d; case 1: return d*10; case 2: return d*100; // ... same pattern case 9: return d*1000000000; case 10: return d*10000000000L; // ... same pattern with long literals case 18: return d*1000000000000000000L; default: throw new ParseException("Unhandled power of ten " + power, 0); } } </code></pre> <p>The commented ellipses above indicate that the <code>case</code> <code>int</code> constants continue incrementing by 1, so there are really 19 <code>case</code>s in the above code snippet. Since I wasn't sure whether I would actually need all the powers of 10 in <code>case</code> statements <code>10</code> thru <code>18</code>, I ran some microbenchmarks comparing the time to complete 10 million operations with this <code>switch</code> statement versus a <code>switch</code> with only <code>case</code>s <code>0</code> thru <code>9</code> (with the <code>exponent</code> limited to 9 or less to avoid breaking the pared-down <code>switch</code>). I got the rather surprising (to me, at least!) result that the longer <code>switch</code> with more <code>case</code> statements actually ran faster.</p> <p>On a lark, I tried adding even more <code>case</code>s which just returned dummy values, and found that I could get the switch to run even faster with around 22-27 declared <code>case</code>s (even though those dummy cases are never actually hit while the code is running). (Again, <code>case</code>s were added in a contiguous fashion by incrementing the prior <code>case</code> constant by <code>1</code>.) These execution time differences are not very significant: for a random <code>exponent</code> between <code>0</code> and <code>10</code>, the dummy padded <code>switch</code> statement finishes 10 million executions in 1.49 secs versus 1.54 secs for the unpadded version, for a grand total savings of 5ns per execution. So, not the kind of thing that makes obsessing over padding out a <code>switch</code> statement worth the effort from an optimization standpoint. But I still just find it curious and counter-intuitive that a <code>switch</code> doesn't become slower (or perhaps at best maintain constant <em>O(1)</em> time) to execute as more <code>case</code>s are added to it. </p> <p><img src="https://i.stack.imgur.com/kbnan.png" alt="switch benchmarking results"></p> <p>These are the results I obtained from running with various limits on the randomly-generated <code>exponent</code> values. I didn't include the results all the way down to <code>1</code> for the <code>exponent</code> limit, but the general shape of the curve remains the same, with a ridge around the 12-17 case mark, and a valley between 18-28. All tests were run in JUnitBenchmarks using shared containers for the random values to ensure identical testing inputs. I also ran the tests both in order from longest <code>switch</code> statement to shortest, and vice-versa, to try and eliminate the possibility of ordering-related test problems. I've put my testing code up on a github repo if anyone wants to try to reproduce these results.</p> <p>So, what's going on here? Some vagaries of my architecture or micro-benchmark construction? Or is the Java <code>switch</code> really a little faster to execute in the <code>18</code> to <code>28</code> <code>case</code> range than it is from <code>11</code> up to <code>17</code>?</p> <p><a href="https://github.com/abissell/switch-experiment" rel="noreferrer">github test repo "switch-experiment"</a></p> <p><strong>UPDATE:</strong> I cleaned up the benchmarking library quite a bit and added a text file in /results with some output across a wider range of possible <code>exponent</code> values. I also added an option in the testing code not to throw an <code>Exception</code> from <code>default</code>, but this doesn't appear to affect the results.</p> <p><strong>UPDATE 2:</strong> Found some pretty good discussion of this issue from back in 2009 on the xkcd forum here: <a href="http://forums.xkcd.com/viewtopic.php?f=11&t=33524" rel="noreferrer">http://forums.xkcd.com/viewtopic.php?f=11&t=33524</a>. The OP's discussion of using <code>Array.binarySearch()</code> gave me the idea for a simple array-based implementation of the exponentiation pattern above. There's no need for the binary search since I know what the entries in the <code>array</code> are. It appears to run about 3 times faster than using <code>switch</code>, obviously at the expense of some of the control flow that <code>switch</code> affords. That code has been added to the github repo also.</p>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload