
Scala parallel collection runtime puzzling
**Edit: My sample size was too small. When I ran it against the real data on 8 CPUs, I saw a 7.2x speed increase. Not too shabby for adding 4 characters to my code ;)**

I am currently in the process of trying to 'sell' management on the benefits of using Scala, especially when it comes to scaling with CPUs. To that end, I created a simple test application that does a bunch of vector math, and I was a bit surprised to find that the runtime was not noticeably better on my quad-core machine. Interestingly, the runtime is worst the first time you go through the collection and gets better with subsequent calls. Are there some lazy things in the parallel collections that are causing this, or am I just doing this wrong? It should be noted that I come from the C++/C# world, so it's entirely possible that I have messed up my configuration somehow. Regardless, here's my setup:

- IntelliJ Scala plugin
- Scala 2.9.1.final
- Windows 7 64-bit, quad-core processor (no hyperthreading)

```scala
import util.Random

// simple Vector3D class that has final x, y, z components, a length, and a '-' function
class Vector3D(val x: Double, val y: Double, val z: Double) {
  def length = math.sqrt(x*x + y*y + z*z)
  def -(rhs: Vector3D) = new Vector3D(x - rhs.x, y - rhs.y, z - rhs.z)
}

object MainClass {
  def main(args: Array[String]) = {
    println("Available CPU's: " + Runtime.getRuntime.availableProcessors())
    println("Parallelism Degree set to: " + collection.parallel.ForkJoinTasks.defaultForkJoinPool.getParallelism)

    // my position
    val myPos = new Vector3D(0, 0, 0)

    val r = new Random(0)
    // define a function nextRand that gets us a random between 0 and 100
    def nextRand = r.nextDouble() * 100

    // make 10 million random targets
    val targets = (0 until 10000000).map(_ => new Vector3D(nextRand, nextRand, nextRand)).toArray
    // take the .par hit before we start profiling
    val parTargets = targets.par

    println("Created " + targets.length + " vectors")

    // define a range function
    val rangeFunc: (Vector3D => Double) = (targetPos) => (targetPos - myPos).length
    // we'll select ones that are < 50
    val within50: (Vector3D => Boolean) = (targetPos) => rangeFunc(targetPos) < 50

    // time it sequentially
    val startTime_sequential = System.currentTimeMillis()
    val numTargetsInRange_sequential = targets.filter(within50)
    val endTime_sequential = System.currentTimeMillis()
    println("Sequential (ms): " + (endTime_sequential - startTime_sequential))

    // do the parallel version 10 times
    for (i <- 1 to 10) {
      val startTime_par = System.currentTimeMillis()
      val numTargetsInRange_parallel = parTargets.filter(within50)
      val endTime_par = System.currentTimeMillis()
      val ms = endTime_par - startTime_par
      println("Iteration[" + i + "] Executed in " + ms + " ms")
    }
  }
}
```

The output of this program is:

```
Available CPU's: 4
Parallelism Degree set to: 4
Created 10000000 vectors
Sequential (ms): 216
Iteration[1] Executed in 227 ms
Iteration[2] Executed in 253 ms
Iteration[3] Executed in 76 ms
Iteration[4] Executed in 78 ms
Iteration[5] Executed in 77 ms
Iteration[6] Executed in 80 ms
Iteration[7] Executed in 78 ms
Iteration[8] Executed in 78 ms
Iteration[9] Executed in 79 ms
Iteration[10] Executed in 82 ms
```

So what's going on here? The first two times we do the filter, it's slower, and then things speed up? I understand that there will inherently be a parallelism startup cost; I'm just trying to figure out where it makes sense to express the parallelism in my application, and specifically I want to be able to show management a program that runs 3-4 times faster on a quad-core box. Is this just not a good problem?

Ideas?
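For reference, here is a minimal sketch of how I could apply the same timing loop evenly to both the sequential and parallel versions, reusing the `Vector3D` class from the code above. The `timeMs` helper, the warm-up passes, and the switch from `filter` to `count` are illustrative choices for this sketch, not part of the test I actually ran:

```scala
// Illustrative timing sketch: give both collections the same untimed
// warm-up before measuring, and measure a cheap terminal operation.
object TimingSketch {
  // time a block and return (result, elapsed milliseconds)
  def timeMs[A](block: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = block
    val elapsed = (System.nanoTime() - start) / 1000000
    (result, elapsed)
  }

  def main(args: Array[String]) = {
    val myPos = new Vector3D(0, 0, 0)
    val r = new util.Random(0)
    def nextRand = r.nextDouble() * 100
    val targets = (0 until 10000000).map(_ => new Vector3D(nextRand, nextRand, nextRand)).toArray
    val parTargets = targets.par

    val within50: Vector3D => Boolean = t => (t - myPos).length < 50

    // untimed warm-up passes over both versions so each gets the same
    // chance to be JIT-compiled before any numbers are recorded
    for (_ <- 1 to 3) {
      targets.count(within50)
      parTargets.count(within50)
    }

    // measured passes: count traverses the data without building a
    // multi-million-element result collection
    for (i <- 1 to 10) {
      val (_, seqMs) = timeMs(targets.count(within50))
      val (_, parMs) = timeMs(parTargets.count(within50))
      println("Iteration[" + i + "] sequential: " + seqMs + " ms, parallel: " + parMs + " ms")
    }
  }
}
```

Using `count` here keeps the measurement focused on the traversal itself rather than on allocating the filtered result array, which is a separate cost in my original program.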
 
