What mechanisms other than mutexes or garbage collection can slow my multi-threaded Java program?
    <p><strong>Problem</strong></p>
    <p>I have a piece of Java code (JDK 1.6.0_22, if relevant) that implements a stateless, side-effect-free function with no mutexes. It does, however, use a lot of memory (I don't know if that is relevant).</p>
    <p>In the past I have visited Sun Laboratories and gathered the standard "performance vs number of threads" curve. As this function has no mutexes, it has a nice graph, although garbage collection kicked in as the number of threads increased. After some garbage collection tuning I was able to make this curve almost flat.</p>
    <p>I am now doing the same experiment on Intel hardware. The hardware has 4 CPUs, each with 8 cores, and hyperthreading. This gives 64 availableProcessors(). Unfortunately the curve of "performance vs number of threads" scales nicely for 1, 2, 3 threads, and caps at 3 threads. After 3 threads I can put as many threads as I want on the task, and the performance gets no better.</p>
    <p><strong>Attempts to fix the Problem</strong></p>
    <p>My first thought was that I had been stupid and introduced some synchronised code somewhere. Normally to resolve this issue I run JConsole or JVisualVM and look at the thread stack traces. If I have 64 threads running at the speed of 3, I would expect 61 of them to be sitting waiting to enter a mutex. I didn't find this. Instead I found all the threads running: just very slowly.</p>
    <p>A second thought was that perhaps the timing framework was introducing problems. I replaced my function with a dummy function that just counts to a billion using an AtomicLong. This scaled beautifully with the number of threads: I was able to count to a billion 10,000 times, 64 times quicker with 64 threads than with 1 thread.</p>
    <p>I thought (desperation kicking in) perhaps garbage collection is taking a really, really long time, so I tweaked the garbage collection parameters. While this improved my latency variation, it had no effect on throughput: I still have 64 threads running at the speed I expect 3 to run at.</p>
    <p>I have downloaded the Intel tool VTune, but my skill with it is weak: it is a complex tool and I don't understand it yet. I have the instruction book on order: a fun Christmas present to myself, but that is a little too late to help my current problem.</p>
    <p><strong>Question</strong></p>
    <ol> <li>What tools (mental or software) could I use to improve my understanding of what is going on?</li> <li>What mechanisms other than mutexes or garbage collection could be slowing my code down?</li> </ol>
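    <p>For reference, the "performance vs number of threads" measurement described above can be sketched roughly as follows. This is a minimal sketch and not my actual harness: the class name <code>ScalingProbe</code>, the <code>work</code> stand-in function, and the iteration counts are all assumptions made up for illustration. The stand-in touches no shared state and allocates nothing, so it plays the same role as the dummy counting function.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ScalingProbe {

    /** Hypothetical stand-in for the function under test: pure arithmetic on locals. */
    static long work(long iterations) {
        long x = 1;
        for (long i = 0; i < iterations; i++) {
            x = x * 6364136223846793005L + 1442695040888963407L;
        }
        return x;
    }

    /** Runs the same amount of work on each of `threads` workers; returns total iterations per second. */
    static double throughput(int threads, final long iterationsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<Callable<Long>>();
            for (int t = 0; t < threads; t++) {
                tasks.add(new Callable<Long>() {
                    public Long call() {
                        return work(iterationsPerThread);
                    }
                });
            }
            long start = System.nanoTime();
            // invokeAll blocks until every task has finished.
            List<Future<Long>> results = pool.invokeAll(tasks);
            for (Future<Long> f : results) {
                f.get(); // surfaces any exception thrown inside a worker
            }
            long elapsedNanos = System.nanoTime() - start;
            return (double) threads * iterationsPerThread / (elapsedNanos / 1e9);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int max = Runtime.getRuntime().availableProcessors();
        throughput(max, 1000000L); // warm-up pass so the JIT compiles work() first
        for (int n = 1; n <= max; n *= 2) {
            System.out.printf("%d threads: %.0f iterations/s%n", n, throughput(n, 50000000L));
        }
    }
}
```

    <p>Replacing <code>work</code> with the real, memory-heavy function and comparing the two curves is one way to check that the plateau comes from the function itself rather than from the timing framework.</p>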