Java CPU-intensive application stalls/hangs when increasing the number of workers. Where is the bottleneck, and how can I deduce/monitor it on an Ubuntu server?
I'm running a nightly CPU-intensive Java application on an EC2 server (c1.xlarge), which has eight cores and 7.5 GB RAM, running Linux / [Ubuntu 9.10](https://en.wikipedia.org/wiki/List_of_Ubuntu_releases#Ubuntu_9.10_.28Karmic_Koala.29) (Karmic Koala) 64 bit.

The application is architected in such a way that a variable number of workers are constructed (each in its own thread) and fetch messages from a queue to process them.

Throughput is the main concern here, and performance is measured in processed messages per second. The application is not RAM-bound and, as far as I can tell, not I/O-bound either (I'm no Linux expert, but I'm using dstat to check I/O load, which is pretty low, and CPU wait signals, which are almost non-existent).

I'm seeing the following when spawning different numbers of workers (worker threads):

1. 1 worker: throughput ~1.3 messages/sec/worker
2. 2 workers: throughput ~0.8 messages/sec/worker
3. 3 workers: throughput ~0.5 messages/sec/worker
4. 4 workers: throughput ~0.05 messages/sec/worker

I was expecting a near-linear increase in throughput, but reality proves otherwise.

Three questions:

1. What might be causing the sub-linear scaling when going from one worker to two, and from two to three?
2. What might be causing the (almost) complete halt when going from three workers to four? It looks like some kind of deadlock situation. (Can this happen due to heavy context switching?)
3. How would I start measuring where the problems occur? My development box has two CPUs and runs Windows. I normally attach a GUI profiler and check for threading issues, but the problem only really manifests itself with more than two threads. (See the monitoring sketch at the end of this post for one direction I'm considering.)

Some more background information:

- Workers are spawned using `Executors.newScheduledThreadPool`.

- A worker thread does calculations based on the message (CPU-intensive). Each worker thread contains a separate `persistQueue` used for offloading disk writes (and thus making use of CPU / I/O concurrency):

  ```java
  persistQueue = new ThreadPoolExecutor(1, 1, 100, TimeUnit.MILLISECONDS,
          new ArrayBlockingQueue<Runnable>(maxAsyncQueueSize),
          new ThreadPoolExecutor.AbortPolicy());
  ```

The flow (per worker) goes like this:

1. The worker thread puts the result of a message in `persistQueue` and gets on with processing the next message.
2. The `ThreadPoolExecutor` (of which there is one per worker thread) contains only one thread, which processes all incoming data (waiting in `persistQueue`) and writes it to disk ([Berkeley DB](https://en.wikipedia.org/wiki/Berkeley_DB) + Apache [Lucene](http://en.wikipedia.org/wiki/Lucene)).
3. The idea is that 1. and 2. can run concurrently for the most part, since 1. is CPU-heavy and 2. is I/O-heavy. (A sketch of this per-worker pattern is at the end of this post.)
4. It's possible for `persistQueue` to become full. The queue is bounded because otherwise a slow I/O system might flood the queues and cause [OOM](https://en.wikipedia.org/wiki/Out_of_memory) errors (yes, it's a lot of data). In that case the worker thread pauses until it can write its content to `persistQueue`. A full queue hasn't happened yet on this setup (which is another reason I think the application is definitely not I/O-bound).

The last bits of information:

- Workers are isolated from each other as far as their data is concerned, except:
  - They share some heavily used `static final` maps (used as caches; the maps are memory-intensive, so I can't keep them local to a worker even if I wanted to). The operations workers perform on these caches are iteration, lookup and contains (no writes, deletes, etc.).
  - These shared maps are accessed without synchronization (no need, right?).
  - Workers populate their local data by selecting data from MySQL (based on keys in the received message), so this is a potential bottleneck. However, most of the data are reads, the queried tables are optimized with indexes, and again it's not I/O-bound.
  - I have to admit that I haven't done much MySQL server optimizing yet (in terms of config params), but I just don't think that is the problem.
- Output is written to:
  - Berkeley DB (using a memcached(b) client). All workers share one server.
  - Lucene (using a home-grown low-level indexer). Each worker has a separate indexer.
- Even when output writing is disabled, the problems occur.

This is a huge post, I realize that, but I hope you can give me some pointers as to what this might be, or how to start monitoring / deducing where the problem lies.
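To make the setup above concrete, here is a minimal sketch of the per-worker pattern described: CPU-heavy processing on the worker thread, with disk writes handed off to the single-threaded `persistQueue` executor. The class name, the `String` message type, the queue size, and the back-off on a full queue are illustrative assumptions, not the actual production code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: one worker doing CPU-heavy work on its own
// thread and offloading disk writes to a single-threaded executor.
public class Worker implements Runnable {

    // One writer thread per worker; the bounded queue keeps a slow disk
    // from flooding memory (AbortPolicy throws when the queue is full).
    private final ThreadPoolExecutor persistQueue = new ThreadPoolExecutor(
            1, 1, 100, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<Runnable>(1000),   // maxAsyncQueueSize assumed
            new ThreadPoolExecutor.AbortPolicy());

    private final BlockingQueue<String> messages;      // message type assumed

    public Worker(BlockingQueue<String> messages) {
        this.messages = messages;
    }

    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                String message = messages.take();        // fetch the next message
                final String result = process(message);  // CPU-intensive part
                persist(result);                         // hand off the I/O part
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private String process(String message) {
        return message.toUpperCase();                    // placeholder for the real work
    }

    // Mirrors "the worker pauses until it can write": retry while the
    // bounded queue is full instead of dropping the result.
    private void persist(final String result) throws InterruptedException {
        while (true) {
            try {
                persistQueue.execute(new Runnable() {
                    public void run() {
                        // Berkeley DB / Lucene writes go here in the real application.
                    }
                });
                return;
            } catch (RejectedExecutionException queueFull) {
                Thread.sleep(50);                        // back off until the writer catches up
            }
        }
    }
}
```

The workers themselves are then submitted to the shared pool, e.g. `Executors.newScheduledThreadPool(numWorkers).execute(new Worker(queue));`.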
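As a starting point for question 3, one way to see which threads actually accumulate CPU time and which ones sit blocked, without attaching a GUI profiler to the server, is the standard `java.lang.management` API. This is a minimal sketch; in the real application it could be called periodically from a housekeeping thread, and `jstack <pid>` or `top -H` on the server gives a similar picture from the outside.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Minimal sketch: print per-thread CPU time and contention counters.
// Stalled workers show up with little CPU time and high blocked counts.
public class ThreadProbe {

    public static void dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (mx.isThreadCpuTimeSupported()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        if (mx.isThreadContentionMonitoringSupported()) {
            mx.setThreadContentionMonitoringEnabled(true);
        }
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            if (info == null) {
                continue;                                // thread died in the meantime
            }
            long cpuNanos = mx.getThreadCpuTime(id);     // -1 if measurement is unavailable
            System.out.printf("%-35s state=%-13s cpu=%6d ms blocked=%6d waited=%6d%n",
                    info.getThreadName(),
                    info.getThreadState(),
                    cpuNanos < 0 ? -1 : cpuNanos / 1000000L,
                    info.getBlockedCount(),
                    info.getWaitedCount());
        }
    }

    public static void main(String[] args) {
        dump();
    }
}
```

The idea is simply to compare these counters with one, two, three and four workers and see which threads stop accumulating CPU time and where the blocked/waited counts start to climb.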