# Memory settings with thousands of threads
    <p>I'm playing around with the JVM (Oracle 1.7 64 bit) on a Linux box (AMD 6 Core, 16 GB RAM) to see how the number of threads in an application affects performance. I'm hoping to measure at which point context switching degrades performance.</p> <p>I have created a little application that creates a thread execution pool:</p> <pre><code>Executors.newFixedThreadPool(numThreads) </code></pre> <p>I adjust <code>numThreads</code> everytime I run the program, to see the effect it has.</p> <p>I then submit <code>numThread</code> jobs (instances of <code>java.util.concurrent.Callable</code>) to the pool. Each one increments an <code>AtomicInteger</code>, does some work (creates an array of random integers and shuffles it), and then sleeps a while. The idea is to simulate a web service call. Finally, the job resubmits itself to the pool, so that I always have <code>numThreads</code> jobs working.</p> <p>I am measuring the throughput, as in the number of jobs that are processed per minute.</p> <p>With several thousand threads, I can process up to 400,000 jobs a minute. Above 8000 threads, the results start to vary a lot, suggesting that context switching is becoming a problem. But I can continue to increase the number of threads to 30,000 and still get higher throughput (between 420,000 and 570,000 jobs per minute).</p> <p>Now the question: I get a <code>java.lang.OutOfMemoryError: Unable to create new native thread</code> with more than about 31,000 jobs. I have tried setting <code>-Xmx6000M</code> which doesn't help. I tried playing with <code>-Xss</code> but that doesn't help either.</p> <p>I've read that <code>ulimit</code> can be useful, but increasing with <code>ulimit -u 64000</code> didn't change anything.</p> <p>For info:</p> <pre><code>[root@apollo ant]# ulimit -a core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 127557 max locked memory (kbytes, -l) 64 max memory size (kbytes, -m) unlimited open files (-n) 1024 pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 8192 cpu time (seconds, -t) unlimited max user processes (-u) 1024 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited </code></pre> <p>So the question #1: What do I have to do to be able to create a bigger thread pool?</p> <p>Question #2: At what stage should I expect to see context switching really reducing throughput and causing the process to grind to a halt?</p> <hr> <p>Here are some results, after I modified it to do a little more processing (as was suggested) and started recording average response times (as was also suggested).</p> <pre><code>// ( (n_cores x t_request) / (t_request - t_wait) ) + 1 // 300 ms wait, 10ms work, roughly 310ms per job =&gt; ideal response time, 310ms // ideal num threads = 1860 / 10 + 1 = 187 threads // // results: // // 100 =&gt; 19,000 thruput, 312ms response, cpu &lt; 50% // 150 =&gt; 28,500 thruput, 314ms response, cpu 50% // 180 =&gt; 34,000 thruput, 318ms response, cpu 60% // 190 =&gt; 35,800 thruput, 317ms response, cpu 65% // 200 =&gt; 37,800 thruput, 319ms response, cpu 70% // 230 =&gt; 42,900 thruput, 321ms response, cpu 80% // 270 =&gt; 50,000 thruput, 324ms response, cpu 80% // 350 =&gt; 64,000 thruput, 329ms response, cpu 90% // 400 =&gt; 72,000 thruput, 335ms response, cpu &gt;90% // 500 =&gt; 87,500 thruput, 343ms response, cpu &gt;95% // 700 =&gt; 100,000 thruput, 430ms response, cpu &gt;99% // 1000 

---

Here are some results, after I modified the job to do a little more processing (as was suggested) and started recording average response times (as was also suggested):

```
// ideal threads = ((n_cores x t_request) / (t_request - t_wait)) + 1
// 300 ms wait, 10 ms work, roughly 310 ms per job => ideal response time 310 ms
// ideal num threads = (6 x 310) / 10 + 1 = 187 threads
//
// results:
//
// threads => thruput/min, avg response, cpu
//
//   100 =>  19,000,  312 ms, < 50%
//   150 =>  28,500,  314 ms,   50%
//   180 =>  34,000,  318 ms,   60%
//   190 =>  35,800,  317 ms,   65%
//   200 =>  37,800,  319 ms,   70%
//   230 =>  42,900,  321 ms,   80%
//   270 =>  50,000,  324 ms,   80%
//   350 =>  64,000,  329 ms,   90%
//   400 =>  72,000,  335 ms, > 90%
//   500 =>  87,500,  343 ms, > 95%
//   700 => 100,000,  430 ms, > 99%
//  1000 => 100,000,  600 ms, > 99%
//  2000 => 105,000, 1100 ms, > 99%
//  5000 => 131,000, 1600 ms, > 99%
// 10000 => 131,000, 2700 ms, > 99%, 16 GB virtual size
// 20000 => 140,000, 4000 ms, > 99%, 27 GB virtual size
// 30000 => 133,000, 2800 ms, > 99%, 37 GB virtual size
// 40000 =>       -,    - ms, > 99%, > 39 GB virtual size
//          => java.lang.OutOfMemoryError: unable to create new native thread
```

I interpret them as:

1. Even though the application is sleeping 96.7% of the time, that still leaves a lot of thread switching to be done.
2. Context switching is measurable, and it shows up in the response time.

What is interesting here is that when tuning an app, you might choose an acceptable response time, say 400 ms, and increase the number of threads until you reach it, which in this case would let the app process around 95,000 requests a minute.

Often people say that the ideal number of threads is near the number of cores. In apps that have wait time (threads blocked, say, waiting for a database or web service to respond), the calculation needs to take that into account (see my equation above). But even that theoretical ideal isn't an actual ideal when you look at the results, or when you tune to a specific response time.
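To make that sizing arithmetic concrete, here is a small sketch of the equation in Java, using this test's numbers (6 cores, 300 ms of wait, 10 ms of work per job); the class and variable names are just for illustration:

```java
public class PoolSizing {
    public static void main(String[] args) {
        int nCores = 6;                  // cores on the test box
        double tWait = 300.0;            // ms spent blocked (sleep / remote call)
        double tWork = 10.0;             // ms of CPU work per job
        double tRequest = tWait + tWork; // ~310 ms total per job

        // idealThreads = ((nCores * tRequest) / (tRequest - tWait)) + 1
        int idealThreads = (int) ((nCores * tRequest) / (tRequest - tWait)) + 1;
        System.out.println("ideal threads = " + idealThreads); // prints 187
    }
}
```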