Monitor non-heap memory usage of a JVM
<p>We usually deal with OutOfMemoryError problems caused by heap or permgen size misconfiguration.</p>

<p>But not all JVM memory is permgen or heap. As far as I understand, it can also be related to threads/stacks, native JVM code...</p>

<p>But using pmap I can see the process is allocated 9.3G, which means 3.3G of off-heap memory usage.</p>

<p>I wonder what the possibilities are to monitor and tune this extra off-heap memory consumption.</p>

<p>I do not use direct off-heap memory access (MaxDirectMemorySize is at its 64m default).</p>

<pre><code>Context: Load testing
Application: Solr/Lucene server
OS: Ubuntu
Thread count: 700
Virtualization: vSphere (run by us, no external hosting)
</code></pre>

<p><strong>JVM</strong></p>

<pre><code>java version "1.7.0_09"
Java(TM) SE Runtime Environment (build 1.7.0_09-b05)
Java HotSpot(TM) 64-Bit Server VM (build 23.5-b02, mixed mode)
</code></pre>

<p><strong>Tuning</strong></p>

<pre><code>-Xms6g
-Xmx6g
-XX:MaxPermSize=128m
-XX:-UseGCOverheadLimit
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+CMSClassUnloadingEnabled
-XX:+OptimizeStringConcat
-XX:+UseCompressedStrings
-XX:+UseStringCache
</code></pre>

<p><strong>Memory maps:</strong></p>

<p><a href="https://gist.github.com/slorber/5629214" rel="nofollow noreferrer">https://gist.github.com/slorber/5629214</a></p>

<p><strong>vmstat</strong></p>

<pre><code>procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0   1743    381      4   1150    1    1    60    92    2    0  1  0 99  0
</code></pre>

<p><strong>free</strong></p>

<pre><code>             total       used       free     shared    buffers     cached
Mem:          7986       7605        381          0          4       1150
-/+ buffers/cache:       6449       1536
Swap:         4091       1743       2348
</code></pre>

<p><strong>Top</strong></p>

<pre><code>top - 11:15:49 up 42 days, 1:34, 2 users, load average: 1.44, 2.11, 2.46
Tasks: 104 total, 1 running, 103 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.5%us, 0.2%sy, 0.0%ni, 98.9%id, 0.4%wa, 0.0%hi, 0.0%si, 0.0%st
Mem:  8178412k total, 7773356k used,  405056k free,    4200k buffers
Swap: 4190204k total, 1796368k used, 2393836k free, 1179380k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17833 jmxtrans  20   0 2458m 145m 2488 S    1  1.8 206:56.06 java
 1237 logstash  20   0 2503m 142m 2468 S    1  1.8 354:23.19 java
11348 tomcat    20   0 9184m 5.6g 2808 S    1 71.3 642:25.41 java
    1 root      20   0 24324 1188  656 S    0  0.0   0:01.52 init
    2 root      20   0     0    0    0 S    0  0.0   0:00.26 kthreadd
...
</code></pre>

<p><strong>df -&gt; tmpfs</strong></p>

<pre><code>Filesystem     1K-blocks  Used  Available Use% Mounted on
tmpfs            1635684   272    1635412   1% /run
</code></pre>

<hr>

<p>The main problem we have:</p>

<ul>
<li>The server has 8G of physical memory</li>
<li>The Solr heap takes only 6G</li>
<li>There is 1.5G of swap</li>
<li>Swappiness=0</li>
<li>The heap consumption seems appropriately tuned</li>
<li>Running on the server: only Solr and some monitoring stuff</li>
<li>We have a correct average response time</li>
<li>We sometimes have abnormally long pauses, up to 20 seconds</li>
</ul>

<p>I guess the pauses could be a full GC on a swapped heap, right?</p>

<p><strong>Why is there so much swap?</strong></p>

<p>I don't really know whether it is the JVM that makes the server swap or something hidden that I can't see. Perhaps the OS page cache? But I'm not sure why the OS would create page cache entries if that causes swapping.</p>

<p>I am considering testing the <code>mlockall</code> trick used in some popular Java-based storage/NoSQL systems like ElasticSearch, Voldemort or Cassandra: check <a href="https://stackoverflow.com/questions/16689912/make-jvm-solr-not-swap-using-mlockall">Make JVM/Solr not swap, using mlockall</a></p>

<hr>

<p><strong>Edit:</strong></p>

<p>Here you can see max heap, used heap (blue), and used swap (red). They seem somewhat related.</p>

<p><img src="https://i.stack.imgur.com/YFoWi.png" alt="Swap and Heap"></p>

<p>I can see with Graphite that there are many ParNew GCs occurring regularly. And there are a few CMS GCs that correspond to the significant heap decreases in the picture.</p>

<p>The pauses don't seem to be correlated with the heap decreases but are regularly distributed between 10:00 and 11:30, so they may be related to the ParNew GCs, I guess.</p>

<p>During the load test I can see some disk activity and also some swap IO activity, which calms down when the test ends.</p>
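<p>One way to get a first-order estimate of the off-heap consumption is to sum the resident anonymous mappings in the <code>pmap -x</code> output and compare the total with the heap size; with 700 threads and the 64-bit HotSpot default stack size of roughly 1 MB, thread stacks alone could account for about 700 MB of the gap. The sketch below assumes the Linux <code>pmap -x</code> column layout (Address, Kbytes, RSS, Dirty, Mode, Mapping) and uses a tiny fabricated sample instead of a real PID:</p>

```shell
# Hypothetical pmap -x style output (in real usage: pmap -x <solr pid>).
pmap_sample='0000000000400000       4       4       0 r-x--  java
00000006c0000000 6291456 6291456 6291456 rw---    [ anon ]
00007f2400000000    1024     512     512 rw---    [ anon ]'

# Sum the resident size (RSS, 3rd column) of anonymous mappings; whatever
# exceeds the 6g heap mapping is off-heap (thread stacks, code cache,
# NIO buffers, native allocations by the JVM or libraries).
echo "$pmap_sample" | awk '/anon/ { rss += $3 } END { print rss " kB anonymous RSS" }'
# prints: 6291968 kB anonymous RSS
```

<p>On JDK 8+ the JVM can report this breakdown itself via <code>-XX:NativeMemoryTracking=summary</code> read with <code>jcmd &lt;pid&gt; VM.native_memory summary</code>; on the JDK 7 build used here, pmap arithmetic plus thread count times <code>-Xss</code> seems to be the practical option.</p>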