Note that there are some explanatory texts on larger screens.

plurals
  1. POStrange GC behaviour of Scala actors application
    primarykey
    data
    text
    <p>I have <a href="https://stackoverflow.com/questions/2179830/experiences-with-escape-analysis-enabled-on-the-jvm">an application which uses rather a lot of actors</a>: 25,000 to be precise. It uses <a href="http://en.wikipedia.org/wiki/Scala_%28programming_language%29" rel="nofollow noreferrer">Scala</a> 2.7.7 and is running on <strong>jdk6_u18</strong>. It basically listens to and processes market data and has very little state.</p> <p>It starts at 8.02am every day and within the hour it has crashed with an <code>OutOfMemoryError</code>. "Aha" you say, "you have a memory leak!" Except that when I restart it, it <em>never</em>, <strong>ever</strong> crashes again for the rest of the day! This despite ramping up both GC and CPU overhead when the US markets open at 2.30pm.</p> <p>A few anecdotal findings:</p> <ul> <li>it runs on Solaris. When I used to run it on Linux, it <em>never</em> crashed <em>at all</em>.</li> <li>I have tried messing with generation heap sizing, allocating more memory etc. I think It makes no difference</li> <li>The behaviour of the collector seems to differ when I have <code>verbose:gc</code> switched on</li> </ul> <p>A few questions present themselves:</p> <ol> <li>Why would the behaviour of this program be different between Linux and Solaris? </li> <li>Why would the behaviour be different between starting it at 8.02 and starting it at 8.42?</li> <li>I have heard that the actors library had some memory-leak issues. What were they, when were they fixed and how would I discover if anything "similar" might be going on here? (Things to look for in <a href="http://java.sun.com/javase/6/docs/technotes/tools/share/jhat.html" rel="nofollow noreferrer">jhat</a> etc)</li> <li>Does anyone have an idea as to what might be going on? </li> </ol> <p>I am now trying <code>G1</code> to see if that makes any difference. I will update this question tomorrow! </p> <h2>Some output from G1 with verbose:gc on</h2> <p>I think I just caught it in the act:</p> <blockquote> <p>600.290: [Full GC 255M->144M(256M), 1.5772616 secs]<br> 602.084: [GC pause (young) 227M->145M(256M), 0.0556769 secs]<br> 602.418: [Full GC 254M->144M(256M), 1.6415216 secs]<br> 604.279: [GC pause (young) 227M->145M(256M), 0.0415157 secs]<br> 604.602: [Full GC 255M->145M(256M), 1.6041762 secs]<br> 606.422: [GC pause (young) 227M->145M(256M), 0.0237441 secs]<br> 606.710: [Full GC 254M->145M(256M), 1.6022185 secs] </p> </blockquote> <p>And then a bit later (you can see the full GC has taken longer and is reclaiming less)</p> <blockquote> <p>849.084: [Full GC 254M->176M(256M), 1.9658754 secs]<br> 851.191: [GC pause (young) 228M->176M(256M), 0.0218611 secs]<br> 851.414: [Full GC 254M->176M(256M), 1.9352357 secs]<br> 853.492: [GC pause (young) 228M->176M(256M), 0.0224688 secs]<br> 853.716: [Full GC 254M->176M(256M), 1.9339705 secs]<br> 855.793: [GC pause (young) 228M->176M(256M), 0.0215707 secs]<br> 856.009: [Full GC 254M->176M(256M), 1.9805797 secs]<br> 858.137: [GC pause (young) 228M->176M(256M), 0.0223224 secs] </p> </blockquote> <h2>Some output from G1 with verbose:gc off</h2> <p>It's fine again! *Sigh*</p> <blockquote> <p>303.656: [GC pause (young) 225M->93M(256M), 0.1680767 secs]<br> 308.060: [GC pause (young) 226M->94M(256M), 0.1793724 secs]<br> 312.746: [GC pause (young) 227M->93M(256M), 0.1674851 secs]<br> 316.162: [GC pause (young) 227M->95M(256M), 0.1826145 secs]<br> 320.147: [GC pause (young) 226M->94M(256M), 0.1656664 secs]<br> 325.978: [GC pause (young) 226M->93M(256M), 0.1475760 secs]<br> 330.176: [GC pause (young) 226M->94M(256M), 0.1727795 secs] </p> </blockquote> <p>and much, <em>much</em> later it is still OK!</p> <blockquote> <p>25882.894: [GC pause (young) 224M->125M(256M), 0.2126515 secs]<br> 25884.880: [GC pause (young) 224M->126M(256M), 0.2059802 secs]<br> 25887.027: [GC pause (young) 224M->125M(256M), 0.1851359 secs]<br> 25889.940: [GC pause (young) 223M->126M(256M), 0.2046496 secs]<br> 25891.567: [GC pause (young) 224M->126M(256M), 0.1600574 secs] </p> </blockquote> <p>and later still, a full GC</p> <blockquote> <p>37180.191: [GC pause (young) 225M->154M(256M), 0.1716404 secs]<br> 37182.163: [GC pause (young) (initial-mark) 225M->153M(256M)37182.326: [GC concurrent-mark-start], 0.1622246 secs]<br> 37183.089: [GC concurrent-mark-end, 0.7635219 sec]<br> 37183.090: [GC remark, 0.0032547 secs]<br> 37183.093: [GC concurrent-count-start]<br> 37183.297: [GC concurrent-count-end, 0.2043307]<br> 37183.393: [GC cleanup 198M->198M(256M), 0.0068127 secs]<br> 37183.400: [GC concurrent-cleanup-start]<br> 37183.400: [GC concurrent-cleanup-end, 0.0000393]<br> 37183.648: [GC pause (young) 222M->153M(256M), 0.1483041 secs]<br> 37184.235: [GC pause (partial) 171M->91M(256M), 0.2520714 secs]<br> 37187.223: [GC pause (young) 221M->92M(256M), 0.1721220 secs] </p> </blockquote> <h2>UPDATE</h2> <p>Well, since switching to the G1 garbage collector on jdk1.6.0_18 the application has behaved itself on <em>three</em> consecutive days. I suspect that Erik is correct in his analysis of the situation about the VM getting itself into a <em>death spiral</em> in high-throughput cases where it has promoted objects into the tenured generation.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload