How can I improve garbage collector performance of .NET 4.0 in highly concurrent code?
<p>I am using the Task Parallel Library from .NET Framework 4 (specifically <code>Parallel.For</code> and <code>Parallel.ForEach</code>), but I am getting very mediocre speed-ups on a dual-core machine when parallelizing tasks that look like they should parallelize easily.</p> <p>When I profile the system, it looks like a lot of thread synchronization is going on because of the garbage collector. I allocate a lot of objects, so I am wondering how I can improve concurrency while minimizing changes to my code.</p> <p>For example, would any of these techniques be useful in this situation?</p> <ul> <li>Should I try to manage the GC manually?</li> <li>Should I be using <code>Dispose</code>?</li> <li>Should I be pinning objects?</li> <li>Should I be doing other unsafe code tricks?</li> </ul> <p>POSTSCRIPT:</p> <p>The problem is not that the GC runs too often; it is that the GC prevents concurrent code from running efficiently in parallel. I also don't consider "allocate fewer objects" an acceptable answer: it would require rewriting too much code to work around a poorly parallelized garbage collector.</p> <p>I already found one trick that helped overall performance (<a href="http://blogs.msdn.com/visualizeparallel/archive/2009/12/28/parallel-performance-case-study-finding-references-to-parallel-extensions.aspx" rel="noreferrer">using gcServer</a>), but it didn't help concurrent performance. In other words, <code>Parallel.For</code> was only 20% faster than a serial <code>for</code> loop on an embarrassingly parallel task.</p> <p>POST-POSTSCRIPT:</p> <p>Okay, let me explain further. I have a rather big and complex program: an optimizing interpreter. It is fast enough, but I want its performance on parallel tasks (primitive operations built into my language) to scale well as more cores become available. I allocate lots of small objects during evaluation.
The whole interpreter design is based on all values being derived from a single polymorphic base class. This works great in a single-threaded application, but when I apply the Task Parallel Library to parallel evaluations there is no speed-up.</p> <p>After a lot of investigation into why the Task Parallel Library was not properly distributing work across cores for these tasks, it seems the culprit is the GC: it appears to act as a bottleneck because it does some behind-the-scenes thread synchronization that I don't understand.</p> <p>What I need to know is: what exactly is the GC doing that can cause heavily concurrent code to perform badly when that code does lots of allocations, and how can I work around it <b>other than</b> just <i>allocating fewer objects</i>? That approach has already occurred to me, but it would require a significant rewrite of a lot of code.</p>
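The pattern described above (many small, short-lived objects derived from one polymorphic base class, evaluated inside an embarrassingly parallel loop) can be isolated in a small benchmark like the following. This is a hypothetical sketch, not the author's interpreter: `Value`, `IntValue`, `AllocBench`, and the loop sizes are invented for illustration. It prints whether server GC is in effect (in .NET 4.0 `GCSettings.IsServerGC` is read-only, so server GC has to be switched on with `<gcServer enabled="true"/>` in the `<runtime>` section of app.config), then times a serial loop against `Parallel.For` and reports how many gen0 collections each run triggered.

```csharp
using System;
using System.Diagnostics;
using System.Runtime;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the interpreter's single polymorphic base class.
abstract class Value
{
    public abstract long Eval();
}

// Hypothetical small, short-lived value, like those created during evaluation.
sealed class IntValue : Value
{
    private readonly long n;
    public IntValue(long n) { this.n = n; }
    public override long Eval() { return n; }
}

static class AllocBench
{
    // One work item: allocates perItem small objects and sums their values.
    // On .NET Framework 4.0 every "new IntValue" is a heap allocation.
    static long Work(int i, int perItem)
    {
        long sum = 0;
        for (int k = 0; k < perItem; k++)
        {
            Value v = new IntValue(i + k);
            sum += v.Eval();
        }
        return sum;
    }

    public static long RunSerial(int items, int perItem)
    {
        long total = 0;
        for (int i = 0; i < items; i++)
            total += Work(i, perItem);
        return total;
    }

    public static long RunParallel(int items, int perItem)
    {
        long total = 0;
        Parallel.For<long>(0, items,
            () => 0L,                                            // per-thread subtotal
            (i, state, subtotal) => subtotal + Work(i, perItem),
            subtotal => Interlocked.Add(ref total, subtotal));   // one shared write per thread
        return total;
    }

    static void Main()
    {
        const int items = 2000, perItem = 20000;
        Console.WriteLine("Server GC: {0}, latency mode: {1}",
            GCSettings.IsServerGC, GCSettings.LatencyMode);

        int gen0 = GC.CollectionCount(0);
        Stopwatch sw = Stopwatch.StartNew();
        long serial = RunSerial(items, perItem);
        Console.WriteLine("serial:   {0} ms, {1} gen0 GCs",
            sw.ElapsedMilliseconds, GC.CollectionCount(0) - gen0);

        gen0 = GC.CollectionCount(0);
        sw.Restart();
        long parallel = RunParallel(items, perItem);
        Console.WriteLine("parallel: {0} ms, {1} gen0 GCs",
            sw.ElapsedMilliseconds, GC.CollectionCount(0) - gen0);

        // Both loops compute the same checksum; only the timing should differ.
        Console.WriteLine("checksums match: {0}", serial == parallel);
    }
}
```

The `localInit`/`localFinally` overload of `Parallel.For` keeps a per-thread subtotal, so the loop itself performs only one shared `Interlocked.Add` per worker thread. If the parallel run still scales poorly, the remaining contention is more likely to come from allocation and collection than from the loop's own synchronization. Varying `perItem` (allocation volume) while keeping the total work roughly constant is one way to test that hypothesis.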
 
