This issue has everything to do with partitioners, not with the degree of parallelism. The solution is to implement a custom data partitioner.

If the dataset is large, the mono implementation of the TPL is guaranteed to run out of memory. This happened to me recently: I was running essentially the loop above and found that memory usage increased linearly until it gave me an OOM exception.

After tracing the issue, I found that by default mono divides up the enumerator using an EnumerablePartitioner class. Every time a task asks this class for data, it "chunks" the data by an ever increasing (and unchangeable) factor of 2. So the first time a task asks for data it gets a chunk of size 1, the next time a chunk of size 2*1=2, the next time 2*2=4, then 2*4=8, and so on. The result is that the amount of data handed to a task, and therefore held in memory simultaneously, grows with the length of the task, and if a lot of data is being processed an out-of-memory exception inevitably occurs.

Presumably the original reason for this behavior is to avoid having each thread make repeated trips to fetch data, but it seems to rest on the assumption that all of the data being processed fits into memory (which is not the case when reading from large files).

As stated above, this issue can be avoided with a custom partitioner. A generic example of one that simply returns the data to each task one item at a time is here:

https://gist.github.com/evolvedmicrobe/7997971

Simply instantiate that class first and hand it to Parallel.ForEach instead of the enumerable itself.
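As an alternative to the linked gist, if you are on a runtime that supports .NET 4.5 APIs, the built-in `Partitioner.Create` overload that takes `EnumerablePartitionerOptions.NoBuffering` gives the same one-item-at-a-time behavior without writing a partitioner by hand. A minimal sketch (the `ReadLargeSource` generator here is a hypothetical stand-in for reading a large file):

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class Program
{
    // Hypothetical stand-in for a large lazily-read data source,
    // e.g. lines streamed from a big file.
    static IEnumerable<int> ReadLargeSource()
    {
        for (int i = 0; i < 1_000_000; i++)
            yield return i;
    }

    static void Main()
    {
        // NoBuffering hands exactly one item to each worker per request,
        // so no ever-doubling chunks accumulate in memory.
        var partitioner = Partitioner.Create(
            ReadLargeSource(),
            EnumerablePartitionerOptions.NoBuffering);

        // Pass the partitioner, not the enumerable, to Parallel.ForEach.
        Parallel.ForEach(partitioner, item =>
        {
            // process item
        });
    }
}
```

Whether this option is honored depends on the runtime's partitioner implementation, so on older mono versions the custom partitioner from the gist may still be the safer route.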