  1. How can I speed up a Mac app processing 5000 independent tasks?
    <p>I have a long-running (5-10 hours) Mac app that processes 5000 items. Each item is processed by performing a number of transforms (using Saxon), running a bunch of scripts (in Python and Racket), collecting data, and serializing it as a set of XML files, a SQLite database, and a CoreData database. Each item is completely independent from every other item.</p>
    <p>In summary, it does a lot, takes a long time, and appears to be highly parallelizable.</p>
    <p>After loading up all the items that need processing, the app uses GCD to parallelize the work, using <code>dispatch_apply</code>:</p>
    <pre><code>dispatch_apply(numberOfItems, dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_HIGH, 0), ^(size_t i) {
    @autoreleasepool {
        ...
    }
});
</code></pre>
    <p>I'm running the app on a Mac Pro with 12 cores (24 virtual). So I would expect to have 24 items being processed at all times. However, I found through logging that the number of items being processed varies between 8 and 24. This is literally adding hours to the run time (assuming it <em>could</em> work on 24 items at a time).</p>
    <p>On the one hand, perhaps GCD is really, really smart and it is already giving me the maximum throughput. But I'm worried that, because much of the work happens in scripts that are spawned by this app, maybe GCD is reasoning from incomplete information and isn't making the best decisions.</p>
    <p>Any ideas how to improve performance? After correctness, the number one desired attribute is shortening how long it takes this app to run. I don't care about power consumption, hogging the Mac Pro, or anything else.</p>
    <p><strong>UPDATE:</strong> In fact, this looks alarming in the <a href="http://developer.apple.com/library/mac/DOCUMENTATION/General/Conceptual/ConcurrencyProgrammingGuide/OperationQueues/OperationQueues.html" rel="nofollow">docs</a>: "The actual number of tasks executed by a concurrent queue at any given moment is variable and can change dynamically as conditions in your application change. Many factors affect the number of tasks executed by the concurrent queues, including the number of available cores, <strong>the amount of work being done by other processes</strong>, and the number and priority of tasks in other serial dispatch queues." (emphasis added) It looks like having other processes doing work will adversely affect scheduling in the app.</p>
    <p>It'd be nice to be able to just say "run these blocks concurrently, one per core, don't try to do anything smarter".</p>
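To make the last wish concrete, here is a minimal sketch of "a fixed number of workers, nothing smarter", written with plain POSIX threads rather than GCD. It is not the app's code and not a GCD API: `run_fixed_width`, `online_cores`, `item_fn`, and `record_item` are hypothetical names introduced only for illustration. Each worker repeatedly claims the next unprocessed index from a shared atomic counter, so the number of items in flight equals the number of threads started, regardless of what other processes are doing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical callback type: processes item `index`; `ctx` is caller data. */
typedef void (*item_fn)(size_t index, void *ctx);

typedef struct {
    atomic_size_t next;  /* next unclaimed item index */
    size_t count;        /* total number of items */
    item_fn work;
    void *ctx;
} pool_state;

/* Each worker claims indices one at a time until none are left. */
static void *worker_main(void *arg) {
    pool_state *state = arg;
    for (;;) {
        size_t i = atomic_fetch_add(&state->next, 1);
        if (i >= state->count) break;
        state->work(i, state->ctx);
    }
    return NULL;
}

/* Number of online logical cores, falling back to 1 if unknown. */
size_t online_cores(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (size_t)n : 1;
}

/* Runs work(0) .. work(count - 1) on up to `width` threads.
   Returns the number of threads actually started (0 means nothing ran). */
size_t run_fixed_width(size_t count, size_t width, item_fn work, void *ctx) {
    assert(width > 0 && work != NULL);
    pool_state state = { .count = count, .work = work, .ctx = ctx };
    atomic_init(&state.next, 0);

    pthread_t *threads = malloc(width * sizeof *threads);
    if (threads == NULL) return 0;

    size_t started = 0;
    while (started < width &&
           pthread_create(&threads[started], NULL, worker_main, &state) == 0) {
        started++;
    }
    for (size_t t = 0; t < started; t++) pthread_join(threads[t], NULL);
    free(threads);
    return started;
}

/* Demo work function: counts how often each index was processed.
   Every index is claimed by exactly one worker, so no locking is needed. */
void record_item(size_t index, void *ctx) {
    int *seen = ctx;
    seen[index]++;
}
```

A caller could pass `online_cores()` as the width, for example `run_fixed_width(5000, online_cores(), process_item, ctx)`, where `process_item` stands in for the per-item work. Whether a fixed width actually beats `dispatch_apply` for this workload is something only measurement can show, since the spawned scripts compete for the same cores.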