Note that there are some explanatory texts on larger screens.

plurals
  1. POLoad-balancing in parallel processing application
    primarykey
    data
    text
    <p>I'm building a network-distributed parallel processing application that uses a combination of CPU and GPU resources across many machines. </p> <p>The app has to perform some very computationally expensive operations on a very large dataset over thousands of iterations:</p> <pre><code>for step = 0 to requested_iterations for i = 0 to width for j = 0 to height for k = 0 to depth matrix[i,j,k] = G*f(matrix[i,j,k]) </code></pre> <p>Also, the matrix operations have to be executed synchronously: that is, each iteration depends on the results of the frame that came immediately before it.</p> <p>The hardware available in this ad-hoc grid, comprising both dedicated servers and idle desktop machines, varies greatly in performance from machine to machine. I'm wondering what the best way is to balance the work load across the entire system.</p> <p>Some idiosyncracies:</p> <ol> <li><p>The grid should be as robust as possible. Some simulations require weeks to run, and it would be nice not to have to cancel a run if one out of 100 machines goes offline.</p></li> <li><p>Some of the lower-end machines (desktops that are idle but have to wake up when someone logs in) may join and leave the grid at any time. </p></li> <li><p>The dedicated servers may also join and leave the grid, but this is predictable.</p></li> </ol> <p>So far, what the best idea I've been able to come up with is:</p> <ol> <li>Have each node track the time itself takes to process a group of <em>n</em> cells in the matrix (cells processed per unit time) and report this to a central repository.</li> <li>Weight this time against the total time for a frame (across the entire grid) of the simulation and the total size of the problem domain. So, each node would get a score expressed in work units (matrix cells) per time, and a scalar rating expressing its performance vs the rest of the grid.</li> <li>On each frame, distribute the work load based on those scores so that each machine finishes as close to the same time as possible. If machine <code>A</code> is 100x faster than machine <code>B</code>, it will receive 100x as many matrix cells to process in a given frame (assuming that the matrix size is large enough to warrant including the extra machines). </li> <li>Nodes that leave the grid (desktops that are logged into, etc.) will have their workload redistributed among the remaining nodes.</li> </ol> <p><strong>Or</strong>,</p> <p>Arrange the nodes in a tree structure, where each node has a "weight" assigned. Nodes that are higher in the tree have a weight based on their ability combined with that of their children. This weight is adjusted per frame. When a node loses communication its child, it uses a cached tree graph to contact the orphaned children and re-balance its branch.</p> <p>If it makes a difference, the app is a combination of C# and OpenCL. </p> <p>Links to papers, example apps, and especially tutorials are welcome.</p> <p><strong>Edit</strong></p> <p>This isn't homework. I'm turning a simulator I wrote as part of my thesis into a more useful product. Right now the work is distributed uniformly with no accounting for performance of each machine, and no facility to recover from machines joining or leaving the grid. </p> <p><strong>Thanks</strong> for the excellent, detailed responses.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload