<p>I wouldn't bother tracking those stats too much at the server level. You're going to introduce a fair amount of overhead.</p>
<p>Instead, the control server should just maintain a list of work units. As a client becomes available, let it grab the next unit in line and process it. Rinse, repeat.</p>
<p>Once the list of work units for a given matrix is exhausted, allow currently incomplete work units to be reassigned.</p>
<p>The examples below are based on a matrix containing 10 work units and 5 servers.</p>
<p><strong>Equally fast, all available:</strong></p>
<p>Server 1 checks in and grabs unit 1. This proceeds for the next 4 machines (i.e., Server 2 gets unit 2, and so on). When unit 1 is done, Server 1 grabs unit 6. The others grab the rest. Once the last server checks in, the matrix is done.</p>
<p><strong>Low disparate performance, all available:</strong><br> You start the round robin again and the first 5 units are acquired by the servers. However, Server 1 takes 30% longer than the others. This means Server 2 will grab unit 6, and so on. At some point Server 1 will check in unit 1; meanwhile units 2 through 5 will have been completed and 6 through 10 will have been assigned. Server 1 is assigned unit 6 because it is not done yet. However, Server 2 will check in its completed work on unit 6 before Server 1 finishes. No big deal: just throw away the later result.</p>
<p><strong>Huge disparate performance, all available:</strong><br> You start the round robin again and the first 5 units are acquired by the servers. Let's say Server 1 takes 400% more time than the others. This means Server 2 will grab unit 6, and so on. After Server 2 checks in unit 6, it will see that unit 1 is still in process. Go ahead and assign unit 1 to Server 2, which will complete it before Server 1 returns.</p>
<p>In this case you should probably monitor for machines that consistently report work late and drop them from further consideration. Of course, you will have to make some allowances for those that go offline due to shutdown or personal usage. Probably some type of weighted rating would work: once a machine's rating drops below a certain threshold, you simply deny it further work. Perhaps the rating is reset every so often to allow rebalancing once the system settles into a steady state.</p>
<p><strong>Machine disappears:</strong><br> This follows exactly the same plan as the "Huge disparate performance" case above. The only difference is that the machine will either never report in, or will do so after some unknown amount of time.</p>
<p>If for some reason you have more machines than units, an interesting thing happens: multiple servers will be assigned the same work unit right off the bat. You can either stop this by putting some type of delay in place (for example, a unit must be in process for x minutes before it can be reassigned) or simply allow it to happen. This should be thought through.</p>
<hr>
<p>What have we done? First, we removed the need to track individual performance. Second, we've allowed machines to just disappear while making sure the work is still completed. Third, we've ensured that the work will be completed in the least amount of time possible.</p>
<p>It's a little more chatty than simply assigning blocks of multiple units to machines based on performance; however, it allows even the fast machines to be unplugged from the network while ensuring total recoverability. Heck, you could kill all of the machines and later turn some of them back on to pick up where you left off.</p>
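<p>The scheme above can be sketched in a few lines of Python. This is only a minimal, single-process illustration under stated assumptions, not a definitive implementation: the class name <code>WorkQueue</code>, its methods, and the <code>reassign_after</code> and <code>clock</code> parameters are all hypothetical names invented for the example, and a real control server would also need networking and locking.</p>

```python
import time
from collections import deque


class WorkQueue:
    """Hypothetical control-server state: hand out units in order,
    then reassign unfinished units once the list is exhausted."""

    def __init__(self, unit_ids, reassign_after=0.0, clock=time.monotonic):
        self._pending = deque(unit_ids)   # units never handed out yet
        self._in_progress = {}            # unit id -> time of latest assignment
        self._done = set()                # units with an accepted result
        self._reassign_after = reassign_after  # assumed delay, in seconds
        self._clock = clock

    def next_unit(self):
        """Return a unit for a client that just checked in, or None."""
        now = self._clock()
        if self._pending:
            unit = self._pending.popleft()
            self._in_progress[unit] = now
            return unit
        # List exhausted: reassign the unit that has waited longest,
        # provided it has been in process for at least reassign_after.
        for unit, started in list(self._in_progress.items()):
            if now - started >= self._reassign_after:
                # Move the unit to the back so the next idle client
                # is offered a different incomplete unit first.
                del self._in_progress[unit]
                self._in_progress[unit] = now
                return unit
        return None

    def complete(self, unit):
        """Record a result. Returns False for a late duplicate or an
        unknown unit, in which case the caller throws the result away."""
        if unit in self._done or unit not in self._in_progress:
            return False
        del self._in_progress[unit]
        self._done.add(unit)
        return True

    def finished(self):
        """True once every unit has an accepted result."""
        return not self._pending and not self._in_progress
```

<p>With <code>reassign_after=0.0</code> an idle client is given an incomplete unit immediately, which matches the "simply allow it to happen" option; a positive value corresponds to the "must be in process for x minutes" delay. No per-machine statistics are kept, so a machine that disappears simply never calls <code>complete</code> and its unit is picked up by someone else.</p>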