MySQL: splitting the processing of a particular table between different nodes
    <p>I have a dilemma, maybe you can help me.</p>
    <p>I have a table which functions as a work queue. Records are inserted and need to be processed. After a record is processed, it is deleted from the queue. There are a few restrictions:</p>
    <ul>
    <li>only one entity can process a record at any given time (by "entity" I mean a thread, or a separate computer which connects to the same database)</li>
    <li>entities are somewhat dynamic: their number or their characteristics might change</li>
    <li>an entity processes a record in one transaction</li>
    <li>processing must happen in parallel (if entity1 picks batch1, entity2 must be able to process batch2 in parallel, without waiting for entity1 to finish processing)</li>
    <li>once an entity has picked a record to process, the whole "batch" of records it belongs to must not be picked by any other entity. By "batch" I mean that the table is (logically) organized as follows: <ul> <li>row1 (batch1)</li> <li>row2 (batch1)</li> <li>row3 (batch2)</li> <li>row4 (batch2)</li> <li>row5 (batch2)</li> <li>... and so on.</li> </ul></li>
    </ul>
    <p>So let's say entity1 and entity2 both want to pick a processing slice from the table. If entity1 picks row1, then entity2 can pick anything except batch1 (that is, anything except row1 and row2).</p>
    <p>Let's abstract out the processing part, because it doesn't matter what the actual processing is. I'm interested to know how I can stop the entities from clashing with each other, using only a MySQL database, while keeping the parallel nature of the processing.</p>
    <p>From my point of view, I see two very general directions:</p>
    <ol>
    <li>Use some sort of status field, which indicates that a particular entity has picked a batch and that this batch has to be excluded from future picks. The disadvantage of this idea is that if the entity that picked the batch crashes, it is difficult for other entities to resume the processing.</li>
    <li>Use MySQL locks, which have the disadvantage that it is difficult to ensure parallel rather than sequential processing. For example, I could do a SELECT ... FOR UPDATE for entity1. But entity2 cannot do the same SELECT ... FOR UPDATE, because it would wait for the first entity to finish processing before acquiring the batch it needs.</li>
    </ol>
    <p>I'm interested to know:</p>
    <ul>
    <li>which direction would result in the smallest coding effort</li>
    <li>whether there are any other directions I'm missing here (keeping in mind that the entities cannot communicate with each other except through the database)</li>
    <li>whether there is a standard pattern for this kind of problem</li>
    <li>whether you can point me to an article discussing this kind of problem</li>
    <li>what the most efficient way to solve this problem is.</li>
    </ul>
    <p>So what I have here is that the database must split a table between different entities for processing, and I would like to know the best way to do it. I hardly think I'm the first one dealing with this problem, and would like to know what you think. Also, please note that the records can be split into batches by a fairly simple criterion (say, a batchId).</p>
    <p>Kind regards, <br/> Andrei.</p>