Cron on AWS (or distributed systems in general)
<p>I am surprised I could not find more on this, but I still have not found an answer. We recently moved our simple website to AWS for a more robust and reliable system. What currently baffles me is managing cron jobs on a distributed system, where the same cron job gets pushed to every instance in the environment.</p>
<p>Here's the use case:</p>
<h2>Background</h2>
<h3>Setup</h3>
<p>We are running a traditional LAMP stack. That is probably the first problem, but it's what we have.</p>
<h3>DB Tables</h3>
<pre><code>table1
  - id        int(11)
  - start     date
  - interval  int(11)  (number of seconds)

table2
  - id        int(11)
  - table1_id int(11)
  - sent      datetime
</code></pre>
<h2>Goal</h2>
<p>The goal is for a script to run once a day and check the following for each row in <code>table1</code>:</p>
<ol>
<li>The current date is past <code>table1.start</code> (that is, <code>table1.start</code> &lt; current date).</li>
<li><code>table1.interval</code> &gt; 0.</li>
<li>Today is a whole number of intervals after <code>table1.start</code> (so the check fails if the interval is 7 days [in seconds] and today is the 6th day).</li>
<li>There is no entry in <code>table2</code> whose <code>table2.sent</code> is today and whose <code>table2.table1_id</code> is the <code>table1</code> row that passed the previous checks.</li>
</ol>
<p>If all these checks pass, we insert an entry into <code>table2</code> for each qualifying <code>table1</code> row, and we send an email based on that <code>table2</code> data.</p>
<h2>The Problem</h2>
<p>Essentially, we have two queries: one that applies the checks above and one that inserts into <code>table2</code>. The issue is that on a distributed system, every instance runs the cron job at the same time (or within milliseconds of each other). There is no notion of a "transaction" in our setup, so every instance sends an email unless one of them inserts into <code>table2</code> before the others run the first query.</p>
<h2>Solutions???</h2>
<p>I have done a fair amount of research on this, but the only potential solutions I have come up with are these:</p>
<h3>The Cron Instance</h3>
<p>Set up a single, independent instance responsible for running cron jobs. As far as I can see, this would certainly work, but it is very costly for a job that is cheap to run and only needs to run once a day at most.</p>
<h3>PHP Scheduler</h3>
<p>Have cron regularly run a PHP script that acts as a scheduler. This is the route we were going down, since our research suggested it would be the simplest given our limited time and money. The problem I ran into is that this just shifts the concurrency problem from consuming jobs to scheduling them: how do you schedule jobs so that every instance running the cron doesn't schedule the same job at the same time?</p>
<p>This method also seems very "kludgy" (to borrow my friend's favorite word), and I have to agree.</p>
<h3>Transactions</h3>
<p>In my research, concurrency problems like this were always solved with atomic database transactions, but as far as I can tell, this isn't easy to achieve with LAMP. Perhaps I am wrong, and I would be very happy to be proven so.</p>
<h2>Finally</h2>
<p>If anyone can help me figure this out, I would greatly appreciate it. Perhaps my Googling skills are getting rusty, but I cannot imagine I am the only one struggling with this (probably simple) task.</p>
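<p>One possible sketch of the Transactions idea, written in Python with the standard <code>sqlite3</code> module only so that it runs self-contained (the real stack here is LAMP, so treat it as an illustration under assumptions, not drop-in code). It swaps the "check, then insert" sequence for an atomic claim: a hypothetical <code>sent_day</code> column on <code>table2</code> carries a UNIQUE constraint on <code>(table1_id, sent_day)</code>, every instance simply attempts the INSERT, and only the instance whose INSERT succeeds sends the email. The helper names (<code>is_due</code>, <code>try_claim</code>, <code>run_daily_job</code>, <code>simulate_two_instances</code>) and the sample rows are hypothetical; two connections to one database file stand in for two instances.</p>

```python
import os
import sqlite3
import tempfile
from datetime import date, datetime

SECONDS_PER_DAY = 86_400


def is_due(start: date, interval_seconds: int, today: date) -> bool:
    """Date/interval checks from the Goal list: start is in the past,
    the interval is positive, and today is a whole number of intervals
    after start."""
    if interval_seconds <= 0 or start >= today:
        return False
    elapsed_seconds = (today - start).days * SECONDS_PER_DAY
    return elapsed_seconds % interval_seconds == 0


def create_table2(conn: sqlite3.Connection) -> None:
    # Hypothetical sent_day column: the UNIQUE constraint makes the claim
    # atomic, because the database itself rejects a second row per day.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS table2 (
               id INTEGER PRIMARY KEY,
               table1_id INTEGER NOT NULL,
               sent TEXT NOT NULL,
               sent_day TEXT NOT NULL,
               UNIQUE (table1_id, sent_day)
           )"""
    )
    conn.commit()


def try_claim(conn: sqlite3.Connection, table1_id: int, today: date) -> bool:
    """Insert today's table2 row; True only for the caller whose INSERT wins."""
    try:
        with conn:  # commits on success, rolls back if the INSERT raises
            conn.execute(
                "INSERT INTO table2 (table1_id, sent, sent_day) VALUES (?, ?, ?)",
                (table1_id, datetime.now().isoformat(), today.isoformat()),
            )
        return True
    except sqlite3.IntegrityError:  # another instance already claimed today
        return False


def run_daily_job(conn, table1_rows, today, send_email):
    """Send at most one email per qualifying table1 row per day."""
    sent = []
    for table1_id, start, interval_seconds in table1_rows:
        if is_due(start, interval_seconds, today) and try_claim(conn, table1_id, today):
            send_email(table1_id)
            sent.append(table1_id)
    return sent


def simulate_two_instances():
    # Hypothetical table1 contents: (id, start, interval in seconds).
    rows = [
        (1, date(2024, 1, 1), 7 * SECONDS_PER_DAY),  # day 14 -> due
        (2, date(2024, 1, 1), 5 * SECONDS_PER_DAY),  # day 14 -> not due
        (3, date(2024, 1, 1), 0),  # interval 0 -> skipped
    ]
    today = date(2024, 1, 15)
    emails = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "jobs.db")
        # Two connections stand in for two instances sharing one database
        # (run one after the other here so the demo is deterministic).
        a, b = sqlite3.connect(path), sqlite3.connect(path)
        try:
            create_table2(a)
            sent_a = run_daily_job(a, rows, today, emails.append)
            sent_b = run_daily_job(b, rows, today, emails.append)
        finally:
            a.close()
            b.close()
    return sent_a, sent_b, emails


if __name__ == "__main__":
    print(simulate_two_instances())
```

<p>Running the script prints <code>([1], [], [1])</code>: the first instance claims and emails row 1, while the second instance's identical INSERT fails on the UNIQUE constraint, so no duplicate email goes out. The same idea should carry over to MySQL as a UNIQUE index, where the losing INSERT fails with a duplicate-key error. One trade-off: if an instance crashes after its INSERT commits but before the email is sent, that day's email is skipped rather than duplicated.</p>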