<p>As noted in the comments on the original post, it depends on several things (indexes, database engine, type of storage media, available cache memory, etc.).</p>
<p>We <em>could</em> make an educated guess that:</p>
<p>a) We will always get a full-table scan unless there is an index on the column (and I would not recommend adding one just for the sake of this query, as you would penalize your online writes to benefit an offline process);</p>
<p>b) The "cost" of checking the values is more than offset by the savings of not writing unchanged records (unless the database already skips such writes implicitly, which may or may not happen) <em>if</em> the majority of records are already zero.</p>
<p>But assumptions start to pile up, so I'd rather <strong>measure</strong> instead. To play a bit, I:</p>
<ul>
<li><p>Created a test table with a numeric "status" column</p></li>
<li><p>Filled it with a few million records (e.g., using a script like the one in <a href="https://stackoverflow.com/a/17268740/64635">https://stackoverflow.com/a/17268740/64635</a>)</p></li>
<li><p>Set it up with different values, then tried to <code>UPDATE</code> the column to 0, with and without the <code>WHERE</code>.</p></li>
</ul>
<p>My results (which <em>may</em> differ from yours) were that the <code>WHERE</code> query was much faster <em>if</em> there were indeed few non-zero records.
E.g., after setting up the table with either of</p>
<pre><code>UPDATE myTable SET myColumn = 1; /* All values non-zero (1) */
UPDATE myTable SET myColumn = FLOOR(RAND()*10); /* ~90% of values non-zero */
</code></pre>
<p>both the <code>WHERE</code> and non-<code>WHERE</code> updates to 0 were slow (with no noticeable difference between them, implying "a" above is true), whereas after either of</p>
<pre><code>UPDATE myTable SET myColumn = 0; /* All values zero */
UPDATE myTable SET myColumn = IF(id % 500 = 0, 1, 0); /* 99.8% of values zero */
</code></pre>
<p>the <code>UPDATE</code> with <code>WHERE</code> was dramatically faster (as implied by "b").</p>
<p>I'd recommend running these tests (and others, including one with the index if you really wish) on your own setup (e.g., by creating a separate table) and considering your data set (measure or estimate the % of records that will be non-zero when your cron job runs). Keep in mind that you likely want to optimize for cost/availability (including <em>your</em> time as a cost) rather than find the absolutely most performant solution in the universe (which is likely not cost-effective). Do that and you will surely find the best solution. Good luck!</p>
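<p>A minimal sketch of how such a comparison could be laid out in MySQL. The table definition and the <code>WHERE myColumn != 0</code> condition are assumptions made for illustration (only the names <code>myTable</code>, <code>myColumn</code> and <code>id</code> come from the statements above), and the step that fills the table is left to whatever script you use. The <code>mysql</code> command-line client prints the elapsed time after each statement, which is the number to compare:</p>
<pre><code>-- Assumed test table; adjust the column types to match your real schema.
CREATE TABLE myTable (
  id       INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  myColumn TINYINT NOT NULL DEFAULT 0
);

-- (Fill the table with a few million rows here.)

-- Pick one data distribution to test, e.g. mostly zeroes:
UPDATE myTable SET myColumn = IF(id % 500 = 0, 1, 0);

-- Check what share of the rows is actually non-zero.
SELECT COUNT(*)                              AS total_rows,
       SUM(myColumn != 0)                    AS nonzero_rows,
       100 * SUM(myColumn != 0) / COUNT(*)   AS nonzero_pct
FROM myTable;

-- Variant 1: reset every row, no filter. Note the elapsed time.
UPDATE myTable SET myColumn = 0;

-- Re-run the same distribution statement so both variants start
-- from equivalent data, then time the filtered variant.
UPDATE myTable SET myColumn = IF(id % 500 = 0, 1, 0);

-- Variant 2: reset only the rows that need it (assumed condition).
UPDATE myTable SET myColumn = 0 WHERE myColumn != 0;
</code></pre>
<p>Repeating the pair of timed updates a few times for each distribution helps separate a real difference from cache warm-up effects.</p>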