<p>Identifying the gaps is an interesting problem. The best approach depends on the size of the gaps, but here is another way to tackle it, one that may work better if the gaps are reasonably large compared to the number of records you have.</p>
<p>Use a MySQL aggregate function in a query to count the number of records in each of a set of buckets. The buckets need to be similar in size to the kinds of gaps you are interested in. Assuming you're interested in gaps of roughly a day, I'd do something like this:</p>
<pre><code>SELECT TO_DAYS(my_timestamp), COUNT(*) FROM my_table GROUP BY TO_DAYS(my_timestamp)
</code></pre>
<p>This returns one row per day, pairing the day number with the count of timestamps that fall on it. I'd do the rest in a language like Perl or Java (or even R) where I can process the data.</p>
<p>The technique I'd use is a test of the difference between the observed frequency (the count) and the expected frequency, which is the total number of records divided by the number of days in the range. The expected frequency for each day would be something like:</p>
<pre><code>SELECT (SELECT COUNT(*) FROM my_table) / ((SELECT TO_DAYS(MAX(my_timestamp)) FROM my_table) - (SELECT TO_DAYS(MIN(my_timestamp)) FROM my_table) + 1)
</code></pre>
<p>Remember that in the first result, completely missing days are simply not returned, rather than returned with a count of zero, so you need to treat them as if they were zero. Then, for each bucket, you can use a statistical test, the chi-square test, to estimate the probability of the deviation being due to chance (for details, see: <a href="http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test" rel="nofollow">http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test</a>). The calculation for each bucket is, basically, ((expected - observed)^2 / expected). The larger this value, the less likely it is that the bucket's deviation from the expected count is down to chance.</p>
<p>If you need to work out which buckets are low in samples, set a reasonable threshold on this calculated value, and look for buckets where the value exceeds the threshold and the observed count is below the expected count (the value is just as large for buckets with unusually many records). It may take a little experimentation to devise an appropriate threshold, but this is a sound way of determining gaps.</p>
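<p>As a rough illustration of the post-processing step, here is a minimal Python sketch (Python stands in for the Perl, Java or R mentioned above). It assumes the rows from the first query have already been loaded into a non-empty dict mapping the TO_DAYS value to its count. The function name <code>find_sparse_days</code>, the sample data and the threshold of 4.0 are all hypothetical and chosen only for the example; a suitable threshold has to be found by experiment.</p>

```python
def find_sparse_days(day_counts, threshold):
    """Flag days whose record count is suspiciously low.

    day_counts: dict of TO_DAYS value -> row count, as returned by the
    GROUP BY query (days with no rows are simply absent).
    threshold: cut-off for (expected - observed)^2 / expected; an
    assumption to be tuned by experiment.
    """
    first_day, last_day = min(day_counts), max(day_counts)
    n_days = last_day - first_day + 1
    # Expected frequency: total records divided by the number of days.
    expected = sum(day_counts.values()) / n_days

    flagged = []
    for day in range(first_day, last_day + 1):
        observed = day_counts.get(day, 0)  # a missing day counts as zero
        score = (expected - observed) ** 2 / expected
        # Only report buckets that are low, not ones that are unusually full.
        if observed < expected and score > threshold:
            flagged.append((day, observed, score))
    return flagged


# Hypothetical sample: day 102 has no records at all.
counts = {100: 10, 101: 10, 103: 10, 104: 10}
print(find_sparse_days(counts, threshold=4.0))  # prints [(102, 0, 8.0)]
```

<p>Walking the full range from the first to the last day, rather than iterating over the query rows, is what makes the completely missing days show up as zero counts.</p>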