Note that there are some explanatory texts on larger screens.

plurals
  1. POPeriodically retrieve all rows to find the highest 100 counters
    text
    copied!<p>I'm working on a cassandra database storing the amount of times a word occurred. I want to find out which 100 words occur the most times. In a relational database, it'd be something like this:</p> <pre><code>select * FROM wordcounter ORDER BY counts DESC LIMIT 100; </code></pre> <p>but ordering by a <code>counter-column</code> in cassandra is impossible.</p> <p>So, instead I'll have to periodically (probably once per day) fetch <strong>all</strong> rows and write the 100 words with the highest counters to the db. The following is not an option;</p> <pre><code>select * FROM wordcounter </code></pre> <p>Because that would return way too much data. I'll have to do it in increments, but how (and how many rows per query is acceptable)?</p> <p><strong>UPDATE</strong></p> <p>It's supposedly possible to iterate over all cassandra rows, but I am using PHP <code>pdo</code> to communicate with cassandra &amp; it certainly doesn't have an iterate feature as far as I've seen. But I found I can query by <code>token</code> so this is possible;</p> <pre><code>select * FROM wordcounter LIMIT 100; </code></pre> <p>And then keep looping this until 0 results are returned</p> <pre><code>select * FROM wordcounter WHERE token(word) &gt; token('lastword') LIMIT 100; </code></pre> <p>So this basically is the equivelent of an <code>OFFSET</code> which will allow me to process parts of the dataset without having to query it all at once. But I guess this does mean I can't distribute the query over multiple systems. Does anyone know of any alternatives?</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload