StackOverflow2013

<p>Trie fields make range queries faster by precomputing certain range results and storing each one as a single record in the index. For clarity, my example uses integers in base ten, but the same concept applies to all trie types. This includes dates, since a date can be represented as the number of seconds since, say, 1970.</p>
<p>Let's say we index the number <code>12345678</code>. We can split it into the following tokens:</p>
<pre><code>12345678
123456xx
1234xxxx
12xxxxxx
</code></pre>
<p>The <code>12345678</code> token represents the actual integer value. The tokens with <code>x</code> digits represent ranges. <code>123456xx</code> represents the range <code>12345600</code> to <code>12345699</code>, and matches every document that contains a value in that range.</p>
<p>Notice how each token in the list has successively more <code>x</code> digits. This is controlled by the precision step. In my example, you could say that I was using a precision step of 2, since I trim 2 digits to create each extra token. With a precision step of 3, I would get these tokens:</p>
<pre><code>12345678
12345xxx
12xxxxxx
</code></pre>
<p>A precision step of 4:</p>
<pre><code>12345678
1234xxxx
</code></pre>
<p>A precision step of 1:</p>
<pre><code>12345678
1234567x
123456xx
12345xxx
1234xxxx
123xxxxx
12xxxxxx
1xxxxxxx
</code></pre>
<p>It's easy to see how a smaller precision step produces more tokens and increases the size of the index. However, it also speeds up range queries.</p>
<p>Without the trie field, if I wanted to query the range from 1250 to 1275, Lucene would have to fetch 26 entries (<code>1250</code>, <code>1251</code>, <code>1252</code>, ..., <code>1275</code>) and combine the search results. With a trie field (and a precision step of 1), we could get away with fetching 8 entries (<code>125x</code>, <code>126x</code>, <code>1270</code>, <code>1271</code>, <code>1272</code>, <code>1273</code>, <code>1274</code>, <code>1275</code>), because <code>125x</code> is a precomputed aggregation of <code>1250</code> through <code>1259</code>. With a precision step larger than 1, the smallest range token would be <code>12xx</code>, which covers more than the queried range, so the query would go back to fetching all 26 individual entries.</p>
<p><strong>Note:</strong> In reality, the precision step refers to the number of bits trimmed for each token. If you were to write your numbers in hexadecimal, a precision step of 4 would trim one hex digit for each token. A precision step of 8 would trim two hex digits.</p>
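<p>To make the base-ten illustration concrete, here is a minimal Python sketch of it. The function names <code>trie_tokens</code> and <code>range_terms</code> are hypothetical, the greedy covering strategy is an assumption chosen for readability, and none of this is Lucene's actual code. It assumes non-negative integers that fit in <code>width</code> decimal digits.</p>

```python
from __future__ import annotations


def trie_tokens(value: int, step: int, width: int = 8) -> list[str]:
    """Full-precision token plus one token per `step` decimal digits trimmed."""
    digits = str(value).zfill(width)
    tokens = [digits]
    trimmed = step
    # Keep at least one real digit, as in the examples above.
    while trimmed < width:
        tokens.append(digits[: width - trimmed] + "x" * trimmed)
        trimmed += step
    return tokens


def range_terms(lo: int, hi: int, step: int, width: int = 4) -> list[str]:
    """Cover [lo, hi] greedily with the largest aligned blocks that fit.

    Illustrative assumption: Lucene splits ranges differently internally,
    but the set of terms for this example comes out the same.
    """
    terms = []
    cur = lo
    while cur <= hi:
        trimmed = 0
        # Grow the block while it stays aligned and inside the range.
        while trimmed + step < width:
            size = 10 ** (trimmed + step)
            if cur % size != 0 or cur + size - 1 > hi:
                break
            trimmed += step
        digits = str(cur).zfill(width)
        terms.append(digits[: width - trimmed] + "x" * trimmed)
        cur += 10 ** trimmed
    return terms


print(trie_tokens(12345678, 2))
print(range_terms(1250, 1275, 1))
print(len(range_terms(1250, 1275, 2)))
```

<p>With a step of 1 the range query needs 8 terms; with a step of 2 no <code>12xx</code> block fits inside 1250 to 1275, so the sketch falls back to one term per value.</p>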
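<p>The bit-level version of the precision step can be sketched the same way. This is only an illustration under stated assumptions: <code>bit_tokens</code> is a hypothetical helper, values are treated as unsigned 32-bit integers, and the real index uses its own term encoding rather than plain shifted integers.</p>

```python
from __future__ import annotations


def bit_tokens(value: int, precision_step: int, bits: int = 32) -> list[tuple[int, int]]:
    """Return (shift, value >> shift) pairs, one per precision level.

    Each extra level drops `precision_step` more low-order bits, so a
    step of 4 drops one hex digit per level and a step of 8 drops two.
    """
    return [(shift, value >> shift) for shift in range(0, bits, precision_step)]


for shift, prefix in bit_tokens(0x12345678, 8):
    # Show each prefix in hex next to the number of bits trimmed.
    print(shift, format(prefix, "x"))
```

<p>For <code>0x12345678</code> and a precision step of 8, the prefixes are <code>12345678</code>, <code>123456</code>, <code>1234</code> and <code>12</code> in hex, which mirrors trimming two hex digits per token.</p>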
