<p>I'd be inclined to use a hash instead of SQLite to do what you want to do. A hash is optimized to test for existence, with no need to keep the values in sorted order and no need to keep a redundant copy of the datum in an index. The hash algorithm applied to the datum yields the location where it would be stored if it existed; you can seek to that location and see whether it's there. I don't think you'd need to keep the hash table in RAM.</p>
<p>Here's how you might take a hybrid hash/SQLite approach.</p>
<p>Create a SQLite table:</p>
<pre><code>CREATE TABLE STORE (
    id     INTEGER PRIMARY KEY,
    BUCKET INTEGER,  -- indexed (see CREATE INDEX below)
    URL    TEXT,     -- not indexed
    status
);
CREATE INDEX idx_store_bucket ON STORE (BUCKET);
</code></pre>
<p>You could have three of these tables, STORE1, STORE2, and STORE3, if you want to keep them separate by status.</p>
<p>Let's assume that there will be 250,000,001 distinct buckets in each store. (You can experiment with this number; make it a prime number.)</p>
<p>Find a hashing algorithm that takes two inputs, the URL string and 250,000,001, and returns a number between 1 and 250,000,001.</p>
<p>When you get a URL, feed it to the hashing algorithm and it will tell you which BUCKET to look in:</p>
<pre><code>SELECT * FROM STORE WHERE BUCKET = {the value returned by your hash function};
</code></pre>
<p>Your index on the BUCKET field will quickly return the rows, and you can examine the URLs. If the current URL is not one of them, add it:</p>
<pre><code>INSERT INTO STORE (BUCKET, URL) VALUES ({your hash return value}, theURL);
</code></pre>
<p>SQLite will be indexing integer values, which I think will be more efficient than indexing the URL. And the URL will be stored only once.</p>
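<p>The lookup-then-insert flow described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names (<code>bucket_for</code>, <code>open_store</code>, <code>add_if_new</code>) and the index name <code>idx_store_bucket</code> are hypothetical, and BLAKE2b is just one possible choice of hash; the answer does not prescribe a particular algorithm.</p>

```python
import hashlib
import sqlite3

# Bucket count taken from the answer; experiment with it as suggested there.
NUM_BUCKETS = 250_000_001


def bucket_for(url: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a URL to a bucket number in 1..num_buckets.

    BLAKE2b is an assumed stand-in; any well-distributed hash would do.
    """
    digest = hashlib.blake2b(url.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_buckets + 1


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the STORE table and its BUCKET index if they do not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS STORE ("
        " id INTEGER PRIMARY KEY,"
        " BUCKET INTEGER,"
        " URL TEXT,"
        " status)"
    )
    # Only the integer BUCKET column is indexed; the URL text is not.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_store_bucket ON STORE (BUCKET)"
    )
    return conn


def add_if_new(conn: sqlite3.Connection, url: str, status=None) -> bool:
    """Insert the URL unless it is already stored; return True if inserted."""
    bucket = bucket_for(url)
    rows = conn.execute(
        "SELECT URL FROM STORE WHERE BUCKET = ?", (bucket,)
    ).fetchall()
    # Different URLs can share a bucket, so compare the stored strings.
    if any(stored == url for (stored,) in rows):
        return False
    conn.execute(
        "INSERT INTO STORE (BUCKET, URL, status) VALUES (?, ?, ?)",
        (bucket, url, status),
    )
    conn.commit()
    return True
```

<p>Calling <code>add_if_new(conn, url)</code> twice with the same URL inserts it only the first time. Because several URLs may hash to the same bucket, the sketch always compares the stored URL text rather than trusting the bucket number alone.</p>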