
SQLite Optimization for Millions of Entries?
I'm trying to tackle a problem by using a SQLite database and Perl modules. In the end, there will be tens of millions of entries I need to log. The only unique identifier for each item is a text string for the URL. I'm thinking of doing this in two ways:

Way #1: Have a Good table, a Bad table, and an Unsorted table. (I need to check the HTML and decide whether I want it.) Say we have 1 billion pages total, so roughly 333 million URLs in each table. When I have a new URL, I need to check whether it's in any of the tables and add it to Unsorted if it is unique. With this option I would also be moving a lot of rows around.

Way #2: I have two tables, Master and Good. Master has all 1 billion page URLs, and Good has the 333 million that I want. For a new URL I need to do the same check, except this time I am only querying one table, and I would never delete a row from Master, only add its data to Good.

So basically, I need to know the best setup to quickly query a huge SQLite database to see whether a text string of ~20 characters is already there, and add it if it isn't.

Edit: I'm now trying to get Berkeley DB to work using the Perl module, but no dice. Here's what I have:

    use BerkeleyDB;

    $dbFolder = 'C:\somedirectory';
    my $env = BerkeleyDB::Env->new( -Home => $dbFolder );

    my $db = BerkeleyDB::Hash->new(
        -Filename => "fred.db",
        -Env      => $env
    );
    my $status = $db->db_put("apple", "red");

And when I run this, I get the following:

    Can't call method "db_put" on an undefined value at C:\Directory\perlfile.pl line 42, <STDIN> line 1.
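A guess at the cause, not a confirmed diagnosis: BerkeleyDB::Env->new and BerkeleyDB::Hash->new return undef when they cannot open or create the underlying files, and they only create missing files when passed the DB_CREATE flag. The sketch below adds those flags and checks $BerkeleyDB::Error after each constructor so the real reason gets printed; everything else mirrors the snippet from the question.

    use strict;
    use warnings;
    use BerkeleyDB;

    my $dbFolder = 'C:\somedirectory';

    # Without DB_CREATE the constructors refuse to create files that do not
    # exist yet and return undef; $BerkeleyDB::Error holds the last failure.
    my $env = BerkeleyDB::Env->new(
        -Home  => $dbFolder,
        -Flags => DB_CREATE | DB_INIT_MPOOL,
    ) or die "Env failed: $BerkeleyDB::Error";

    my $db = BerkeleyDB::Hash->new(
        -Filename => "fred.db",
        -Env      => $env,
        -Flags    => DB_CREATE,
    ) or die "Hash failed: $BerkeleyDB::Error";

    # db_put returns 0 on success.
    my $status = $db->db_put("apple", "red");
    die "db_put failed: $status" if $status != 0;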
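On the SQLite side, here is a minimal sketch of what Way #2 could look like from Perl, assuming the DBI and DBD::SQLite modules; the file name urls.db, the table name master, and the column names are placeholders rather than anything from the question. A PRIMARY KEY (or UNIQUE index) on the URL column is what lets SQLite answer the "have I seen this?" check without scanning the table, and INSERT OR IGNORE folds that check and the insert into a single statement.

    use strict;
    use warnings;
    use DBI;

    # Hypothetical file and table names, for illustration only.
    my $dbh = DBI->connect("dbi:SQLite:dbname=urls.db", "", "",
                           { RaiseError => 1, AutoCommit => 1 });

    # One row per URL; the PRIMARY KEY builds the index used for the lookup.
    $dbh->do(q{
        CREATE TABLE IF NOT EXISTS master (
            url  TEXT PRIMARY KEY,
            good INTEGER NOT NULL DEFAULT 0
        )
    });

    # INSERT OR IGNORE affects 0 rows when the URL is already present, so one
    # statement both tests for uniqueness and records a genuinely new URL.
    my $sth   = $dbh->prepare("INSERT OR IGNORE INTO master (url) VALUES (?)");
    my $added = $sth->execute("http://example.com/some/page");
    print $added > 0 ? "new URL\n" : "already seen\n";

    $dbh->disconnect;

Whether this stays fast at hundreds of millions of rows depends largely on wrapping batches of inserts in a single transaction and on the page cache size, so treat it as a starting point to benchmark rather than a tuned setup.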
 

