Note that there are some explanatory texts on larger screens.

plurals
  1. POWhy is reading a sqlite database slower on different machines with identical hardware?
    text
    copied!<p>Okay, so I'm observing some behavior that I would certainly call 'unexpected' and I'm curious if anyone can offer and insight as to what on earth is going on. I'll try to keep it concise...</p> <p>I have a sqlite database that I'm accessing with Python (150k rows, 11 tables) for the purposes of neural network training. The ends are inconsequential, but it means that my data points are stored in one table as rather large blobs and the DB is about 5GB. Because I'm only pulling certain rows (due to cross validation or other kinds of filtering), I've found indexing on certain meaningful columns to result in significant increases in speed. When the sun is shining, I can select, fetch and format about 500 datapoints in around 2 seconds. This is great.</p> <p>However, as anyone familiar with neural nets / backprop / SGD can attest, this process can – depending on the data and architecture - take forever. To optimally parallelize things, I've requisitioned multiple machines (Mac Pro, 8GB RAM, 16 cores at who knows what speed) to launch training scenarios under different conditions in different threads. Due to memory limitions, the ceiling is around 6 or 7 separate threads each accessing the same database file. Now, to the best of my knowledge, these machines are identical.</p> <p>Here's where it gets bizarre. At first, performance was stellar on my baseline machine, and the parallel thread count never slowed things down (one thread versus seven was basically equal). When I first copied this database file to other machines (and recreated my indexes), one was equally as fast as my baseline, but another was about twice as slow. I figured (at the time) it was a hardware discrepancy and moved on with my life. Then, having modified the sqlite database and recopying to all the machines, it is now fast on the machine it was previously slow on, while all other machines are showing slower performance - including the baseline.</p> <p>I don't get it. I've done the same things to all instances. The database files are identical on copy, and then I drop / create the same indexes, as I'm pretty sure sqlite indexes refer to absolute locations on disk. I'm in a unix environment so file fragmentation shouldn't be the culprit (right?). Concurrency doesn't seem to be an issue in some configurations, while others result in a bottleneck. I've since checked and the machines are actually identical on paper. And the most confusing part for me is that performance is differing within the configuration of a single machine.</p> <p>I'm really at a loss. Any kind of direction would be greatly appreciated. I'm by no means well-versed in relational database programming, but I'm out of ideas at this point.</p> <h3>UPDATE: I might have 'resolved' my issue, but in no way is this question answered. I've found that if I basically power-cycle the database file on each system, I eventually get the performance to be equal across machines. The process is something like...</h3> <pre><code> while unhappy: 1. Drop all indexes 2. Create new indexes 3. Make a handful of select/fetch calls 4. Commit / quit / restart </code></pre> <p>After a while, it seems to settle out to something satisfactory. No idea why though.</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload