
PyTables vs. SQLite3 insertion speed
    <p>I bought Kibot's stock data and it is enormous. I have about 125,000,000 rows to load (1000 stocks * 125k rows/stock [1-minute bar data since 2010-01-01], each stock in a CSV file whose fields are Date,Time,Open,High,Low,Close,Volume). I'm totally new to Python (I chose it because it's free and well supported by a community) and I chose SQLite to store the data because of Python's built-in support for it. (And I know the SQL language very well. SQLiteStudio is a gem of a free program.)</p>
    <p>My loader program is working well, but it is getting slower. The SQLite db is about 6 GB and it's only halfway loaded. I'm getting about 500k rows/hour loaded using INSERT statements and committing the transaction after each stock (approx. 125k rows).</p>
    <p>So here's the question: <strong>is PyTables substantially faster than SQLite</strong>, making the effort to learn how to use it worth it? (And since I'm in learning mode, feel free to suggest alternatives to these two.) One thing that bothers me about PyTables is that the free version is really bare bones, almost like saving a binary file. There are no "where clause" functions or indexing, so you wind up scanning for the rows you need.</p>
    <p>After I get the data loaded, I'm going to be doing statistical analysis (rolling regression &amp; correlation, etc.) using something based on NumPy: Timeseries, larry, pandas, or a scikit. I haven't chosen the analysis package yet, so if you have a recommendation, and that recommendation is best used with either PyTables or pandas (or whatever), please factor that into your response.</p>
    <p>(For @John) Python 2.6;<br> Windows XP SP3 32-bit;<br> Manufactured strings used as INSERT statements;<br> Memory usage is rock solid at 750 MB of the 2 GB physical memory;<br> CPU usage is 10% +/- 5%;<br> Totally I/O-bound (disk is always crunching).<br> DB schema: </p>
    <pre><code>create table MinuteBarPrices (
    SopDate smalldatetime not null,
    Ticker  char( 5 )     not null,
    Open    real,
    High    real,
    Low     real,
    Close   real          not null,
    Volume  int,
    primary key ( SopDate, Ticker )
);

create unique index MinuteBarPrices_IE1 on MinuteBarPrices ( Ticker, SopDate );
</code></pre>
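For reference, the loading pattern described above (one transaction per stock) might look roughly like the sketch below if parameterized inserts were used instead of manufactured INSERT strings. This is not the actual loader: the function name `load_stock`, the assumption that the CSV files have no header row, and storing `SopDate` as the Date and Time fields joined by a space are all illustrative guesses. It is written for current Python 3 rather than the Python 2.6 mentioned above, and the secondary index is left out to keep the sketch short.

```python
import csv
import io
import sqlite3

# Table definition copied from the question; the secondary index is omitted here.
SCHEMA = """
create table MinuteBarPrices (
    SopDate smalldatetime not null,
    Ticker  char( 5 )     not null,
    Open    real,
    High    real,
    Low     real,
    Close   real          not null,
    Volume  int,
    primary key ( SopDate, Ticker )
);
"""


def load_stock(conn, ticker, csv_file):
    """Insert one stock's CSV rows in a single transaction.

    Hypothetical helper: assumes no header row and the field order
    Date,Time,Open,High,Low,Close,Volume. Returns the number of rows inserted.
    """
    reader = csv.reader(csv_file)
    rows = (
        # Assumption: SopDate is stored as "<Date> <Time>" text.
        (f"{date} {time}", ticker, float(o), float(h), float(l), float(c), int(v))
        for date, time, o, h, l, c, v in reader
    )
    with conn:  # commits once on success, rolls back on error
        cur = conn.executemany(
            "insert into MinuteBarPrices values (?, ?, ?, ?, ?, ?, ?)", rows
        )
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Made-up sample rows, not real Kibot data.
    sample = io.StringIO(
        "01/04/2010,09:30,10.0,10.5,9.9,10.2,1200\n"
        "01/04/2010,09:31,10.2,10.6,10.1,10.4,800\n"
    )
    print(load_stock(conn, "ABC", sample))
```

Whether this changes the load rate on the hardware described is something only a measurement can show; the sketch only illustrates the per-stock transaction with bound parameters.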