<p>I wrote some scripts to import the Stack Overflow data dump into an SQL database. I split the <em>tags</em> list to populate a many-to-many table as you describe. I use a technique similar to the following:</p>
<ol>
<li><p>Read a row from WIDGET</p>
<pre><code>while ($row = $pdoStmt-&gt;fetch()) {
</code></pre></li>
<li><p>Use <code>explode()</code> to split on a comma</p>
<pre><code>$states = explode(",", $row["state"]);
</code></pre></li>
<li><p>Loop over the elements, writing to new CSV files</p>
<pre><code>// Set up once, before the while loop starts
$stateid = array();
$stfile = fopen("states.csv", "w+");
$mmfile = fopen("manytomany.csv", "w+");
$i = 0;

// Inside the while loop, for each row
foreach ($states as $st) {
  if (!array_key_exists($st, $stateid)) {
    // First time this state is seen: assign an id and record it
    $stateid[$st] = ++$i;
    fprintf($stfile, "%d,%s\n", $i, $st);
  }
  fprintf($mmfile, "%s,%s\n", $row["id"], $stateid[$st]);
}

// After the closing brace of the while loop
fclose($stfile);
fclose($mmfile);
</code></pre></li>
<li><p>When you're done, load the CSV files into the database. You can do this in the mysql client. <code>LOAD DATA</code> expects tab-separated fields by default, so name the comma delimiter explicitly:</p>
<pre><code>mysql&gt; LOAD DATA INFILE 'states.csv' INTO TABLE STATES FIELDS TERMINATED BY ',';
mysql&gt; LOAD DATA INFILE 'manytomany.csv' INTO TABLE WIDGET_ST FIELDS TERMINATED BY ',';
</code></pre></li>
</ol>
<p>It may seem like a lot of work, but using the LOAD DATA command runs 20x faster than inserting one row at a time, so it's worthwhile if your data set is large.</p>
<hr>
<p>Re your comment:</p>
<p>Right, I also have data in a database already. It turns out that the solution I show above, dumping to CSV files and re-importing in normalized format, is <strong>many times faster</strong> than doing INSERT statements inside the loop that splits the data.</p>
<p>Each brand of database has its own tool for importing bulk data. See my answer to <a href="https://stackoverflow.com/questions/2970504/optimizing-big-import-in-php/2970519#2970519">Optimizing big import in PHP</a> for a list of bulk import solutions per database.</p>
<p>You should use the tools provided by each database. Trying to remain cross-platform only makes your code a <a href="http://en.wikipedia.org/wiki/Jack_of_all_trades,_master_of_none" rel="nofollow noreferrer">Jack of all trades, master of none</a>. Besides, in 90% of cases when people bend over backwards to make their code database-independent, it turns out they never use more than one database. And you can't achieve complete database independence anyway.</p>
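<p>For reference, the fragments in steps 1 to 3 can be put together roughly as follows. This is only a sketch: the function name <code>normalizeStates()</code>, its parameters, the temp-directory paths, and the sample rows are hypothetical, and a plain array stands in for the rows fetched from WIDGET. Trimming whitespace and skipping empty entries are assumptions about the data, not something the steps above require.</p>
<pre><code>&lt;?php
// Hypothetical helper: splits each row's comma-separated "state" column
// and writes one CSV of distinct states plus one CSV of (row id, state id).
function normalizeStates(iterable $rows, string $statesPath, string $linksPath): array
{
    $stateid = [];                 // state name =&gt; generated id
    $stfile = fopen($statesPath, "w");
    $mmfile = fopen($linksPath, "w");
    $i = 0;

    foreach ($rows as $row) {
        foreach (explode(",", $row["state"]) as $st) {
            $st = trim($st);       // assumption: ignore stray spaces
            if ($st === "") {
                continue;          // assumption: skip empty entries
            }
            if (!array_key_exists($st, $stateid)) {
                $stateid[$st] = ++$i;
                fprintf($stfile, "%d,%s\n", $i, $st);
            }
            fprintf($mmfile, "%s,%s\n", $row["id"], $stateid[$st]);
        }
    }

    fclose($stfile);
    fclose($mmfile);
    return $stateid;
}

// Stand-in for rows fetched from WIDGET (sample data, not from the question).
$rows = [
    ["id" =&gt; 1, "state" =&gt; "NY,CA"],
    ["id" =&gt; 2, "state" =&gt; "CA, TX"],
];
$statesPath = sys_get_temp_dir() . "/states.csv";
$linksPath = sys_get_temp_dir() . "/manytomany.csv";
$ids = normalizeStates($rows, $statesPath, $linksPath);
</code></pre>
<p>With the sample rows, the states file gets three lines (<code>1,NY</code>, <code>2,CA</code>, <code>3,TX</code>) and the many-to-many file gets four (<code>1,1</code>, <code>1,2</code>, <code>2,2</code>, <code>2,3</code>). In real use you would pass the PDO statement itself as <code>$rows</code>, since it can be iterated with <code>foreach</code>.</p>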