Note that there are some explanatory texts on larger screens.

plurals
  1. POBulk MySql insert from huge array: optimization question
    text
    copied!<p>I've been asked to choose which is the best option out of three in terms of resource optimization.<br />Suppose I have a big Excel file of thousands of records, and I need to extract these data and insert them into a database. The 3 options are:</p> <ol> <li>Load everything into a multidimensional array and insert everything with just one complex query;</li> <li>Load everything into a multidimensional array, then loop over each excel row and do a simple insert query.</li> <li>Inside a loop, read each Excel row, put it into an array, and then do a simple insert query on the DB.</li> </ol> <p>This is for an interview test (I labelled it homework, not sure if it's right); I pondered for a while:</p> <ul> <li><strong>Case 1</strong>: I could risk an *out_of_memory* error (depending on the machine, of course), but it's the solution that performs less request to the database. Two drawbacks are the huge amount of memory to be allocated both to the array and the database. I know that I can transform excel into CSV, but it's not an option here. I'd go for a big array and a bulk insert, but I fear it would be hard for the database.</li> <li><strong>Case 2</strong>: I could risk an *out_of_memory* error when loading it into the array, but not for the second task. Nonetheless, performing thousands of queries could be a performance hit on the database, and this query is likely to be a candidate for optimization. </li> <li><strong>Case 3</strong>: Still have a loop over thousands records (which also takes a lot of memory...), and still have thousands queries to run (which hits the database).</li> </ul> <p>So, I actually chose answer one, and it took me some thinking before doing it. </p> <p>And it was WRONG. And I don't know actually which of the three was the right one. </p> <p>Can someone help me on this? Is that answer so bad? I thought that thousands of insert queries would be "bad", but seems like I'm totally wrong..</p> <p><strong>EDIT</strong> <br /> Clarification: my question is not about which is the best optimization <em>absolutely</em>, but which one among the three I presented; so I'm not looking into other alternatives, just an explanation on why I was wrong and which is, <strong>argumentatively</strong>, the best answer instead.</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload