Sending huge vector to a Database in R
<p>Good afternoon,</p>
<p>After computing a rather large vector (a bit shorter than 2^20 elements), I have to store the result in a database.</p>
<p>The script takes about 4 hours to execute with simple code such as:</p>
<pre><code>#Do the processing
myVector&lt;-processData(myData)
#Sends everything to the database
lapply(myVector,sendToDB)
</code></pre>
<p>What do you think is the most efficient way to do this?</p>
<p>I thought about using the same query to insert multiple records (multiple inserts), but that simply comes back to "chunking" the data.</p>
<p>Is there any vectorized function to send that into a database?</p>
<p>Interestingly, the code takes a huge amount of time before starting to process the first element of the vector. That is, if I place a <code>browser()</code> call inside <code>sendToDB</code>, it takes 20 minutes before it is reached for the first time (and I mean 20 minutes without taking into account the previous line processing the data). So I was wondering what R was doing during this time?</p>
<p>Is there another way to do such an operation in R that I might have missed (parallel processing maybe?)</p>
<p>Thanks!</p>
<p>PS: here is a skeleton of the sendToDB function:</p>
<pre><code>sendToDB&lt;-function(id,data) {
  channel&lt;-odbcChannel(...)
  query&lt;-paste("INSERT INTO history VALUE(",id,",\"",data,"\")",sep="")
  sqlQuery(channel,query)
  odbcClose(channel)
}
</code></pre>
<p>That's the idea.</p>
<p><strong>UPDATE</strong></p>
<p>I am at the moment trying out the <code>LOAD DATA INFILE</code> command.</p>
<p>I still have no idea why it takes so long to reach the internal function of the <code>lapply</code> for the first time.</p>
<p><strong><em>SOLUTION</em></strong></p>
<p><code>LOAD DATA INFILE</code> is indeed much quicker. Writing into a file line by line using <code>write</code> is reasonably fast, and <code>write.table</code> is even quicker.</p>
<p>The overhead I was experiencing for <code>lapply</code> was coming from the fact that I was looping over <code>POSIXct</code> objects. It is much quicker to use <code>seq(along.with=myVector)</code> and then process the data from within the loop.</p>
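<p>A minimal sketch of the approach described in the solution, under several assumptions: the sample <code>myVector</code>, the two-column layout of the <code>history</code> table, and the temporary CSV file are hypothetical stand-ins, and the final statement is only built as a string rather than sent. A likely reason for the initial delay is that <code>lapply</code> converts a classed object such as a <code>POSIXct</code> vector with <code>as.list</code> before calling the function for the first time; iterating over positions avoids that conversion.</p>

```r
# Hypothetical stand-in for the post's myVector: a short POSIXct vector.
myVector <- as.POSIXct("2012-01-01 00:00:00", tz = "UTC") + 0:4

# Loop over positions instead of over the POSIXct elements themselves,
# and index into the vector from within the loop.
formatted <- vapply(
  seq(along.with = myVector),
  function(i) format(myVector[i], "%Y-%m-%d %H:%M:%S"),
  character(1)
)

# Assumed table layout: history(id, data), matching the INSERT in the post.
history <- data.frame(
  id = seq(along.with = myVector),
  data = formatted,
  stringsAsFactors = FALSE
)

# Write all rows in one call; character fields are double-quoted.
csvFile <- tempfile(fileext = ".csv")
write.table(history, file = csvFile, sep = ",",
            row.names = FALSE, col.names = FALSE, quote = TRUE)

# Build a single bulk-load statement for MySQL.
# LOCAL reads the file from the client machine; drop it if the file
# already sits on the database server.
query <- paste0(
  "LOAD DATA LOCAL INFILE '", normalizePath(csvFile, winslash = "/"), "' ",
  "INTO TABLE history ",
  "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' ",
  "LINES TERMINATED BY '\\n'"
)

# With an open RODBC channel, as in the post, the statement could be sent with:
# sqlQuery(channel, query)
```

<p>The point of the design is that the database receives one statement for the whole vector instead of one connection and one <code>INSERT</code> per element. The line terminator in the statement assumes the file was written with Unix-style line endings.</p>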
 
