<p>If all of your columns are of the same class, convert to a matrix before writing out; this provides a nearly 6x speedup. You can also look into using <code>write.matrix()</code> from package <code>MASS</code>, though it did not prove faster for this example. Maybe I didn't set something up properly:</p>
<pre><code>#Fake data
m &lt;- matrix(runif(256*65536), nrow = 256)

#As a data.frame
system.time(write.csv(as.data.frame(m), "dataframe.csv"))
#----------
#   user  system elapsed
# 319.53   13.65  333.76

#As a matrix
system.time(write.csv(m, "matrix.csv"))
#----------
#   user  system elapsed
#  52.43    0.88   53.59

#Using write.matrix()
require(MASS)
system.time(write.matrix(m, "writematrix.csv"))
#----------
#   user  system elapsed
# 113.58   59.12  172.75
</code></pre>
<h1>EDIT</h1>
<p>To address the concern raised below that the results above are not fair to data.frame, here are some more results and timings to show that the overall message is still "convert your data object to a matrix if possible. If not possible, deal with it. Alternatively, reconsider why you need to write out a 200MB+ file in CSV format if the timing is of the utmost importance":</p>
<pre><code>#This is a data.frame
m2 &lt;- as.data.frame(matrix(runif(256*65536), nrow = 256))

#This is still 6x slower
system.time(write.csv(m2, "dataframe.csv"))
#   user  system elapsed
# 317.85   13.95  332.44

#This even includes the overhead of converting with as.matrix() in the timing
system.time(write.csv(as.matrix(m2), "asmatrix.csv"))
#   user  system elapsed
#  53.67    0.92   54.67
</code></pre>
<p>So, nothing really changes. To confirm this is reasonable, consider the relative time cost of <code>as.data.frame()</code>:</p>
<pre><code>m3 &lt;- as.matrix(m2)
system.time(as.data.frame(m3))
#   user  system elapsed
#   0.77    0.00    0.77
</code></pre>
<p>So the conversion is not a big deal, and it does not skew the results as much as the comment below suggests.
If you're still not convinced that using <code>write.csv()</code> on large data.frames is a bad idea performance-wise, consult the <code>Note</code> section of the manual:</p>
<pre><code>write.table can be slow for data frames with large numbers (hundreds or more) of columns: this is inevitable as each column could be of a different class and so must be handled separately. If they are all of the same class, consider using a matrix instead.
</code></pre>
<p>Finally, consider moving to a native RData object if you're still losing sleep over saving things faster:</p>
<pre><code>system.time(save(m2, file = "thisisfast.RData"))
#   user  system elapsed
#  21.67    0.12   21.81
</code></pre>
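<p>The manual's advice (use a matrix when all columns share a class) could be wrapped in a small helper, roughly as sketched below. The function name <code>write_csv_maybe_matrix()</code> is hypothetical and not part of base R or <code>MASS</code>; it assumes that comparing the first class of each column is a good enough test, and that the matrix path produces output you are happy with (for example, factor or date columns become character when converted with <code>as.matrix()</code>):</p>
<pre><code># Hypothetical helper: go through as.matrix() only when every column
# has the same class, otherwise write the data.frame unchanged.
write_csv_maybe_matrix &lt;- function(df, file) {
  # First class of each column, e.g. "numeric", "character", "factor"
  col_classes &lt;- vapply(df, function(col) class(col)[1], character(1))
  same_class &lt;- length(unique(col_classes)) == 1
  if (same_class) {
    write.csv(as.matrix(df), file)
  } else {
    write.csv(df, file)
  }
  # Report invisibly whether the matrix path was taken
  invisible(same_class)
}

# Small fake data so the example runs quickly
homog &lt;- as.data.frame(matrix(runif(20), nrow = 4))
mixed &lt;- data.frame(x = 1:3, y = c("a", "b", "c"))

write_csv_maybe_matrix(homog, tempfile(fileext = ".csv"))  # matrix path
write_csv_maybe_matrix(mixed, tempfile(fileext = ".csv"))  # data.frame path
</code></pre>
<p>Whether the speedup carries over to your data is something to time yourself with <code>system.time()</code>, as above.</p>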
 
