  1. Reshape alternating columns in less time and using less memory
    <p>How can I do this reshape faster and with less memory? My aim is to reshape a data frame of 500,000 rows by 500 columns on a machine with 4 GB of RAM.</p>
    <p>Here's a function that will make some reproducible data:</p>
    <pre><code>make_example &lt;- function(ndoc, ntop){
  # doc numbers
  V1 &lt;- seq_len(ndoc)
  # filenames: one random 5-character string per document
  V2 &lt;- vector("list", ndoc)
  for (i in seq_len(ndoc)){
    V2[[i]] &lt;- paste(sample(c(rep(0:9,each=5),LETTERS,letters),5,replace=TRUE),collapse='')
  }
  # topic proportions
  tvals &lt;- data.frame(matrix(runif(ndoc*ntop), ncol = ntop))
  # topic number
  tnumvals &lt;- data.frame(matrix(sample(1:ntop, size = ndoc*ntop, replace = TRUE), ncol = ntop))
  # now make topic props and topic numbers alternating columns (rather slow!)
  alternating &lt;- data.frame(c(matrix(c(tnumvals, tvals), 2, byrow = TRUE)) )
  # make colnames for topic number and topic props
  ntopx &lt;- sapply(1:ntop, function(j) paste0("ntop_",j))
  ptopx &lt;- sapply(1:ntop, function(j) paste0("ptop_",j))
  tops &lt;- c(rbind(ntopx,ptopx))
  # make data frame
  dat &lt;- data.frame(V1 = V1, V2 = unlist(V2), alternating)
  names(dat) &lt;- c("docnum", "filename", tops)
  # give df as result
  return(dat)
}
</code></pre>
    <p>Make some reproducible data:</p>
    <pre><code>set.seed(007)
dat &lt;- make_example(500000, 500)
</code></pre>
    <p>Here's my current method (thanks to <a href="https://stackoverflow.com/a/8058714/1036500">https://stackoverflow.com/a/8058714/1036500</a>):</p>
    <pre><code>library(reshape2)
NTOPICS = (ncol(dat) - 2 )/2
nam &lt;- c('num', 'text', paste(c('topic', 'proportion'), rep(1:NTOPICS, each = 2), sep = ""))
# step 1: wide to long (the slow part)
system.time( dat_l2 &lt;- reshape(setNames(dat, nam), varying = 3:length(nam), direction = 'long', sep = ""))
# step 2: long to one column per topic
system.time( dat.final2 &lt;- dcast(dat_l2, dat_l2[,2] ~ dat_l2[,3], value.var = "proportion" ) )
</code></pre>
    <p>Some timings, just for the <code>reshape</code> step, since that's the slowest one:</p>
    <p><code>make_example(5000,100)</code> = 82 sec</p>
    <p><code>make_example(50000,200)</code> = 2855 sec (crashed on attempting the second step)</p>
    <p><code>make_example(500000,500)</code> = not yet possible...</p>
    <p>What other methods are there that are faster and less memory intensive for this reshape (<code>data.table</code>, <a href="https://stackoverflow.com/a/9344168/1036500">this</a>)?</p>
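
For reference, the target shape can be illustrated on a tiny table. The following is only a sketch of the `data.table` route the question mentions, not a benchmarked solution: `dat_small` is a hypothetical two-document stand-in for `dat`, and it assumes a `data.table` version in which `melt()` accepts `patterns()` for `measure.vars`.

```r
library(data.table)

# Hypothetical stand-in for `dat`: two documents, two (topic number, proportion) pairs.
dat_small <- data.table(
  docnum   = 1:2,
  filename = c("a1B2c", "Zx9Qp"),
  ntop_1   = c(2L, 1L), ptop_1 = c(0.7, 0.4),
  ntop_2   = c(1L, 2L), ptop_2 = c(0.3, 0.6)
)

# Melt both column families in one pass: one row per (document, column pair).
long <- melt(
  dat_small,
  id.vars      = c("docnum", "filename"),
  measure.vars = patterns("^ntop_", "^ptop_"),
  value.name   = c("topic", "proportion")
)

# Cast to one column per topic number, filled with that topic's proportion.
wide <- dcast(long, docnum + filename ~ topic, value.var = "proportion")
print(wide)
```

In this toy input each topic number appears once per document. In data from `make_example()` a topic number can repeat within a row, so the cast step would need an explicit `fun.aggregate` (for example `sum`), which is a choice the question itself does not specify.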