<p>Sorry about this "second answer", but you really had two questions... @Ananda's solution for reshaping your data is extremely elegant. This is just another way to think about it.</p>
<p>If you transpose the input matrix you get a new matrix, where the first column is country, the second column is city, the third column is "type" (for lack of a better term), and the actual data is in the other columns (so there is one additional column for every "time").</p>
<p>So a different approach is to transpose first and then melt the new matrix. This avoids creating all the concatenated column names and splitting them back later. The problem is that <code>melt.data.frame</code> is exceptionally inefficient with a very large number of columns (which you would have here), so doing it this way would be 10X <em>slower</em> than @Ananda's approach.</p>
<p>A solution is to use <code>melt.array</code> (just call <code>melt(...)</code> with an array rather than a data frame). As the benchmark below shows, this approach is ~20X faster on larger datasets (yours was 11MB).</p>
<pre><code>library(reshape)         # for melt(...) and colsplit(...)
library(microbenchmark)  # for microbenchmark(...)

# this is just to model your situation with a more realistic size
# create a large data frame (250 columns of country, city, type; 1000 rows of time)
df &lt;- rep(c("USA","UK","FR","CHN","GER"), each=50)   # time + 250 columns
df &lt;- rbind(df, rep(c("NY","SF","CHI","BOS","LA"), each=10))
df &lt;- rbind(df, rep(c("pork","peas","nuts","fruit","other")))
df &lt;- rbind(df, matrix(sample(1:1000, 250*1000, replace=TRUE), ncol=250))
df &lt;- cbind(c("time","","", as.character(as.Date(1:1000, origin="2010-01-01"))), df)
df &lt;- data.frame(df)   # big warning here about duplicated row names; not important

# @Ananda's approach:
transform.orig &lt;- function(df){
  B &lt;- df[-(1:3),]
  Bnames &lt;- df[1:3,]
  names(B) &lt;- apply(Bnames, 2, function(x) paste(x[x != ""], collapse = "_"))
  BL &lt;- melt(B, id.vars="time")
  final &lt;- cbind(BL[c("time", "value")],
                 colsplit(BL$variable, "_", c("country", "state", "product")))
  return(final)
}

# transpose approach:
transform.new &lt;- function(df) {
  zz &lt;- t(df)
  times &lt;- t(zz[1,4:ncol(zz)])
  colnames(zz) &lt;- c("country","city","type", times)
  data &lt;- melt(zz[-1,-(1:3)], varnames=c("id","time"))
  # melt(...) on an array returns values in column-major order, so "id" varies
  # fastest; the label vectors are therefore repeated as a whole (times=)
  final &lt;- cbind(country=rep(zz[-1,1], times=ncol(zz)-3),
                 city   =rep(zz[-1,2], times=ncol(zz)-3),
                 type   =rep(zz[-1,3], times=ncol(zz)-3),
                 data[,-1])
  return(final)
}

# benchmark
microbenchmark(transform.orig(df), transform.new(df), times=5, unit="s")
Unit: seconds
               expr       min        lq   median         uq        max neval
 transform.orig(df) 9.2511679 9.6986330 9.889457 10.1518191 10.3354328     5
  transform.new(df) 0.4383197 0.4724145 0.474212  0.5815531  0.6886383     5
</code></pre>
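<p>For illustration, the transpose-first idea can be sketched on a tiny made-up matrix using base R only (no reshape). The toy data and the names <code>wide</code>, <code>ids</code>, <code>vals</code> and <code>long</code> are assumptions for this example, and <code>as.vector()</code> plus <code>rep()</code> stand in for what melting an array does; treat it as a sketch rather than a drop-in replacement.</p>

```r
# Toy wide-format input (hypothetical data): three header rows
# (country, city, type) followed by two rows of time-stamped values.
wide <- rbind(
  c("time",       "USA",  "USA",  "UK"),
  c("",           "NY",   "SF",   "LON"),
  c("",           "pork", "peas", "nuts"),
  c("2010-01-02", "1",    "2",    "3"),
  c("2010-01-03", "4",    "5",    "6")
)

zz    <- t(wide)                        # one row per series after transposing
times <- zz[1, -(1:3)]                  # dates come from the first (time) row
ids   <- zz[-1, 1:3, drop = FALSE]      # country, city, type for each series
vals  <- zz[-1, -(1:3), drop = FALSE]   # series x time matrix of values

# as.vector() walks the matrix column by column, so the series index varies
# fastest: labels are repeated as a whole, and each date once per series.
long <- data.frame(
  country = rep(ids[, 1], times = length(times)),
  city    = rep(ids[, 2], times = length(times)),
  type    = rep(ids[, 3], times = length(times)),
  time    = rep(times, each = nrow(vals)),
  value   = as.numeric(as.vector(vals)),
  stringsAsFactors = FALSE
)
print(long)
```

<p>The point of the sketch is the alignment: because the values come out column by column, the country, city and type labels have to cycle through all series for each date.</p>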