  1. POff data storage running big analysis
    <p>I've spent hours reading about the ff package and still can't get a handle on this topic. Basically, I'd like to run an analysis on a large data set and save the results/statistics from the analysis.</p>
<p>I adapted the biglm example from the ff package documentation to my data set: <a href="http://cran.r-project.org/web/packages/ff/ff.pdf" rel="nofollow noreferrer">http://cran.r-project.org/web/packages/ff/ff.pdf</a>. The problem is very similar to this one: <a href="https://stackoverflow.com/questions/17295423/modeling-a-very-big-data-set-1-8-million-rows-x-270-columns-in-r?rq=1">Modeling a very big data set (1.8 Million rows x 270 Columns) in R</a></p>
<p>Here's my code:</p>
<pre><code>library(ff)
library(ffbase)
library(doSNOW)
registerDoSNOW(makeCluster(4, type = "SOCK"))
memory.limit(size=32000)

setwd('Z:/data')
wd &lt;- getwd()
data.path &lt;- file.path(wd,'ffdb')
data.path.train &lt;- file.path(data.path,'train')

ff.train &lt;- read.table.ffdf(file='train.tsv', sep='\t')
save.ffdf(ff.train, dir=data.path.train)

library(biglm)
# Here I'm implementing biglm model on ffdf data
# Vi represents the column names
form &lt;- V27 ~ V3 + V4 + V5 + V6 + V7 + V8 + V9 + V10 + V11 + V12 + V13 + V14 + V15

ff.biglm &lt;- for (i in chunk(ff.train, by=500)){
  if (i[1]==1){
    message("first chunk is: ", i[[1]],":",i[[2]])
    biglmfit &lt;- biglm(form, data=ff.train[i,,drop=FALSE])
  }else{
    message("next chunk is: ", i[[1]],":",i[[2]])
    biglmfit &lt;- update(biglmfit, ff.train[i,,drop=FALSE])
  }
}
</code></pre>
<p>When the above code is run, it gives the following error message:</p>
<blockquote> <p>first chunk is: 1:494<br>Error: cannot allocate vector of size 19.4 Gb<br>In addition: There were 50 or more warnings (use warnings() to see the first 50)</p> </blockquote>
<p>Does this error mean that biglmfit is too large to fit in memory? Is there a workaround for saving biglmfit as an ffdf data type? Or, for that matter, is there any way to store analysis statistics in an ffdf, chunk by chunk? Thank you.</p>
<p>EDIT:</p>
<pre><code>vmode(ff.train)
       V1        V2        V3        V4        V5        V6        V7        V8        V9       V10
"integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer"
      V11       V12       V13       V14       V15       V16       V17       V18       V19       V20
"integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer"
      V21       V22       V23       V24       V25       V26       V27
"integer" "integer" "integer" "integer" "integer" "integer" "integer"
</code></pre>