
Modeling a very big data set (1.8 million rows x 270 columns) in R
I am working on a *Windows 8* OS with *8 GB of RAM*. I have a data.frame of **1.8 million rows x 270 columns** on which I have to perform a glm (logit or any other classification).

I've tried using the ff and bigglm packages to handle the data.

But I am still hitting the error "`Error: cannot allocate vector of size 81.5 Gb`". I then decreased the number of rows to 10 and tried the steps for bigglm on an object of class ffdf, but the error still persists.

Can anyone suggest a solution to this problem of building a classification model with this many rows and columns?

**EDITS:**

I am **not** running any other program while I run the code. The RAM on the system is 60% free before I run the code, and that is because of the R session. When I terminate R, the RAM is 80% free.

As suggested by the commenters, I am adding **some of the columns** I am working with, for reproduction. **OPEN_FLG is the DV** and the others are IDVs.

```r
str(x[1:10,])
'data.frame': 10 obs. of 270 variables:
 $ OPEN_FLG                   : Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 1 1
 $ new_list_id                : Factor w/ 9 levels "0","3","5","6",..: 1 1 1 1 1 1 1 1 1 1
 $ new_mailing_id             : Factor w/ 85 levels "1398","1407",..: 1 1 1 1 1 1 1 1 1 1
 $ NUM_OF_ADULTS_IN_HHLD      : num 3 2 6 3 3 3 3 6 4 4
 $ NUMBER_OF_CHLDRN_18_OR_LESS: Factor w/ 9 levels "","0","1","2",..: 2 2 4 7 3 5 3 4 2 5
 $ OCCUP_DETAIL               : Factor w/ 49 levels "","00","01","02",..: 2 2 2 2 2 2 2 21 2 2
 $ OCCUP_MIX_PCT              : num 0 0 0 0 0 0 0 0 0 0
 $ PCT_CHLDRN                 : int 28 37 32 23 36 18 40 22 45 21
 $ PCT_DEROG_TRADES           : num 41.9 38 62.8 2.9 16.9 ...
 $ PCT_HOUSEHOLDS_BLACK       : int 6 71 2 1 0 4 3 61 0 13
 $ PCT_OWNER_OCCUPIED         : int 91 66 63 38 86 16 79 19 93 22
 $ PCT_RENTER_OCCUPIED        : int 8 34 36 61 14 83 20 80 7 77
 $ PCT_TRADES_NOT_DEROG       : num 53.7 55 22.2 92.3 75.9 ...
 $ PCT_WHITE                  : int 69 28 94 84 96 79 91 29 97 79
 $ POSTAL_CD                  : Factor w/ 104568 levels "010011203","010011630",..: 23789 45173 32818 6260 88326 29954 28846 28998 52062 47577
 $ PRES_OF_CHLDRN_0_3         : Factor w/ 4 levels "","N","U","Y": 2 2 3 4 2 4 2 4 2 4
 $ PRES_OF_CHLDRN_10_12       : Factor w/ 4 levels "","N","U","Y": 2 2 4 3 3 2 3 2 2 3
  [list output truncated]
```

And this is an **example** of the code I am using:

```r
require(biglm)
mymodel <- bigglm(OPEN_FLG ~ new_list_id + NUM_OF_ADULTS_IN_HHLD + OCCUP_MIX_PCT, data = x)

require(ff)
require(ffbase)  # ffseq_len() and expand.ffgrid() come from ffbase
x$id <- ffseq_len(nrow(x))
xex <- expand.ffgrid(x$id, ff(1:100))
colnames(xex) <- c("id", "explosion.nr")
xex <- merge(xex, x, by.x = "id", by.y = "id", all.x = TRUE, all.y = FALSE)
mymodel <- bigglm(OPEN_FLG ~ new_list_id + NUM_OF_ADULTS_IN_HHLD + OCCUP_MIX_PCT, data = xex)
```

**The problem is that both times I get the same error: "`Error: cannot allocate vector of size 81.5 Gb`".**

---

Please let me know whether this is enough or whether I should include any more details about the problem.
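For reference, the pattern bigglm is built around is processing the data in chunks rather than materializing the full model matrix in RAM, which is what the ffdf route is meant to enable. Below is a minimal sketch of that pattern, not the code from the question: the file name `mydata.csv`, the chunk sizes, and the binomial (logit) family are assumptions added for illustration, and it relies on ffbase supplying the `bigglm` method for ffdf objects.

```r
library(ff)
library(ffbase)   # supplies the bigglm() method for ffdf objects
library(biglm)

# Read the data from disk into an ffdf, so the full table is memory-mapped
# on disk instead of being held in RAM. "mydata.csv" is a hypothetical file
# name standing in for the real data source.
x <- read.csv.ffdf(file = "mydata.csv",
                   header = TRUE,
                   first.rows = 100000,
                   next.rows = 100000,
                   VERBOSE = TRUE)

# Fit a logistic regression chunk by chunk. chunksize controls how many rows
# are pulled into memory at a time; lowering it trades speed for less RAM.
mymodel <- bigglm(OPEN_FLG ~ new_list_id + NUM_OF_ADULTS_IN_HHLD + OCCUP_MIX_PCT,
                  data = x,
                  family = binomial(link = "logit"),
                  chunksize = 100000)

summary(mymodel)
```

Note that with a formula containing only a few predictors, as above, the per-chunk model matrix stays small regardless of how many columns the ffdf itself has; an 81.5 Gb allocation usually means the whole data set (or an exploded copy of it, as in the expand.ffgrid step) is being pulled into RAM at once.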