
Circumventing R's `Error in if (nbins > .Machine$integer.max)`
<p>This is a saga which began with the problem of <a href="https://stackoverflow.com/questions/5446078/frequency-weighting-in-r-comparing-results-with-stata" title="An initial question on how to do frequency weighting, which should have been probability weighting">how to do survey weighting</a>. Now that I appear to be doing that correctly, I have hit a bit of a wall (see the previous post for details on the import process and where the <code>strata</code> variable came from):</p>
<pre><code>&gt; require(foreign)
&gt; ipums &lt;- read.dta('/path/to/data.dta')
&gt; require(survey)
&gt; ipums.design &lt;- svydesign(id=~serial, strata=~strata, data=ipums, weights=~perwt)
Error in if (nbins &gt; .Machine$integer.max) stop("attempt to make a table with &gt;= 2^31 elements") :
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In pd * (as.integer(cat) - 1L) : NAs produced by integer overflow
2: In pd * nl : NAs produced by integer overflow
&gt; traceback()
9: tabulate(bin, pd)
8: as.vector(data)
7: array(tabulate(bin, pd), dims, dimnames = dn)
6: table(ids[, 1], strata[, 1])
5: inherits(x, "data.frame")
4: is.data.frame(x)
3: rowSums(table(ids[, 1], strata[, 1]) &gt; 0)
2: svydesign.default(id = ~serial, weights = ~perwt, strata = ~strata, data = ipums)
1: svydesign(id = ~serial, weights = ~perwt, strata = ~strata, data = ipums)
</code></pre>
<p>This error seems to come from the <a href="http://svn.r-project.org/R/trunk/src/library/base/R/tabulate.R" rel="nofollow noreferrer" title="Source code of the tabulate package, very short"><code>tabulate</code></a> function, which I hoped would be straightforward enough to circumvent. My first attempt was to change <code>.Machine$integer.max</code>:</p>
<pre><code>&gt; .Machine$integer.max &lt;- 2^40
</code></pre>
<p>When that didn't work, I replaced the whole source code of <code>tabulate</code>:</p>
<pre><code>&gt; tabulate &lt;- function(bin, nbins = max(1L, bin, na.rm=TRUE))
{
    if(!is.numeric(bin) &amp;&amp; !is.factor(bin))
        stop("'bin' must be numeric or a factor")
    #if (nbins &gt; .Machine$integer.max)
    if (nbins &gt; 2^40) #replacement line
        stop("attempt to make a table with &gt;= 2^31 elements")
    .C("R_tabulate", as.integer(bin), as.integer(length(bin)),
       as.integer(nbins), ans = integer(nbins),
       NAOK = TRUE, PACKAGE="base")$ans
}
</code></pre>
<p>Neither circumvented the problem. Apparently this is one reason why the <code>ff</code> package was created, but what worries me is the extent to which this is a problem I cannot avoid in <code>R</code>. <a href="http://cran.r-project.org/web/packages/ff/NEWS" rel="nofollow noreferrer" title="Changes to the ff package related to .Machine$integer.max">This post</a> seems to indicate that even if I used a package that avoids this problem, I would only be able to access 2^31 elements at a time. My hope was to use <code>sql</code> (either <code>sqlite</code> or <code>postgresql</code>) to get around the memory problems. I'm afraid I'll spend a while getting that to work, only to run into the same fundamental limit.</p>
<p>Switching back to <code>Stata</code> doesn't solve the problem either. Again, see the <a href="https://stackoverflow.com/questions/5446078/frequency-weighting-in-r-comparing-results-with-stata" title="An initial question on how to do frequency weighting, which should have been probability weighting">previous post</a> for how I use <code>svyset</code>. The calculation I would like to run causes <code>Stata</code> to hang:</p>
<pre><code>svy: mean age, over(strata)
</code></pre>
<p>I don't know whether throwing more memory at it will solve the problem.
I run <code>R</code> on my desktop, which has 16 GB of RAM. I use <code>Stata</code> through a Windows server, with the memory allocation currently set to 2000MB, but I could in principle experiment with increasing that.</p>
<p>So in sum:</p>
<ol>
<li>Is this a hard limit in <code>R</code>?</li>
<li>Would <code>sql</code> solve my <code>R</code> problems?</li>
<li>Would splitting the data into many separate files fix it (a lot of work...)?</li>
<li>Would throwing a lot of memory at <code>Stata</code> do it?</li>
<li>Am I seriously barking up the wrong tree somehow?</li>
</ol>
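<p>A minimal sketch of the failure mode in the traceback, using hypothetical level counts. The actual numbers of distinct <code>serial</code> and <code>strata</code> values are not stated here. The warnings indicate that the integer product <code>pd * nl</code> inside <code>table()</code> had already overflowed to <code>NA</code> before <code>tabulate(bin, pd)</code> was called. Any threshold check, whether against <code>.Machine$integer.max</code> or <code>2^40</code>, then compares against <code>NA</code>, and <code>if</code> stops with the same message. The last lines show that assigning to <code>.Machine</code> at the prompt only creates a modified copy in the global environment, while the binding in the base environment keeps its original value. <code>check_size</code> is a hypothetical helper that mimics the check in <code>tabulate</code>.</p>

```r
# Minimal sketch of the failure mode in the traceback. The level counts
# below are HYPOTHETICAL; the real numbers of distinct `serial` and
# `strata` values are not stated in the question.
n_serial <- 3000000L  # hypothetical count of distinct household serials
n_strata <- 1000L     # hypothetical count of strata

# Multiplying level counts as 32-bit integers overflows: R returns NA
# (and warns "NAs produced by integer overflow", suppressed here).
pd <- suppressWarnings(n_serial * n_strata)
print(is.na(pd))  # TRUE

# In double precision the true size clearly exceeds the integer limit.
print(as.numeric(n_serial) * n_strata > .Machine$integer.max)  # TRUE

# Hypothetical helper mimicking the `if (nbins > limit)` check. With
# nbins = NA the comparison is NA for any limit, so if() fails either way.
check_size <- function(nbins, limit) {
  tryCatch(
    if (nbins > limit) "too big" else "ok",
    error = function(e) conditionMessage(e)
  )
}
msg_default <- check_size(pd, .Machine$integer.max)
msg_raised  <- check_size(pd, 2^40)
print(msg_default)  # "missing value where TRUE/FALSE needed"
print(msg_raised)   # same message

# Assigning at the prompt only creates a modified copy of .Machine in the
# current (global) environment; the copy in the base environment is unchanged.
.Machine$integer.max <- 2^40
base_max <- get(".Machine", envir = baseenv())$integer.max
print(base_max)  # 2147483647
rm(.Machine)     # remove the global copy again
```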