Elegantly convert rate summary rows into long binary-response rows?
<p>Background: I am running a little A/B test with 2x2 factors (foreground's black and background's white, off-color vs normal color), and <a href="http://i.imgur.com/y5WckXe.png" rel="nofollow">Analytics reports</a> the number of hits for each of the 4 conditions and the rate at which they 'converted' (a binary variable, which I define as spending at least 40 seconds on the page). It's easy enough to do a little editing and get it into a nice R dataframe:</p>
<pre><code>rates &lt;- read.csv(stdin(),header=TRUE)
Black,White,N,Rate
TRUE,FALSE,512,0.2344
FALSE,TRUE,529,0.2098
TRUE,TRUE,495,0.1919
FALSE,FALSE,510,0.1882
</code></pre>
<p>Naturally, I'd like to look at a logistic regression on something like <code>Rate ~ Black * White</code>, but R's <code>glm</code> wants a dataframe of 2046 rows, each reporting a <code>TRUE</code> or <code>FALSE</code> conversion value &amp; the values of <code>Black</code> and <code>White</code>. This... is a little more tricky. I googled around and checked SO, but while I found some clunky code for converting a table of contingency counts to a dataframe, I didn't find anything about <em>percentages/rates</em>.</p>
<p>After a lot of trouble, I came up with a loop over the 4 conditions in which I repeat a dataframe <code>rate * n</code> times with the relevant condition values and the result <code>TRUE</code>, then do the same thing for <code>(1 - rate) * n</code> and the result <code>FALSE</code>, and then stitch all 8 dataframes together into one giant dataframe:</p>
<pre><code>ground &lt;- NULL
for (i in 1:nrow(rates)) {
    x &lt;- rates[i,]
    y &lt;- do.call("rbind", replicate((x$N * x$Rate),
                 data.frame(Black=c(x$Black),White=c(x$White),Conversion=c(TRUE)),
                 simplify = FALSE))
    z &lt;- do.call("rbind", replicate((x$N * (1-x$Rate)),
                 data.frame(Black=c(x$Black),White=c(x$White),Conversion=c(FALSE)),
                 simplify = FALSE))
    ground &lt;- rbind(ground,y,z)
}
</code></pre>
<p>The resulting dataframe <code>ground</code> looks right:</p>
<pre><code>sum(rates$N)
[1] 2046
nrow(ground)
[1] 2042
# the missing 4 are probably from the rounding-off of the reported conversion rate
summary(ground); head(ground, n=20)
   Black           White         Conversion
 Mode :logical   Mode :logical   Mode :logical
 FALSE:1037      FALSE:1020      FALSE:1623
 TRUE :1005      TRUE :1022      TRUE :419
 NA's :0         NA's :0         NA's :0
   Black White Conversion
1   TRUE FALSE       TRUE
2   TRUE FALSE       TRUE
3   TRUE FALSE       TRUE
4   TRUE FALSE       TRUE
5   TRUE FALSE       TRUE
6   TRUE FALSE       TRUE
7   TRUE FALSE       TRUE
8   TRUE FALSE       TRUE
9   TRUE FALSE       TRUE
10  TRUE FALSE       TRUE
11  TRUE FALSE       TRUE
12  TRUE FALSE       TRUE
13  TRUE FALSE       TRUE
14  TRUE FALSE       TRUE
15  TRUE FALSE       TRUE
16  TRUE FALSE       TRUE
17  TRUE FALSE       TRUE
18  TRUE FALSE       TRUE
19  TRUE FALSE       TRUE
20  TRUE FALSE       TRUE
</code></pre>
<p>And likewise, the logistic regression spits out a sane-looking answer:</p>
<pre><code>g &lt;- glm(Conversion ~ Black*White, family=binomial, data=ground); summary(g)
...
Deviance Residuals:
   Min      1Q  Median      3Q     Max
-0.732  -0.683  -0.650  -0.643   1.832

Coefficients:
                    Estimate Std. Error z value Pr(&gt;|z|)
(Intercept)           -1.472      0.114  -12.94   &lt;2e-16
BlackTRUE              0.291      0.154    1.88    0.060
WhiteTRUE              0.137      0.156    0.88    0.381
BlackTRUE:WhiteTRUE   -0.404      0.220   -1.84    0.066

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2072.7  on 2041  degrees of freedom
Residual deviance: 2068.2  on 2038  degrees of freedom
AIC: 2076

Number of Fisher Scoring iterations: 4
</code></pre>
<p>So my question is: is there a more elegant way of turning my Analytics rate data into <code>glm</code> input than that awful loop?</p>
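<p>For illustration, here is a minimal sketch of two loop-free alternatives. It is not a definitive answer. It assumes the counts can be recovered as <code>round(N * Rate)</code>, and the column names <code>Successes</code> and <code>Failures</code> and the objects <code>long</code>, <code>g1</code> and <code>g2</code> are hypothetical names introduced only for this example. The first option relies on <code>glm</code> accepting a two-column <code>cbind(successes, failures)</code> response for the binomial family, so the 4 summary rows can be fitted directly; the second builds the long dataframe with <code>rep()</code> instead of repeated <code>rbind</code> calls.</p>

```r
# Summary table copied from the question (read from a string instead of stdin).
rates <- read.csv(text = "Black,White,N,Rate
TRUE,FALSE,512,0.2344
FALSE,TRUE,529,0.2098
TRUE,TRUE,495,0.1919
FALSE,FALSE,510,0.1882", header = TRUE)

# Assumption: the true counts are the nearest integers to N * Rate.
# round() avoids losing a row per condition to truncation.
rates$Successes <- round(rates$N * rates$Rate)
rates$Failures  <- rates$N - rates$Successes

# Option 1: fit on the 4 summary rows with a (successes, failures) response.
g1 <- glm(cbind(Successes, Failures) ~ Black * White,
          family = binomial, data = rates)

# Option 2: expand to one row per hit without an explicit loop.
# Each summary row is repeated N times ...
idx  <- rep(seq_len(nrow(rates)), times = rates$N)
long <- rates[idx, c("Black", "White")]
rownames(long) <- NULL
# ... and within each condition the first Successes rows are marked TRUE.
long$Conversion <- unlist(Map(function(s, f) rep(c(TRUE, FALSE), times = c(s, f)),
                              rates$Successes, rates$Failures))

g2 <- glm(Conversion ~ Black * White, family = binomial, data = long)

# Both fits should give the same coefficient estimates.
print(cbind(aggregated = coef(g1), expanded = coef(g2)))
```

<p>With this rounding, <code>nrow(long)</code> equals <code>sum(rates$N)</code>, so no rows go missing. The two fits describe the same likelihood, so the coefficients should agree, although the reported deviances and degrees of freedom differ between the aggregated and expanded forms.</p>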