Map Reduce Linear Regression in base R
<p>I'm working on a distributed linear regression calculation in R for Hadoop, but before implementing it, I'd like to verify that my calculations agree with the results of the <code>lm</code> function.</p>
<p>I have the following functions, which attempt to implement the generic "summation" framework discussed by Andrew Ng et al. in the paper <a href="http://www.cs.stanford.edu/people/ang//papers/nips06-mapreducemulticore.pdf" rel="nofollow">Map-Reduce for Machine Learning on Multicore</a>.</p>
<p>For linear regression, this involves mapping each row y_i and x_i to P_i and Q_i such that:</p>
<pre><code>P_i = x_i * transpose(x_i)
Q_i = x_i * y_i
</code></pre>
<p>Then reducing to solve for the coefficients, theta: <code>theta = (sum(P_i))^-1 * sum(Q_i)</code></p>
<p>The R functions to do this are:</p>
<pre><code>calculate_p &lt;- function(dat_row) {
  dat_row %*% t(dat_row)
}

calculate_q &lt;- function(dat_row) {
  dat_row[1,1] * dat_row[, -1]
}

calculate_pq &lt;- function(dat_row) {
  c(calculate_p(matrix(dat_row[-1], nrow=1)), calculate_q(matrix(dat_row, nrow=1)))
}

map_pq &lt;- function(dat) {
  t(apply(dat, 1, calculate_pq))
}

reduce_pq &lt;- function(pq) {
  (1 / sum(pq[, 1])) * apply(pq[, -1], 2, sum)
}
</code></pre>
<p>You can try it on some synthetic data by running:</p>
<pre><code>X &lt;- matrix(rnorm(20*5), ncol = 5)
y &lt;- as.matrix(rnorm(20))

reduce_pq(map_pq(cbind(y, X)))
[1]  0.010755882 -0.006339951 -0.034797768  0.067438662 -0.033557351

coef(lm.fit(X, y))
          x1           x2           x3           x4           x5
-0.038556283 -0.002963991 -0.195897701  0.422552974 -0.029823962
</code></pre>
<p>Unfortunately, the outputs don't match, so obviously I'm doing something wrong. Any ideas how I can fix it?</p>
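For comparison, here is a minimal sketch of the summation form as I read it from the formulas above, assuming each x_i is treated as a column vector so that P_i = x_i * transpose(x_i) is a full k-by-k outer product rather than a scalar. The helper names (map_row, pairs) and the seed are made up for illustration and do not come from the paper.

```r
# Sketch only: summation-form least squares in base R.
# Assumption: x_i is a column vector, so P_i is a k-by-k matrix
# and Q_i = x_i * y_i is a length-k vector.
set.seed(1)  # arbitrary seed, only for reproducibility
X <- matrix(rnorm(20 * 5), ncol = 5)
y <- rnorm(20)

# "Map" step: one (P_i, Q_i) pair per data row
map_row <- function(i) {
  x_i <- X[i, ]
  list(P = outer(x_i, x_i),  # k-by-k outer product
       Q = x_i * y[i])       # length-k vector
}
pairs <- lapply(seq_len(nrow(X)), map_row)

# "Reduce" step: sum the pieces, then solve the k-by-k linear system
P <- Reduce("+", lapply(pairs, function(p) p$P))
Q <- Reduce("+", lapply(pairs, function(p) p$Q))
theta <- solve(P, Q)  # theta = (sum(P_i))^-1 * sum(Q_i)

# Compare against lm.fit on the same data (no intercept column in X)
all.equal(theta, unname(coef(lm.fit(X, y))))
```

With this reading, sum(P_i) equals t(X) %*% X and sum(Q_i) equals t(X) %*% y, so the reduce step solves the normal equations. The two results should agree up to floating-point tolerance.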