Subset of a matrix multiplication, fast, and sparse
<p>While converting a collaborative filtering code to use sparse matrices, I'm puzzling over the following problem: given two full matrices X (m by l) and Theta (n by l), and a sparse matrix R (m by n), is there a fast way to calculate the sparse inner product, i.e. R multiplied elementwise by X*Theta.T? The large dimensions are m and n (order 100000), while l is small (order 10). This is probably a fairly common operation for big data since it shows up in the cost function of most linear regression problems, so I'd expect a solution built into scipy.sparse, but I haven't found anything obvious yet.</p>
<p>The naive way to do this in python is R.multiply(X*Theta.T), but this evaluates the full matrix X*Theta.T (m by n, order 100000**2), which occupies too much memory, and then discards most of the entries since R is sparse.</p>
<p>There is a <a href="https://stackoverflow.com/questions/13731405/calculate-subset-of-matrix-multiplication">pseudo solution already here on stackoverflow</a>, but it is non-sparse in one step:</p>
<pre><code>import numpy as np
import scipy.sparse as sp

def sparse_mult_notreally(a, b, coords):
    rows, cols = coords
    rows, r_idx = np.unique(rows, return_inverse=True)
    cols, c_idx = np.unique(cols, return_inverse=True)
    C = np.array(np.dot(a[rows, :], b[:, cols])) # this operation is dense
    return sp.coo_matrix( (C[r_idx,c_idx],coords), (a.shape[0],b.shape[1]) )
</code></pre>
<p>This works fine, and fast, for me on small enough arrays, but it barfs on my big datasets with the following error:</p>
<pre><code>... in sparse_mult(a, b, coords)
    132     rows, r_idx = np.unique(rows, return_inverse=True)
    133     cols, c_idx = np.unique(cols, return_inverse=True)
--&gt; 134     C = np.array(np.dot(a[rows, :], b[:, cols])) # this operation is not sparse
    135     return sp.coo_matrix( (C[r_idx,c_idx],coords), (a.shape[0],b.shape[1]) )

ValueError: array is too big.
</code></pre>
<p>A solution which IS actually sparse, but very slow, is:</p>
<pre><code>def sparse_mult(a, b, coords):
    rows, cols = coords
    n = len(rows)
    # one tiny matrix product per nonzero entry: sparse, but VERY slow
    C = np.array([ float(a[rows[i],:]*b[:,cols[i]]) for i in range(n) ])
    return sp.coo_matrix( (C,coords), (a.shape[0],b.shape[1]) )
</code></pre>
<p>Does anyone know a fast, fully sparse way to do this?</p>
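<p>One possible direction, offered as a sketch rather than a confirmed answer: the slow loop above computes one dot product of length l per nonzero entry, and those dot products can be vectorized by gathering the needed rows of a and columns of b and summing their elementwise product. The function name <code>sparse_mult_rowwise</code> and the <code>chunk</code> parameter are hypothetical, introduced here only for illustration; the chunking is an assumption meant to bound the size of the temporary (k by l) arrays.</p>

```python
import numpy as np
import scipy.sparse as sp

def sparse_mult_rowwise(a, b, coords, chunk=1_000_000):
    """Evaluate the product a @ b only at the (row, col) pairs in coords.

    a: dense (m, l), b: dense (l, n), coords: (rows, cols) index arrays.
    Hypothetical helper, not part of scipy.sparse.
    """
    a = np.asarray(a)  # also converts np.matrix inputs to plain arrays
    b = np.asarray(b)
    rows, cols = (np.asarray(c) for c in coords)
    vals = np.empty(len(rows), dtype=np.result_type(a, b))
    # Work in chunks so the gathered temporaries stay at most (chunk, l).
    for start in range(0, len(rows), chunk):
        sl = slice(start, start + chunk)
        # Row i of each gathered block pairs a[rows[i], :] with b[:, cols[i]];
        # einsum sums their elementwise product, i.e. one dot product per entry.
        vals[sl] = np.einsum("ij,ij->i", a[rows[sl], :], b[:, cols[sl]].T)
    return sp.coo_matrix((vals, (rows, cols)), shape=(a.shape[0], b.shape[1]))
```

<p>Under these assumptions, with R stored in COO format, something like <code>R.multiply(sparse_mult_rowwise(X, Theta.T, (R.row, R.col)))</code> would give the elementwise product without ever forming the m by n matrix. The work and memory scale with the number of nonzeros times l, not with m times n.</p>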
 
