Subset of a matrix multiplication, fast, and sparse

Converting a collaborative filtering code to use sparse matrices, I'm puzzling over the following problem: given two full matrices X (m by l) and Theta (n by l), and a sparse matrix R (m by n), is there a fast way to calculate the sparse inner product, i.e. the entries of X * Theta.T only at the positions where R is nonzero? The large dimensions are m and n (order 100000), while l is small (order 10). This is probably a fairly common operation for big data, since it shows up in the cost function of most linear regression problems, so I'd expect a solution built into scipy.sparse, but I haven't found anything obvious yet.

The naive way to do this in Python is R.multiply(X * Theta.T), but this will evaluate the full matrix X * Theta.T (m by n, order 100000**2), which occupies too much memory, and then throw away most of the entries since R is sparse.

There is a [pseudo solution already here on Stack Overflow](https://stackoverflow.com/questions/13731405/calculate-subset-of-matrix-multiplication), but it is non-sparse in one step:

```python
def sparse_mult_notreally(a, b, coords):
    rows, cols = coords
    rows, r_idx = np.unique(rows, return_inverse=True)
    cols, c_idx = np.unique(cols, return_inverse=True)
    C = np.array(np.dot(a[rows, :], b[:, cols]))  # this operation is dense
    return sp.coo_matrix((C[r_idx, c_idx], coords), (a.shape[0], b.shape[1]))
```

This works fine, and fast, for me on small enough arrays, but it barfs on my big datasets with the following error:

```
... in sparse_mult(a, b, coords)
    132     rows, r_idx = np.unique(rows, return_inverse=True)
    133     cols, c_idx = np.unique(cols, return_inverse=True)
--> 134     C = np.array(np.dot(a[rows, :], b[:, cols]))  # this operation is not sparse
    135     return sp.coo_matrix( (C[r_idx,c_idx],coords), (a.shape[0],b.shape[1]) )

ValueError: array is too big.
```

A solution which IS actually sparse, but very slow, is:

```python
def sparse_mult(a, b, coords):
    rows, cols = coords
    n = len(rows)
    C = np.array([float(a[rows[i], :] * b[:, cols[i]]) for i in range(n)])  # this is sparse, but VERY slow
    return sp.coo_matrix((C, coords), (a.shape[0], b.shape[1]))
```

Does anyone know a fast, fully sparse way to do this?
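
For reference, one memory-light possibility is sketched below. It is not part of the original post, and the function name `sparse_mult_rowwise` is illustrative: it gathers only the rows of `a` and columns of `b` that correspond to R's nonzero coordinates and reduces over the small dimension l with `np.einsum`, so the largest temporaries are nnz-by-l arrays rather than an m-by-n product.

```python
import numpy as np
import scipy.sparse as sp

def sparse_mult_rowwise(a, b, coords):
    """Evaluate np.dot(a, b) only at the (row, col) pairs given in coords."""
    a = np.asarray(a)   # works whether the inputs are np.matrix or ndarray
    b = np.asarray(b)
    rows, cols = coords
    # a[rows, :] and b[:, cols].T are both (nnz, l); their row-wise dot products
    # are exactly the entries np.dot(a, b)[rows[i], cols[i]], with no (m, n) array.
    data = np.einsum('ij,ij->i', a[rows, :], b[:, cols].T)
    return sp.coo_matrix((data, coords), (a.shape[0], b.shape[1]))
```

With the same calling convention as the functions above, this would be used as `sparse_mult_rowwise(X, Theta.T, R.nonzero())`.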
Comments:

1. That operation is as sparse as it can be with a vectorized approach. It'd be interesting to split the offending line in three to see where the memory error is happening, i.e. `aa = a[rows, :]; bb = b[:, cols]; C = np.dot(aa, bb)`. You don't need the `np.array` call, and as it is it actually makes a copy of the array, so it may even be the culprit of your memory error. Can you destroy `X` and `Theta` in the process of generating `R`?
2. My inputs a, b are np.matrix, so without the np.array the result C[r_idx,c_idx] is a 2D matrix instead of a 1D array. This caused an error in the sp.coo_matrix call, so I put the np.array in there. Converting to arrays beforehand might save some time, though. (See the first illustration after these comments.)
3. X and Theta are both full matrices, but I can't see why you'd want to destroy them. I'll note that while my matrix R is sparse, the sparsity pattern is such that every row and every column has at least one entry, so the result of the unique calls is the full range of m and n. This causes the np.dot result to be the full dense product of the two arrays we're trying to multiply sparsely. I tried defining aa and bb as you suggested and it died with the same error at `C = ...` again. Maybe Cython is the solution? (See the chunked sketch after these comments.)
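
Regarding comment 2, here is a minimal illustration (the toy shapes are made up) of why the np.array wrapper was needed: fancy indexing an np.matrix always returns a 2-D result, while the same indexing on a plain ndarray returns the 1-D data vector that sp.coo_matrix expects, so converting the inputs with np.asarray up front removes the need for the extra copy.

```python
import numpy as np
import scipy.sparse as sp

M = np.matrix(np.arange(12).reshape(3, 4))   # np.matrix, as in the original code
A = np.asarray(M)                            # plain ndarray view of the same data

r_idx = np.array([0, 1, 2])
c_idx = np.array([1, 2, 3])

print(M[r_idx, c_idx].shape)   # 2-D: np.matrix results never drop below two dimensions
print(A[r_idx, c_idx].shape)   # (3,): a 1-D vector, which coo_matrix accepts directly

R = sp.coo_matrix((A[r_idx, c_idx], (r_idx, c_idx)), (3, 4))
print(R.toarray())
```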
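
Regarding comment 3, since every row and column of R is populated, the unique trick cannot shrink the dense product, but the row-wise approach sketched after the question does not rely on it. If even the nnz-by-l temporaries are too large, the nonzero coordinates can be processed in blocks; the sketch below assumes the same calling convention, and the function name and chunk size are arbitrary.

```python
import numpy as np
import scipy.sparse as sp

def sparse_mult_chunked(a, b, coords, chunk=500000):
    """Same result as sparse_mult, but vectorized in blocks of `chunk` nonzeros."""
    a = np.asarray(a)
    b = np.asarray(b)
    rows, cols = coords
    data = np.empty(len(rows))
    for start in range(0, len(rows), chunk):
        r = rows[start:start + chunk]
        c = cols[start:start + chunk]
        # temporaries are at most (chunk, l), independent of how many nonzeros R has
        data[start:start + chunk] = np.einsum('ij,ij->i', a[r, :], b[:, c].T)
    return sp.coo_matrix((data, coords), (a.shape[0], b.shape[1]))
```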
 
