<p>Let me explain the procedure that the authors introduced (as I understood it):</p>
<p><strong>Input:</strong></p>
<ul>
<li><em>Training data</em>: users, items, and the users' ratings of these items (not every user necessarily rated every item)</li>
<li><em>Target user</em>: a new user who has rated some items</li>
<li><em>Target item</em>: an item not rated by the target user, for which we would like to predict a rating.</li>
</ul>
<p><strong>Output:</strong></p>
<ul>
<li>a predicted rating for the target item by the target user</li>
</ul>
<p>This can be repeated for a bunch of items, and then we return the top-N items (highest predicted ratings).</p>
<p><strong>Procedure:</strong><br>
The algorithm is very similar to the naive <a href="http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm" rel="noreferrer">KNN</a> method (search all training data to find users with ratings similar to the target user's, then combine their ratings to give the prediction [voting]).<br>
This simple method does not scale well as the number of users/items increases.</p>
<p>The algorithm proposed is to first cluster the training users into <strong>K</strong> groups (groups of people who rated items similarly), where <strong>K</strong> &lt;&lt; <strong>N</strong> (<strong>N</strong> is the total number of users).<br>
Then we scan those clusters to find which ones the target user is closest to (instead of looking at all the training users).<br>
Finally we pick <strong>l</strong> of those clusters and make our prediction as an average weighted by the distance to those <strong>l</strong> clusters.</p>
<p>Note that the similarity measure used is the <a href="http://en.wikipedia.org/wiki/Correlation" rel="noreferrer">correlation</a> coefficient, and the clustering algorithm is the bisecting K-Means algorithm. We can simply use the standard <a href="http://en.wikipedia.org/wiki/K-means_clustering" rel="noreferrer">k-means</a> instead, and we can use other similarity metrics as well, such as <a href="http://en.wikipedia.org/wiki/Euclidean_distance" rel="noreferrer">Euclidean distance</a> or cosine distance.</p>
<p>The first formula on page 5 is the definition of the correlation:</p>
<pre><code>corr(x,y) = mean((x - mean(x)) * (y - mean(y))) / (std(x) * std(y))
</code></pre>
<p>The second formula is basically a weighted average:</p>
<pre><code>predRating = sum_i(rating_i * corr(target, user_i)) / sum_i(corr(target, user_i))

where i loops over the selected top-l clusters
</code></pre>
<p>Hope this clarifies things a little bit :)</p>
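<p>The whole procedure can be sketched roughly as follows. This is a minimal toy version, assuming a dense ratings matrix (rows = users, columns = items) and plain k-means with Euclidean assignment; the paper uses bisecting k-means and must handle missing ratings, which this sketch glosses over. All function and variable names here are illustrative, not from the paper:</p>

```python
import numpy as np

def pearson(x, y):
    """Correlation coefficient between two rating vectors."""
    xc, yc = x - x.mean(), y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return (xc * yc).sum() / denom if denom else 0.0

def kmeans(ratings, k, iters=20, seed=0):
    """Plain k-means over user rating vectors (stand-in for bisecting k-means)."""
    rng = np.random.default_rng(seed)
    centroids = ratings[rng.choice(len(ratings), k, replace=False)]
    for _ in range(iters):
        # assign each user to its nearest centroid (squared Euclidean distance)
        dists = ((ratings[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = ratings[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids

def predict(ratings, target, item, k=3, l=2):
    """Predict the target user's rating for `item` from the l closest clusters."""
    centroids = kmeans(ratings, k)
    # similarity of the target user to each cluster centroid
    # (a real implementation would compare only over commonly rated items)
    sims = np.array([pearson(target, c) for c in centroids])
    top = np.argsort(sims)[-l:]  # the l most similar clusters
    num = sum(sims[j] * centroids[j, item] for j in top)
    den = sum(sims[j] for j in top)
    # weighted average of the clusters' ratings for the item
    return num / den if den else target.mean()
```

<p>The point of the clustering step is visible in <code>predict</code>: similarity is computed against <strong>K</strong> centroids rather than all <strong>N</strong> users, which is where the speedup over naive KNN comes from.</p>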