StackOverflow2013

plurals

Optimize algorithm for creating a list of items rated together, in Python
primarykey
Id
3109755
data
AcceptedAnswerId
3111875
AnswerCount
4
ClosedDate
CommentCount
7
CommunityOwnedDate
CreationDate
2010-06-24T12:09:24.357
FavoriteCount
0
LastActivityDate
2010-07-04T23:39:41.897
LastEditDate
2010-06-24T15:29:41.513
LastEditorUserId
92287
OwnerUserId
92287
ParentId
0
PostTypeId
1
Score
3
ViewCount
473
LastEditorDisplayName
text
Body
Given a list of purchase events (customer_id, item):

<pre><code>1-hammer
1-screwdriver
1-nails
2-hammer
2-nails
3-screws
3-screwdriver
4-nails
4-screws
</code></pre>

I'm trying to build a data structure that tells how many times an item was bought with another item. Not bought at the same time, but bought by the same customer at any point since I started saving data. The result would look like

<pre><code>{
    hammer : {screwdriver : 1, nails : 2},
    screwdriver : {hammer : 1, screws : 1, nails : 1},
    screws : {screwdriver : 1, nails : 1},
    nails : {hammer : 2, screws : 1, screwdriver : 1}
}
</code></pre>

indicating that a hammer was bought with nails twice (persons 1 and 2) and with a screwdriver once (person 1), screws were bought with a screwdriver once (person 3), and so on...

My current approach is:

users = dict where userid is the key and a list of items bought is the value
usersForItem = dict where itemid is the key and a list of users who bought the item is the value
userlist = temporary list of users who have rated the current item

<pre><code># pseudo:
# for each event(customer, item) (sorted by item):
#     add user to users dict if not exists, and add the item
#     add item to items dict if not exists, and add the user
# ----------
users = {}
usersForItem = {}
items = []
userlist = []
last_item = None

# rows must be sorted by item
for item, user in rows:
    # add the user to the users dict if they don't already exist, then
    # append the current item_id to the list of items rated by the current user
    users.setdefault(user, []).append(item)

    if item != last_item:
        # we just started a new item, which means we just finished processing an item:
        # write the userlist for the last item to the usersForItem dictionary
        if last_item is not None:
            usersForItem[last_item] = userlist
        userlist = [user]
        last_item = item
        items.append(item)
    else:
        userlist.append(user)

# write the userlist for the final item
if last_item is not None:
    usersForItem[last_item] = userlist
</code></pre>

So, at this point, I have 2 dicts - who bought what, and what was bought by whom. Here's where it gets tricky.
Now that usersForItem is populated, I loop through it, loop through each user who bought the item, and look at the users' other purchases. I acknowledge that this is not the most Pythonic way of doing things - I'm trying to make sure I get the correct result (which I am) before getting fancy with the Python.

<pre><code>relatedItems = {}
for key, listOfUsers in usersForItem.items():
    relatedItems[key] = {}
    related = []
    for ux in listOfUsers:
        for itemRead in users[ux]:
            if itemRead != key:
                if itemRead not in related:
                    related.append(itemRead)
                relatedItems[key][itemRead] = relatedItems[key].get(itemRead, 0) + 1
    # calc jaccard/tanimoto similarity between relatedItems[key] and its values
</code></pre>

Is there a more efficient way that I can be doing this? Additionally, if there is a proper academic name for this type of operation, I'd love to hear it.

Edit: clarified to include the fact that I'm not restricting purchases to items bought together at the same time. Items can be bought at any time.
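For comparison, here is a minimal sketch of one way the same tally (often called co-occurrence counting) could be computed in a single pass over each customer's items, without requiring the events to be sorted. The names `co_purchase_counts`, `jaccard`, `items_by_customer`, and `buyers` are hypothetical and not part of the original post, and the sketch assumes that repeat purchases of the same item by the same customer should count only once.

```python
from collections import defaultdict
from itertools import combinations


def co_purchase_counts(events):
    """Return (related, buyers) for an iterable of (customer_id, item) pairs."""
    # customer -> set of distinct items (assumption: repeat purchases count once)
    items_by_customer = defaultdict(set)
    for customer, item in events:
        items_by_customer[customer].add(item)

    related = defaultdict(dict)  # item -> {other item: customers who bought both}
    buyers = defaultdict(int)    # item -> number of distinct customers
    for items in items_by_customer.values():
        for item in items:
            buyers[item] += 1
        # visit each unordered pair once per customer and record it both ways
        for a, b in combinations(items, 2):
            related[a][b] = related[a].get(b, 0) + 1
            related[b][a] = related[b].get(a, 0) + 1
    return related, buyers


def jaccard(a, b, related, buyers):
    """Jaccard similarity between the customer sets of items a and b."""
    both = related.get(a, {}).get(b, 0)
    either = buyers.get(a, 0) + buyers.get(b, 0) - both
    return float(both) / either if either else 0.0


# The purchase events from the question, as (customer_id, item) pairs.
events = [
    (1, "hammer"), (1, "screwdriver"), (1, "nails"),
    (2, "hammer"), (2, "nails"),
    (3, "screws"), (3, "screwdriver"),
    (4, "nails"), (4, "screws"),
]
related, buyers = co_purchase_counts(events)
print(dict(related["hammer"]))
print(jaccard("hammer", "nails", related, buyers))
```

The work per customer grows with the square of the number of distinct items that customer bought, so this variant is likely to help most when individual customers buy relatively few distinct items.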
Tags
<python><algorithm><optimization><similarity>
Title
Optimize algorithm for creating a list of items rated together, in Python
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. Neil Kodner
UserOwnerUserId
1. Neil Kodner
