Optimize algorithm for creating a list of items rated together, in Python
<p>Given a list of purchase events (customer_id, item):</p>
<pre><code>1-hammer
1-screwdriver
1-nails
2-hammer
2-nails
3-screws
3-screwdriver
4-nails
4-screws
</code></pre>
<p>I'm trying to build a data structure that tells how many times an item was bought with another item. Not necessarily bought at the same time, but bought by the same customer at any point since I started saving data. The result would look like:</p>
<pre><code>{
    hammer : {screwdriver : 1, nails : 2},
    screwdriver : {hammer : 1, screws : 1, nails : 1},
    screws : {screwdriver : 1, nails : 1},
    nails : {hammer : 2, screws : 1, screwdriver : 1}
}
</code></pre>
<p>indicating that a hammer was bought with nails twice (persons 1 and 2) and with a screwdriver once (person 1), screws were bought with a screwdriver once (person 3), and so on...</p>
<p>My current approach is:</p>
<p>users = dict where userid is the key and a list of items bought is the value</p>
<p>usersForItem = dict where itemid is the key and a list of users who bought the item is the value</p>
<p>userlist = temporary list of users who have rated the current item</p>
<pre><code># pseudo:
# for each event(customer, item) (sorted by item):
#     add user to users dict if not exists, and add the items
#     add item to items dict if not exists, and add the user
# ----------
users = {}
usersForItem = {}
items = []
userlist = []
last_item = None

# rows: (item, user) pairs, sorted by item
for item, user in rows:
    # add the user to the users dict if they don't already exist.
    users[user] = users.get(user, [])
    # append the current item_id to the list of items rated by the current user
    users[user].append(item)
    if item != last_item:
        # we just started a new item which means we just finished processing an item
        # write the userlist for the last item to the usersForItem dictionary.
        if last_item is not None:
            usersForItem[last_item] = userlist
        userlist = [user]
        last_item = item
        items.append(item)
    else:
        userlist.append(user)

# write the userlist for the final item
if last_item is not None:
    usersForItem[last_item] = userlist
</code></pre>
<p>So, at this point, I have 2 dicts - who bought what, and what was bought by whom. Here's where it gets tricky. Now that usersForItem is populated, I loop through it, loop through each user who bought the item, and look at the users' other purchases. I acknowledge that this is not the most pythonic way of doing things - I'm trying to make sure I get the correct result (which I am) before getting fancy with the Python.</p>
<pre><code>relatedItems = {}
for key, listOfUsers in usersForItem.items():
    relatedItems[key] = {}
    related = []
    for ux in listOfUsers:
        for itemRead in users[ux]:
            if itemRead != key:
                if itemRead not in related:
                    related.append(itemRead)
                relatedItems[key][itemRead] = relatedItems[key].get(itemRead, 0) + 1
    # calc jaccard/tanimoto similarity between relatedItems[key] and its values
</code></pre>
<p>Is there a more efficient way that I can be doing this? Additionally, if there is a proper academic name for this type of operation, I'd love to hear it.</p>
<p>Edit: clarified to include the fact that I'm not restricting purchases to items bought together at the same time. Items can be bought at any time.</p>
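For comparison, here is a minimal sketch of the same tally (often called a co-occurrence count) written around per-customer sets, using the sample events from the question. The names `co_purchase_counts`, `jaccard`, and `events` are hypothetical and do not appear in the code above. The sketch assumes Python 3, and it assumes that a customer who buys the same item more than once should count only once, which may differ from the list-based approach in the question.

```python
from collections import defaultdict
from itertools import combinations


def co_purchase_counts(events):
    """Count, for each item, how many customers also bought each other item.

    events: iterable of (customer_id, item) pairs, in any order.
    """
    # Group the distinct items bought by each customer (assumption: repeat
    # purchases of the same item by one customer count once).
    items_by_customer = defaultdict(set)
    for customer, item in events:
        items_by_customer[customer].add(item)

    # Every unordered pair of items in one customer's history is one co-purchase.
    counts = defaultdict(lambda: defaultdict(int))
    for basket in items_by_customer.values():
        for a, b in combinations(sorted(basket), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return {item: dict(related) for item, related in counts.items()}


def jaccard(events, item_a, item_b):
    """Jaccard similarity of the sets of customers who bought each item."""
    buyers = defaultdict(set)
    for customer, item in events:
        buyers[item].add(customer)
    union = buyers[item_a] | buyers[item_b]
    if not union:
        return 0.0
    return len(buyers[item_a] & buyers[item_b]) / len(union)


# Sample data from the question, as (customer_id, item) pairs.
events = [
    (1, "hammer"), (1, "screwdriver"), (1, "nails"),
    (2, "hammer"), (2, "nails"),
    (3, "screws"), (3, "screwdriver"),
    (4, "nails"), (4, "screws"),
]

counts = co_purchase_counts(events)
hammer_nails_similarity = jaccard(events, "hammer", "nails")
```

Iterating over each customer's items rather than over each item's buyers means the input does not need to be sorted by item, and the work grows with the square of each customer's history size rather than with repeated scans of every buyer's full purchase list.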