StackOverflow2013

<p>It looks like <strong>clustering</strong> on top of <strong>association mining</strong>, more precisely the <a href="http://en.wikipedia.org/wiki/Apriori_algorithm" rel="nofollow">Apriori</a> algorithm. Something like this: </p>
<ol>
<li>Mine all possible associations between actions, i.e. sequences such as Brush -> Prep Breakfast, Prep Breakfast -> Eat Breakfast, ..., Brush -> Prep Breakfast -> Eat Breakfast, etc.: every pair, triplet, quadruple, etc. that occurs in your data. </li>
<li>Make a separate attribute from each such sequence. For better results, give each attribute a boost equal to its length: 2 for pairs, 3 for triplets, and so on. </li>
<li>At this point you have an attribute vector and a corresponding boost vector. Now compute a feature vector for each user: at each position put 1 * boost if that sequence occurs in the user's actions, and 0 otherwise. This gives a vector representation of each user. </li>
<li>Run whichever clustering algorithm best fits your needs on these vectors. Each cluster it finds is one of the groups you are looking for. </li>
</ol>
<p><strong>Example:</strong> </p>
<p>Let's denote each action by a letter: </p>
<p>a - Brush<br> b - Prep Breakfast<br> c - Eat Breakfast<br> d - Take Bath<br> ... </p>
<p>Your <em>attributes</em> will then look like this:</p>
<p>a1: a->b<br> a2: a->c<br> a3: a->d<br> ...<br> a10: b->a<br> a11: b->c<br> a12: b->d<br> ...<br> a30: a->b->c->d<br> a31: a->b->d->c<br> ... </p>
<p>User <em>feature vectors</em> in this case will be as follows (each pair that occurs contributes its boost of 2, the quadruple a30 its boost of 4): </p>
<pre><code>attributes = a1, a2, a3, a4, ..., a10, a11, a12, ..., a30, a31, ...
user1 = 2, 0, 0, 0, ..., 0, 2, 0, ..., 4, 0, ...
user2 = 2, 0, 0, 0, ..., 0, 2, 0, ..., 4, 0, ...
user3 = 0, 0, 0, 0, ..., 0, 0, 0, ..., 0, 0, ...
</code></pre>
<p>To compare two users you need a similarity (or distance) measure. A simple one is <a href="http://en.wikipedia.org/wiki/Cosine_similarity" rel="nofollow">cosine similarity</a>, which is just the cosine of the angle between the two feature vectors (the corresponding cosine distance is 1 minus this value). 
Because every entry of a feature vector is non-negative, the similarity always lies between 0 and 1: if two users have exactly the same sequences of actions, their similarity is 1; if they have nothing in common, it is 0. </p>
<p>With this measure in hand, run a clustering algorithm (say, <a href="http://en.wikipedia.org/wiki/K-means_clustering" rel="nofollow">k-means</a>) to group the users. </p>
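<p>Step 1 can be sketched in Python as below. This is a brute-force enumeration, not Apriori itself: it assumes a "sequence" means a run of consecutive actions in a user's ordered log (which matches the example vectors), counts how many users contain each run, and keeps the runs that meet a minimum support. Apriori would additionally prune candidates level by level using that support threshold. The function names and the <code>max_len</code> / <code>min_support</code> parameters are hypothetical.</p>

```python
from collections import Counter


def contiguous_sequences(actions, max_len):
    """Set of runs of 2..max_len consecutive actions in one user's log.

    Assumption: a "sequence" means consecutive actions, as in the example.
    """
    found = set()
    for n in range(2, max_len + 1):
        for start in range(len(actions) - n + 1):
            found.add(tuple(actions[start:start + n]))
    return found


def mine_sequences(user_actions, max_len=4, min_support=1):
    """Brute-force support counting (not full Apriori candidate pruning).

    user_actions maps a user id to that user's ordered list of actions.
    max_len and min_support are hypothetical parameters.
    Returns the frequent sequences, shortest first, as the attribute list.
    """
    support = Counter()
    for actions in user_actions.values():
        # Each user counts at most once per sequence (set semantics).
        support.update(contiguous_sequences(actions, max_len))
    frequent = [seq for seq, count in support.items() if count >= min_support]
    # Sort by length, then alphabetically, so attribute indices are stable.
    return sorted(frequent, key=lambda seq: (len(seq), seq))


users = {
    "user1": ["a", "b", "c", "d"],
    "user2": ["a", "b", "c", "d"],
    "user3": ["d", "a"],
}
attributes = mine_sequences(users, max_len=4, min_support=2)
print(attributes)
```

<p>With <code>min_support=2</code>, the run d -> a is dropped because only user3 contains it; the remaining six runs become the attributes.</p>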
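<p>Steps 2 and 3 (one attribute per sequence, boost equal to the sequence length, 1 * boost when the sequence is present) might look like this, again assuming that a sequence must appear as consecutive actions. <code>occurs</code> and <code>feature_vector</code> are hypothetical helpers, and the four attributes below are just a subset of the example's a1, a2, a11 and a30.</p>

```python
def occurs(seq, actions):
    """True if seq appears as consecutive actions (assumed semantics)."""
    n = len(seq)
    return any(
        tuple(actions[i:i + n]) == seq for i in range(len(actions) - n + 1)
    )


def feature_vector(actions, attributes):
    """1 * boost per attribute if present, else 0; boost = sequence length."""
    return [len(seq) if occurs(seq, actions) else 0 for seq in attributes]


# Hypothetical subset of the example's attributes: a1, a2, a11, a30.
attributes = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "b", "c", "d")]
print(feature_vector(["a", "b", "c", "d"], attributes))  # [2, 0, 2, 4]
print(feature_vector(["d", "c"], attributes))            # [0, 0, 0, 0]
```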
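<p>Cosine similarity between two feature vectors is short enough to write directly. This sketch returns 0 when either vector is all zeros (like user3 in the example), where the cosine is undefined; treating that case as "nothing in common" is an assumption, not part of the definition.</p>

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors.

    Returns 0.0 when either vector is all zeros, where cosine is undefined;
    that convention is an assumption, not part of the definition.
    """
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Distance counterpart: 0 for identical directions, 1 for no overlap."""
    return 1.0 - cosine_similarity(u, v)


user1 = [2, 0, 2, 4]
user2 = [2, 0, 2, 4]
user3 = [0, 0, 0, 0]
print(cosine_similarity(user1, user2))  # approximately 1.0
print(cosine_similarity(user1, user3))  # 0.0
```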
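<p>Plain k-means uses Euclidean distance, so one way to combine it with cosine similarity is spherical k-means, swapped in here as an illustration rather than as the method the answer prescribes. Points and centroids are scaled to unit length; for unit vectors the squared Euclidean distance equals 2 - 2 * cosine similarity, so "nearest centroid" means "most cosine-similar centroid". The farthest-first initialization, the function names, and the four sample vectors are hypothetical choices.</p>

```python
import math


def normalize(v):
    """Scale v to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def spherical_kmeans(vectors, k, iterations=100):
    """k-means with cosine similarity (spherical k-means).

    Nonzero points and centroids are unit vectors, so squared Euclidean
    distance is 2 - 2 * cosine similarity; all-zero vectors stay at the origin.
    Returns one cluster label in range(k) per input vector.
    """
    points = [normalize(v) for v in vectors]
    # Farthest-first initialization: deterministic, avoids duplicate seeds.
    centroids = [points[0]]
    for _ in range(k - 1):
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the previous centroid if a cluster empties
                mean = [sum(xs) / len(members) for xs in zip(*members)]
                centroids[j] = normalize(mean)
    return labels


# Hypothetical boosted feature vectors for four users.
vectors = [[2, 0, 2, 4], [2, 0, 2, 4], [0, 2, 0, 0], [0, 2, 2, 0]]
print(spherical_kmeans(vectors, k=2))  # [0, 0, 1, 1]
```

<p>The number of clusters <code>k</code> still has to be chosen by you; with a library such as scikit-learn you would typically L2-normalize the vectors and then run its k-means instead of this hand-rolled loop.</p>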
