Find unique 'item groups' in multivariate data
    <p>I am trying to isolate the unique <strong>groups of items</strong> in my data: unique groupings of rows associated with a key column, not unique items, which is what most people use the <code>unique</code> function for. The question takes some careful reading, so please be kind enough to digest the example first.</p>
    <p>To be clear, <strong>I do NOT want the unique subset of the group column, nor do I want unique subsets of items, nor even unique combinations of groups and items</strong>. I know these have been covered elsewhere, for example in <a href="https://stackoverflow.com/questions/7790732/unique-for-more-than-one-variable">unique() for more than one variable</a>. What I want are <strong>unique sets of items, where sets are defined by groups</strong>.</p>
    <p>Here is an example:</p>
<pre><code>set.seed(1234)
library(data.table)
A &lt;- data.table(group = rep(c("A","B","C","D","E","F"), each = 4),
                item = c(1, 2, 4, 3, 5, 2, 3, 6, 10, 12, 1, 2,
                         1, 2, 4, 3, 6, 3, 5, 2, 10, 12, 1, 2),
                c = runif(8))
A &lt;- A[-23, ] # so we can have an example of unbalanced groups

&gt; A
    group item          c
 1:     A    1 0.15904600
 2:     A    2 0.03999592
 3:     A    4 0.21879954
 4:     A    3 0.81059855
 5:     B    5 0.52569755
 6:     B    2 0.91465817
 7:     B    3 0.83134505
 8:     B    6 0.04577026
 9:     C   10 0.15904600
10:     C   12 0.03999592
11:     C    1 0.21879954
12:     C    2 0.81059855
13:     D    1 0.52569755
14:     D    2 0.91465817
15:     D    4 0.83134505
16:     D    3 0.04577026
17:     E    6 0.15904600
18:     E    3 0.03999592
19:     E    5 0.21879954
20:     E    2 0.81059855
21:     F   10 0.52569755
22:     F   12 0.91465817
23:     F    2 0.04577026

# The unique groups are A:F, and the unique items are 1:6,10,12.
# The unique sets of items are:
# (set1) 1,2,3,4; (set2) 5,2,3,6;
# (set3) 10,12,1,2; (set4) 10,12,2
</code></pre>
    <p>I want to retrieve these unique sets of items (note again that the item sets are formed by groups). The third column means little at this time; for illustration, I sum it by item within each set. The output table should look like this:</p>
<pre><code>group item c
A     1    0.68474355  # note that groups A and D share this same set of items (set1)
A     2    0.95465409
A     4    1.05014459  # c sums groupAitem4$c with groupDitem4$c
A     3    0.85636881
B     5    0.74449709  # group E has the same items (set2), even if not in the same order; c is totaled by item
B     2    1.72525672
B     3    0.87134097
B     6    0.20481626
C     10   0.159046
C     12   0.03999592
C     1    0.21879954
C     2    0.81059855
F     10   0.52569755  # not the same set as group C
F     12   0.91465817
F     2    0.04577026
</code></pre>
    <p>I suppose there might be a way of going through reshape, but that would be quite awkward. My data is large, so efficient procedures like <code>data.table</code> would be much appreciated.</p>
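    <p>One possible approach, offered as a minimal sketch rather than a definitive answer: build an order-independent signature for each group by sorting its items and pasting them together, label every row with the first group that carries that signature, then sum <code>c</code> by representative group and item. The helper columns <code>sig</code> and <code>rep_group</code> and the vector <code>cvals</code> are names introduced here for illustration. The sketch treats each group's items as a set, so a repeated item within one group would not affect matching. The <code>c</code> values are copied from the printed table in the question so the example does not depend on the random seed.</p>
<pre><code>library(data.table)

# c values as printed in the question, repeated explicitly rather than recycled
cvals &lt;- c(0.15904600, 0.03999592, 0.21879954, 0.81059855,
           0.52569755, 0.91465817, 0.83134505, 0.04577026)
A &lt;- data.table(group = rep(c("A","B","C","D","E","F"), each = 4),
                item  = c(1, 2, 4, 3, 5, 2, 3, 6, 10, 12, 1, 2,
                          1, 2, 4, 3, 6, 3, 5, 2, 10, 12, 1, 2),
                c     = rep(cvals, times = 3))
A &lt;- A[-23, ]  # unbalanced groups, as in the question

# Order-independent signature of each group's item set (hypothetical helper column)
A[, sig := paste(sort(unique(item)), collapse = ","), by = group]

# Representative group: the first group, in row order, with that signature
A[, rep_group := group[1], by = sig]

# One row per (representative group, item), with c summed across matching groups
res &lt;- A[, .(c = sum(c)), by = .(group = rep_group, item)]
print(res)
</code></pre>
    <p>Because <code>by</code> keeps groups in order of first appearance, the rows of <code>res</code> should follow the order of the desired output. On large data, pasting strings for every group may become the bottleneck; whether that matters would need to be measured on the real data.</p>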