Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>You need to normalize the groups. In each group, pick one word or coding that represents the group. Then group the words by their representative.</p> <p>Some possible ways:</p> <ul> <li>Pick the first encountered word.</li> <li>Pick the lexicographic first word.</li> <li>Derive a pattern for all the words.</li> <li>Pick an unique index.</li> <li>Use the <a href="http://en.wikipedia.org/wiki/Soundex" rel="nofollow" title="WP: Soundex">soundex</a> as pattern.</li> </ul> <p>Grouping the words could be difficult, though. If A is similar to B, and B is similar to C, A and C is not necessarily similar to each other. If B is the representative, both A and C could be included in the group. But if A or C is the representative, the other could not be included.</p> <hr> <p>Going by the first alternative (first encountered word):</p> <pre><code>class Seeder: def __init__(self): self.seeds = set() self.cache = dict() def get_seed(self, word): LIMIT = 2 seed = self.cache.get(word,None) if seed is not None: return seed for seed in self.seeds: if self.distance(seed, word) &lt;= LIMIT: self.cache[word] = seed return seed self.seeds.add(word) self.cache[word] = word return word def distance(self, s1, s2): l1 = len(s1) l2 = len(s2) matrix = [range(zz,zz + l1 + 1) for zz in xrange(l2 + 1)] for zz in xrange(0,l2): for sz in xrange(0,l1): if s1[sz] == s2[zz]: matrix[zz+1][sz+1] = min(matrix[zz+1][sz] + 1, matrix[zz][sz+1] + 1, matrix[zz][sz]) else: matrix[zz+1][sz+1] = min(matrix[zz+1][sz] + 1, matrix[zz][sz+1] + 1, matrix[zz][sz] + 1) return matrix[l2][l1] import itertools def group_similar(words): seeder = Seeder() words = sorted(words, key=seeder.get_seed) groups = itertools.groupby(words, key=seeder.get_seed) return [list(v) for k,v in groups] </code></pre> <p><strong>Example:</strong></p> <pre><code>import pprint print pprint.pprint(group_similar([ 'the', 'be', 'to', 'of', 'and', 'a', 'in', 'that', 'have', 'I', 'it', 'for', 'not', 'on', 'with', 'he', 'as', 'you', 'do', 'at', 'this', 'but', 'his', 'by', 'from', 'they', 'we', 'say', 'her', 'she', 'or', 'an', 'will', 'my', 'one', 'all', 'would', 'there', 'their', 'what', 'so', 'up', 'out', 'if', 'about', 'who', 'get', 'which', 'go', 'me', 'when', 'make', 'can', 'like', 'time', 'no', 'just', 'him', 'know', 'take', 'people', 'into', 'year', 'your', 'good', 'some', 'could', 'them', 'see', 'other', 'than', 'then', 'now', 'look', 'only', 'come', 'its', 'over', 'think', 'also', 'back', 'after', 'use', 'two', 'how', 'our', 'work', 'first', 'well', 'way', 'even', 'new', 'want', 'because', 'any', 'these', 'give', 'day', 'most', 'us' ]), width=120) </code></pre> <p><strong>Output:</strong></p> <pre><code>[['after'], ['also'], ['and', 'a', 'in', 'on', 'as', 'at', 'an', 'one', 'all', 'can', 'no', 'want', 'any'], ['back'], ['because'], ['but', 'about', 'get', 'just'], ['first'], ['from'], ['good', 'look'], ['have', 'make', 'give'], ['his', 'her', 'if', 'him', 'its', 'how', 'us'], ['into'], ['know', 'new'], ['like', 'time', 'take'], ['most'], ['of', 'I', 'it', 'for', 'not', 'he', 'you', 'do', 'by', 'we', 'or', 'my', 'so', 'up', 'out', 'go', 'me', 'now'], ['only'], ['over', 'our', 'even'], ['people'], ['say', 'she', 'way', 'day'], ['some', 'see', 'come'], ['the', 'be', 'to', 'that', 'this', 'they', 'there', 'their', 'them', 'other', 'then', 'use', 'two', 'these'], ['think'], ['well'], ['what', 'who', 'when', 'than'], ['with', 'will', 'which'], ['work'], ['would', 'could'], ['year', 'your']] </code></pre>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload