<p>This function already exists in the <a href="http://docs.python.org/3/library/itertools.html#recipes" rel="nofollow"><code>itertools</code> recipes</a>, as <code>unique_everseen</code>. You can copy and paste it from there, read it to see how it works, or install the third-party package <code>more-itertools</code> and use it from there.</p>
<p>Here's a simplified version of the code:</p>
<pre><code>def unique_everseen(iterable):
    seen = set()  # every value yielded so far
    for element in iterable:
        if element not in seen:  # first time this value shows up
            seen.add(element)
            yield element
</code></pre>
<hr>
<p>The version in the recipes allows for a <code>key</code> function, which you don't need, and it has two optimizations. But first, make sure you understand the simple version:</p>
<p><code>seen</code> is a set of all values seen so far. For each value, we check whether it's in <code>seen</code>. If so, we skip it. Otherwise, we add it to the set and <code>yield</code> it. So each element is yielded only the first time it's seen.</p>
<hr>
<p>The first optimization in the recipe version is simple: looking up the <code>seen.add</code> method isn't quite free, so we do it once, before the loop, instead of N times, by writing <code>seen_add = seen.add</code>. This makes a sizable difference when benchmarking trivial cases, like a list of small integers; it may not make much difference in real use cases with values that are more expensive to hash.</p>
<p>The second optimization is to use <code>filterfalse</code> (called <code>ifilterfalse</code> in Python 2) with <code>seen.__contains__</code> as the predicate, instead of an <code>if</code>, to skip over the elements that have already been seen. If you have N elements and M unique elements, this means you do only M iterations in Python and N in the optimized C code inside <code>filterfalse</code>, instead of doing all N in Python. Since iterating in C is much faster, this is worth it unless almost all of your elements are unique.</p>
<hr>
<p>To make it work with a <code>key</code> function, all you have to do is keep a set of the <code>key(element)</code> values seen so far, instead of the <code>element</code> values themselves. This makes the <code>filterfalse</code> optimization a little harder to apply and much less effective, so the recipe doesn't use it when a key is given.</p>
<hr>
<p>If you're only dealing with sequences, not arbitrary iterables, and you can count on Python 2.7+, there's another way to do this which is almost as efficient, and even simpler:</p>
<pre><code>from collections import OrderedDict

def unique(a):
    return list(OrderedDict.fromkeys(a))  # list(): .keys() is a view on Python 3
</code></pre>
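<p>For comparison, here is a minimal sketch of what the recipe-style version can look like with both optimizations and the optional <code>key</code> argument. It is reconstructed from the description above, not copied from the docs, so treat the linked recipes page as the authority. The <code>unique</code> helper at the end is an extra assumption: it relies on Python 3.7+, where plain <code>dict</code>s preserve insertion order, so <code>dict.fromkeys</code> can stand in for <code>OrderedDict.fromkeys</code>.</p>

```python
from itertools import filterfalse


def unique_everseen(iterable, key=None):
    """Yield unique elements in first-seen order.

    Sketch of the recipe-style version (not a verbatim copy of the docs).
    """
    seen = set()
    seen_add = seen.add  # optimization 1: look up the bound method only once
    if key is None:
        # optimization 2: filterfalse drives the loop in C and calls the
        # C-level seen.__contains__ on every element; only unseen elements
        # reach the Python-level loop body below
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        # with a key, remember key(element) instead of element itself;
        # the filterfalse trick is not used in this branch
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element


def unique(a):
    # assumes Python 3.7+: plain dicts keep insertion order,
    # so this behaves like list(OrderedDict.fromkeys(a))
    return list(dict.fromkeys(a))


if __name__ == "__main__":
    print(list(unique_everseen("AAAABBBCCDAABBB")))     # ['A', 'B', 'C', 'D']
    print(list(unique_everseen("ABBcCAD", str.lower)))  # ['A', 'B', 'c', 'D']
    print(unique([3, 1, 3, 2, 1]))                      # [3, 1, 2]
```

<p>The <code>filterfalse</code> branch works because <code>filterfalse</code> is lazy: each element is added to <code>seen</code> in the loop body before the next element is tested, so later duplicates are filtered out.</p>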