
What is the fastest way to check if a string contains repeating characters in Python 3?
I need to filter strings by the criterion that they **contain no character twice**.

- The strings are **many** (say 1.4 trillion).
- The strings are **short** (around 8 characters).
- The strings are **unique** (caching won't work).
- The strings have a **big character set** (say any Unicode character).
- The strings **usually meet the criterion** (say 2/3 have no repeating characters).

The using code would look like this:

```python
>>> candidate_strings = ["foobnehg", "barfnehg", "bazfnehg"]
>>> result_strings = [s for s in candidate_strings if unique_chars(s)]
>>> print(result_strings)
['barfnehg', 'bazfnehg']
```

I implemented a naive version, simply iterating the string:

```python
def unique_chars_naive(string_given):
    """
    Checks if a given string contains only unique characters.
    This version iterates the given string, saving all occurred characters.
    """
    chars_seen = []
    for char in string_given:
        if char in chars_seen:
            return False
        chars_seen.append(char)
    return True
```

My next-best idea was to use a `set`, so I implemented that:

```python
def unique_chars_set(string_given):
    """
    Checks if a given string contains only unique characters.
    This version exploits that a set contains only unique entries.
    """
    return len(string_given) == len(set(string_given))
```

Saving the functions to a file `UniqueCharacters.py`, I timed them:

```shell
$ python3 -m timeit -n 100000 --setup='import UniqueCharacters; candidate_strings = ["foobnehg", "barfnehg", "bazfnehg"]' '[UniqueCharacters.unique_chars_naive(s) for s in candidate_strings]'
100000 loops, best of 3: 20.3 usec per loop

$ python3 -m timeit -n 100000 --setup='import UniqueCharacters; candidate_strings = ["foobnehg", "barfnehg", "bazfnehg"]' '[UniqueCharacters.unique_chars_set(s) for s in candidate_strings]'
100000 loops, best of 3: 17.7 usec per loop
```

This shows that `unique_chars_set` is faster by about 15% for this dataset.

Is there a faster way to do this? With regular expressions maybe? Is there some method in the standard library that does this?