<p>Edit: a simple mapping to numbers may be faster and avoids collisions entirely:</p>
<pre><code>from numpy import array

features = array(['oklahoma', 'florida', 'idaho', 'pennsylvania', 'alabama', 'washington'], dtype=object)
numbers = range(len(features))

# two lookup tables: number to string, and string to number
num2string = dict(zip(numbers, features))
string2num = dict(zip(features, numbers))

# read the result
for i in num2string:
    print("%i =&gt; '%s'" % (i, num2string[i]))

print("usage test:")
print(string2num['oklahoma'])
print(num2string[string2num['oklahoma']])
</code></pre>
<p>You will get a simple sequence of numbers, one for every item in your array:</p>
<pre><code>0 =&gt; 'oklahoma'
1 =&gt; 'florida'
2 =&gt; 'idaho'
...
</code></pre>
<p>Advantage: simplicity and speed.</p>
<p>Disadvantage: you'll get a different number for the same string if you change its position in the array, unlike with hashing the strings.</p>
<p><strong>Usage of hashing</strong></p>
<p>You can hash the strings using a well-chosen hash algorithm. Be careful about the number of collisions for your hash function: if two items have the same hash, you end up with a duplicate number in your input. In this example, the md5 hash function is used:</p>
<pre><code>import hashlib
from numpy import array

def string_to_num(s):
    # md5 needs bytes, so encode the string first (required on Python 3)
    return int(hashlib.md5(s.encode('utf-8')).hexdigest(), 16)

features = array(['oklahoma', 'florida', 'idaho', 'pennsylvania', 'alabama', 'washington'], dtype=object)

# hash those strings
features_string_for_number = {}
for i in features:
    hash_number = string_to_num(i)
    features_string_for_number[hash_number] = i

# read the result
for i in features_string_for_number:
    print("%i =&gt; '%s'" % (i, features_string_for_number[i]))

print("usage test:")
print(string_to_num('oklahoma'))
print(features_string_for_number[string_to_num('oklahoma')])
</code></pre>
<p>The hashing part is taken from <a href="https://stackoverflow.com/questions/2511058/persistent-hashing-of-strings-in-python">here</a>.</p>
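<p>Since the hashing approach depends on there being no collisions, it may be worth checking for them explicitly while building the table. The following is only a minimal sketch under stated assumptions: <code>build_hash_table</code> and its <code>to_num</code> parameter are hypothetical names introduced here for illustration, and <code>string_to_num</code> is the md5-based mapping from above with the string encoded to bytes.</p>
<pre><code>import hashlib

def string_to_num(s):
    # md5-based mapping as in the answer; encode because md5 expects bytes
    return int(hashlib.md5(s.encode('utf-8')).hexdigest(), 16)

def build_hash_table(strings, to_num=string_to_num):
    # Hypothetical helper: maps hash number to string and fails loudly on a collision
    table = {}
    for s in strings:
        n = to_num(s)
        # a collision means two different strings produced the same number
        if n in table and table[n] != s:
            raise ValueError("collision: %r and %r both map to %d" % (table[n], s, n))
        table[n] = s
    return table

features = ['oklahoma', 'florida', 'idaho', 'pennsylvania', 'alabama', 'washington']
table = build_hash_table(features)
print(len(table) == len(features))
</code></pre>
<p>Passing a deliberately weak function, for example <code>build_hash_table(['ab', 'cd'], to_num=len)</code>, raises <code>ValueError</code>, because both strings map to the number 2.</p>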
 
