<p>Is there a reason you care about the worst-case performance instead of the average performance? Any reasonable hashtable will give you an average performance of O(1).</p>

<p>If you really want a worst-case performance of O(1), here are two possible approaches:</p>

<ol>
<li><p>Have a vector of <code>max(charCode)-min(charCode)</code> entries and directly look up the value you want from the Unicode character code. This will work well if your keys fall in a compact enough range that you can fit it in RAM.</p></li>
<li><p>Use a brute-force approach to choose hash functions or dictionary sizes (using a custom implementation of a dictionary that lets you control this), and keep trying new functions and/or sizes till you get one with no collisions. Expect this to take a very long time. <em>I do not recommend this.</em></p></li>
</ol>

<p><strong>EDIT:</strong></p>

<p>Suppose that you know that the minimum character code you'll see is 1234 and the maximum you'll see is 98765. Further suppose that you have enough RAM to hold 98765-1234 elements. I'll also assume you're willing to use the <code>numpy</code> library or some other efficient array implementation. In that case, you can store the values in the vector like this:</p>

<pre><code>import numpy

# configuration info
max_value = 98765  # replace with your number
min_value = 1234   # replace with your number
spread = (max_value - min_value)
dtype = object  # replace with a primitive type if you want to store something simpler

# create the big vector
my_data = numpy.empty((spread,), dtype=dtype)

# insert elements
my_char_code = ...
my_value_for_my_char_code = ...
assert min_value &lt;= my_char_code &lt; max_value
my_data[my_char_code - min_value] = my_value_for_my_char_code

# extract elements
my_char_code = ...
assert min_value &lt;= my_char_code &lt; max_value
my_value_for_my_char_code = my_data[my_char_code - min_value]
</code></pre>

<p>This is O(1) because the lookup is implemented using pointer arithmetic and there's no dependence on the number of elements stored in the array.</p>

<p>This approach can be extremely wasteful of RAM if the number of elements you actually want to store is much smaller than <code>spread</code>. For example, if <code>spread</code> is 4 billion (all of UTF-32), then <code>my_data</code> alone will consume at least 4 billion * 8 bytes / pointer = 32 GB of RAM (and probably a lot more; I don't know how big Python references are). On the other hand, if <code>min_value</code> is 3 billion and <code>max_value = min_value + 100</code>, then the memory usage will be tiny.</p>
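To make the direct-indexing idea runnable end to end, here is a minimal, dependency-free sketch that uses a plain Python list in place of numpy (the answer allows "some other efficient array implementation"); the specific character codes and stored values are made up for illustration.

```python
# Hypothetical configuration for illustration only.
min_value = 1234
max_value = 98765
spread = max_value - min_value

# One slot per possible character code in the range.
table = [None] * spread

def put(char_code, value):
    # O(1): a bounds check plus constant-time indexing.
    assert min_value <= char_code < max_value
    table[char_code - min_value] = value

def get(char_code):
    assert min_value <= char_code < max_value
    return table[char_code - min_value]

put(2000, "alpha")
put(98764, "omega")
print(get(2000))   # alpha
print(get(98764))  # omega
```

Both lookups touch exactly one slot regardless of how many entries the table holds, which is what makes the worst case O(1).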
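For completeness, the brute-force idea in the second approach can be sketched as a search over table sizes until a fixed key set maps without collisions. The function name and the simple modulus "hash" are illustrative assumptions, not part of the original answer; real key sets may need a large size or a different hash family.

```python
def find_collision_free_size(keys, max_size=1_000_000):
    """Try successive table sizes until `key % size` is collision-free."""
    keys = set(keys)
    for size in range(len(keys), max_size):
        if len({k % size for k in keys}) == len(keys):
            return size
    raise ValueError("no collision-free size found below max_size")

# Example: find a perfect table size for the distinct codes in a word.
codes = {ord(c) for c in "hashing"}
size = find_collision_free_size(codes)
lookup = [None] * size
for k in codes:
    lookup[k % size] = chr(k)  # guaranteed slot-unique for this key set
```

As the answer warns, this scales poorly: the search time grows quickly with the number of keys, which is why the author does not recommend it.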