Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>What you're proposing is sometimes called a look-aside table; a secondary table used for various lookup purposes. In your case, you have a number of different possible ways of organizing this table. The most obvious is to not organize it, and use linear search to see if the next element is already known. Since the table will end up containing some 30000 elements, that's probably not a good idea. From the standard library (at least in C++11), there are two possibilities: <code>std::set</code> and <code>std::unordered_set</code>. <code>std::set</code> uses some form of balanced tree, so makes at most lg&nbsp;<i>n</i> comparisions for each lookup (around 15 for 30000 elements); <code>std::unordered_set</code> is a hash table, and with a good hash function, will require as small constant number of comparisons: you should be able to get it down to under 2 on the average (but possibly at a cost of more memory&mdash;the lower the load factor, the less the probability of a collision). As you mention, you <em>do</em> have the extra cost of calculating the hash function, and as you point out, this does involve visiting each element in the vector; in the binary tree, all that it required in each comparison is that enough elements are compared to determine order&mdash;in many cases, that may be just one or two. (But if you say that there are a lot of duplicates... you cannot detect a duplicate until you've visited all 30 entries, since any one may vary.) The only way to know which solution will actually be faster is to measure both, using typical data; for a data set such as you describe (many duplicates), I suspect the hash table will win, but it's far from certain.</p> <p>Finally, you can use some sort of non-binary tree. If you can really limit the values to a specific range (e.g. -100..100), you can use an ordinary vector or array with pointers to the subnodes, indexing directly with the element value, transposed as necessary. You then just walk the tree until either you find a null pointer, or you reach the end. The maximum depth of the tree will be 30, and in fact, every element will be 30 deep, but typically, you'll find that the element is unique before getting that deep. I suspect (but again, you'ld need to measure) that in your case, with many duplicates, this will in fact be significantly slower than the previous two suggestions. (And it would be a lot more work on your part, because I'm not aware of any existing implementations.)</p> <p>As for hashing, just about any form of linear congruent hashing should be sufficient: FNV, for example. Most of the documentation for such hashes concerns strings (arrays of <code>char</code>), but they tend to work just as well with any integral type. I've generally used something like:</p> <pre><code>template &lt;typename ForwardIterator&gt; size_t hash( ForwardIterator begin, ForwardIterator end ) { size_t results = 2166136261U for ( ForwardIterator current = begin; current != end; ++ current ) { results = 127 * results + static_cast&lt;size_t&gt;( *current ); } return results; } </code></pre> <p>My choice of <code>127</code> as a multiplier is largely based on speed in older systems: multiplying by 127 is a lot faster than most of the other values which give good results. (I have no idea whether this is still true. But multiplication is still a relatively slow operation on many machines, and the compiler will convert <code>127 * x</code> into something like <code>x &lt;&lt; 7 - x</code> if that is faster.) The distribution with the above algorithm is about as good as that for FNV, at least with the data sets I've tested.</p>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload