Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>While the <em>only way</em> to objectively answer this question is with performance tests, I will argue for using a Hashtable Map. (Caching and memory access can be so full of surprises; I do not have the expertise to speculate on which one will be faster, and when. Also consider that localized performance differences may be marginalized by other code.)</p> <p>My first reason for "initially choosing" a hash is based off of the observation that there are 100M distinct keys but <em>only</em> 0.1M active records. This means that <strong>if using an array, index utilization will only be 0.1%</strong> - this is a <em>very sparse</em> array.</p> <p>If the data is stored <em>as values in</em> the array then it needs to be relatively small or the array size will balloon. If the data is <em>not</em> stored in the array (e.g. array is of pointers) then the argument for locality of data in the array is partially mitigated. Either way, the simple array approach requires <em>lots of unused space</em>.</p> <p>Since all the keys are already integers, <strong>the distribution (hash) function and can be efficiently implemented</strong> - there is no need to create a hash of a complex type/sequence so the "cost" of this function should approach zero.</p> <p>So, my simple proposed hash:</p> <ul> <li>Use <em>linear probing</em> backed by contiguous memory. It is simple, has good locality (especially during the probe), and avoids needing to do any form of dynamic allocation.</li> <li>Pick a suitable initial bucket size; say, 2x (or 0.2M buckets, primed). Don't even give the hash a chance of resize. Note that this suggested bucket array size is only <em>0.2%</em> the size of the simple array approach and could be reduced further as the size vs. collision rate can be tuned.</li> <li>Create a good distribution function for the hash. It can also exploit knowledge of the ID range.</li> </ul> <p>While I've presented specialized hashtable rules "optimized" for the given case, I would start with a normal Map implementation (be it a hashtable or tree) and test it .. if a standard implementation works suitably well, why not use it?</p> <p>Now, test different candidates under expected and extreme loads - and pick the winner.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload