Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>You needn't copy the data itself or write anything to disk; just count the members of some partition of the data to identify openings. The tradeoff is between number of passes and memory (more memory allows for more counts, smaller partitions).</p> <p>Here's a solution in 8 passes. We'll partition the data using 4 bits at a time. 2^4 = 16 possible values, so we'll need 64 bytes to store counts for each of the 16 partitions. So each 4 bit nibble value has an associated count. </p> <p>Make a pass through the data, incrementing the associated count matching the nibble in the first four bits of the number. If a partition is full, its count will be 2^28. Pick one of the nibbles that isn't full --- this will be the first nibble of your result.</p> <p>Zero your counts and make another pass, ignoring numbers that don't match the first nibble and incrementing the count associated with the second nibble in the number. A full second nibble will have a value of 2^24. Pick one that isn't full.</p> <p>Proceed in this manner until you have all 8 nibbles, and there's your answer in O(N).</p> <p>If you only check one bit at a time, <strike>this would be a binary search</strike> requiring 32 passes. (EDIT: Not really a binary search, since you still have to read the values that you're skipping. That's why it's O(N). See edit below.) If you have a KB of memory for counts, you can do it in 4 passes; with 256 KB you can do it in 2 --- actually 128 KB since you could then use short ints for your counts. Here we're constrained to a few hundred bytes --- maybe 6 bits/6 passes/256 bytes?</p> <p>EDIT: Li-aung Yip's solution scales better, and clearly it can be modified to partition by more than one bit in a pass. If the writing is slower than reading, then maybe the best solution would be a hybrid between this read-only answer and Li-aung Yip's disk based one. </p> <p>Do a pass counting nibbles as above, then as you count the second set of nibbles, write only the numbers (or possibly just the last 28 bits of them) that match the first nibble, into 16 files according to the second nibble. </p> <p>Pick your second nibble and read it to get counts for the third nibble, writing only the numbers matching the second nibble, etc. If the file is close to capacity, if the numbers are fairly uniformly distributed, or if you pick the least-full nibbles each time, you'll have to write no more than about 6.66% (1/16+1/16^2...) of the file size.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload