Note that there are some explanatory texts on larger screens.

plurals
  1. POFind an integer not among four billion given ones
    primarykey
    data
    text
    <p>It is an interview question:</p> <blockquote> <p>Given an input file with four billion integers, provide an algorithm to generate an integer which is not contained in the file. Assume you have 1&nbsp;GB memory. Follow up with what you would do if you have only 10&nbsp;MB of memory.</p> </blockquote> <p>My analysis:</p> <p>The size of the file is 4×10<sup>9</sup>×4 bytes = 16&nbsp;GB.</p> <p>We can do external sorting, thus we get to know the range of the integers. My question is what is the best way to detect the missing integer in the sorted big integer sets?</p> <p>My understanding(after reading all answers):</p> <p>Assuming we are talking about 32-bit integers. There are 2^32 = 4*10<sup>9</sup> distinct integers.</p> <p>Case 1: we have 1&nbsp;GB = 1 * 10<sup>9</sup> * 8 bits = 8 billion bits memory. Solution: if we use one bit representing one distinct integer, it is enough. we don't need sort. Implementation: </p> <pre><code>int radix = 8; byte[] bitfield = new byte[0xffffffff/radix]; void F() throws FileNotFoundException{ Scanner in = new Scanner(new FileReader("a.txt")); while(in.hasNextInt()){ int n = in.nextInt(); bitfield[n/radix] |= (1 &lt;&lt; (n%radix)); } for(int i = 0; i&lt; bitfield.lenght; i++){ for(int j =0; j&lt;radix; j++){ if( (bitfield[i] &amp; (1&lt;&lt;j)) == 0) System.out.print(i*radix+j); } } } </code></pre> <p>Case 2: 10&nbsp;MB memory = 10 * 10<sup>6</sup> * 8 bits = 80 million bits</p> <pre><code>Solution: For all possible 16-bit prefixes, there are 2^16 number of integers = 65536, we need 2^16 * 4 * 8 = 2 million bits. We need build 65536 buckets. For each bucket, we need 4 bytes holding all possibilities because the worst case is all the 4 billion integers belong to the same bucket. step1: Build the counter of each bucket through the first pass through the file. step2: Scan the buckets, find the first one who has less than 65536 hit. step3: Build new buckets whose high 16-bit prefixes are we found in step2 through second pass of the file step4: Scan the buckets built in step3, find the first bucket which doesnt have a hit. The code is very similar to above one. </code></pre> <p>Conclusion: We decrease memory through increasing file pass.</p> <hr> <p><em>A clarification for those arriving late: The question, as asked, does not say that there is exactly one integer that is not contained in the file -- at least that's not how most people interpret it. Many comments in the comment thread <strong>are</strong> about that variation of the task, though. Unfortunately the comment that <strong>introduced</strong> it to the comment thread was later deleted by its author, so now it looks like the orphaned replies to it just misunderstood everything. It's very confusing. Sorry.</em></p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload