Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>There are only 4 billion possible <code>int</code> values so the most efficient way of doing this, is to count the number of occurrences of any value. You can use a memory MappedByteBuffer so you don't have to have 16 GB of memory. Once you have counted all the occurrences the counts will naturally be in order, so no further sorting is required. The time complexity is O(n) instead of O(n * log n) line merge sort or quick sort.</p> <hr> <pre><code>import sun.nio.ch.DirectBuffer; import java.io.File; import java.io.IOException; import java.io.RandomAccessFile; import java.nio.MappedByteBuffer; import java.nio.channels.FileChannel; public class Sort10Billion { public static void main(String... args) throws IOException { Runtime runtime = Runtime.getRuntime(); long used1 = runtime.totalMemory() - runtime.freeMemory(); MassiveCounterStore mcs = new MassiveCounterStore(); long start = System.nanoTime(); long count = 10 * 1000 * 1000 * 1000L; for (long i = count; i &gt; 0; i--) mcs.incrementIndex((int) (i / 1019)); mcs.iterator(new NumberCountFunction() { @Override public void counted(int n, long count) { // System.out.println(n + ": " + count); } }); long time = System.nanoTime() - start; long used2 = runtime.totalMemory() - runtime.freeMemory(); System.out.printf("Took %.1f seconds to sort %,d numbers, using %.3f MB%n", time / 1e9, count, (used2-used1)/1e6); mcs.close(); } } interface NumberCountFunction { public void counted(int n, long count); } class MassiveCounterStore { public static final int PARTITION_BITS = 26; static final int PARTITIONS = (1 &lt;&lt; (34 - PARTITION_BITS)); // 32-bit * 4 bytes. final MappedByteBuffer[] buffers = new MappedByteBuffer[PARTITIONS]; final FileChannel channel; int smallest = PARTITIONS; int largest = 0; public MassiveCounterStore() throws IOException { File tmpStore = File.createTempFile("counter", "dat"); tmpStore.deleteOnExit(); channel = new RandomAccessFile(tmpStore, "rw").getChannel(); for (int i = 0; i &lt; PARTITIONS; i++) buffers[i] = channel.map(FileChannel.MapMode.READ_WRITE, (long) i &lt;&lt; PARTITION_BITS, 1 &lt;&lt; PARTITION_BITS); } public void incrementIndex(int n) { long l = (n + Integer.MIN_VALUE) &amp; 0xFFFFFFFFL; int partition = (int) (l &gt;&gt; (PARTITION_BITS - 2)); // 4 bytes each. int index = (int) ((l &lt;&lt; 2) &amp; ((1 &lt;&lt; PARTITION_BITS) - 1)); MappedByteBuffer buffer = buffers[partition]; int count = buffer.getInt(index); buffer.putInt(index, count + 1); if (smallest &gt; partition) smallest = partition; if (largest &lt; partition) largest = partition; } public void iterator(NumberCountFunction nfc) { int n = (smallest &lt;&lt; (PARTITION_BITS -2)) + Integer.MIN_VALUE; for (int p = smallest; p &lt;= largest; p++) { MappedByteBuffer buffer = buffers[p]; for (int i = 0; i &lt; 1 &lt;&lt; PARTITION_BITS; i += 4) { int count = buffer.getInt(i); if (count != 0) nfc.counted(n, count &amp; 0xFFFFFFFFL); n++; } } assert n == Integer.MIN_VALUE; } public void close() { try { channel.close(); } catch (IOException ignored) { } for (MappedByteBuffer buffer : buffers) { ((DirectBuffer) buffer).cleaner().clean(); } } } </code></pre> <p>prints when run with -XX:-UseTLAB (which gives you more accurate memory usage)</p> <pre><code>Took 150.7 seconds to sort 10,000,000,000 numbers, using 0.202 MB </code></pre> <p>I think using 202 KB is pretty good. ;)</p> <p>Note: your performance is heavily dependant on the distribution of values as this impacts the efficiency of the cache.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload