Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>The extra information can indeed be used to minimize the total number of comparisons. Calls to the super_comparison function can be used to make deductions equivalent to a great number of calls to a regular comparsion function. For example, <code>a much-less-than b</code> and <code>c little-less-than b</code> implies <code>a &lt; c &lt; b</code>.</p> <p>The deductions cans be organized into bins or partitions which can each be sorted separately. Effectively, this is equivalent to QuickSort with n-way partition. Here's an implementation in Python:</p> <pre><code>from collections import defaultdict from random import choice def quicksort(seq, compare): 'Stable in-place sort using a 3-or-more-way comparison function' # Make an n-way partition on a random pivot value segments = defaultdict(list) pivot = choice(seq) for x in seq: ranking = 0 if x is pivot else compare(x, pivot) segments[ranking].append(x) seq.clear() # Recursively sort each segment and store it in the sequence for ranking, segment in sorted(segments.items()): if ranking and len(segment) &gt; 1: quicksort(segment, compare) seq += segment if __name__ == '__main__': from random import randrange from math import log10 def super_compare(a, b): 'Compare with extra logarithmic near/far information' c = -1 if a &lt; b else 1 if a &gt; b else 0 return c * (int(log10(max(abs(a - b), 1.0))) + 1) n = 10000 data = [randrange(4*n) for i in range(n)] goal = sorted(data) quicksort(data, super_compare) print(data == goal) </code></pre> <p>By instrumenting this code with the <em>trace</em> module, it is possible to measure the performance gain. In the above code, a regular three-way compare uses 133,000 comparisons while a super comparison function reduces the number of calls to 85,000.</p> <p>The code also makes it easy to experiment with a variety comparison functions. This will show that naïve n-way comparison functions do very little to help the sort. For example, if the comparison function returns +/-2 for differences greater than four and +/-1 for differences four or less, there is only a modest 5% reduction in the number of comparisons. The root cause is that the course grained partitions used in the beginning only have a handful of "near matches" and everything else falls in "far matches".</p> <p>An improvement to the super comparison is to covers logarithmic ranges (i.e. +/-1 if within ten, +/-2 if within a hundred, +/- if within a thousand.</p> <p>An ideal comparison function would be adaptive. For any given sequence size, the comparison function should strive to subdivide the sequence into partitions of roughly equal size. Information theory tells us that this will maximize the number of bits of information per comparison.</p> <p>The adaptive approach makes good intuitive sense as well. People should first be partitioned into <em>love</em> vs <em>like</em> before making more refined distinctions such as love-a-lot vs love-a-little. Further partitioning passes should each make finer and finer distinctions.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload