There is a slight flaw in [ghoseb's](https://stackoverflow.com/questions/464342/combining-two-sorted-lists-in-python#464454) solution, making it O(n\*\*2) rather than O(n). The problem is that it performs:

```python
item = l1.pop(0)
```

With linked lists or deques this would be an O(1) operation, so it wouldn't affect complexity, but since Python lists are implemented as vectors, this copies the rest of the elements of `l1` one space left, an O(n) operation. Since this is done on each pass through the list, it turns an O(n) algorithm into an O(n\*\*2) one. This can be corrected by using a method that doesn't alter the source lists, but just keeps track of the current position.

I benchmarked a corrected algorithm against a simple `sorted(l1 + l2)` as suggested by [dbr](https://stackoverflow.com/questions/464342/combining-two-sorted-lists-in-python#464538):

```python
def merge(l1, l2):
    if not l1:
        return list(l2)
    if not l2:
        return list(l1)

    # Swap so l2 contains the last (largest) element; this guarantees
    # the iterator over l2 is never exhausted inside the loop over l1.
    if l1[-1] > l2[-1]:
        l1, l2 = l2, l1

    it = iter(l2)
    y = next(it)
    result = []
    for x in l1:
        while y < x:
            result.append(y)
            y = next(it)
        result.append(x)
    result.append(y)
    result.extend(it)
    return result
```

I've tested these with lists generated with:

```python
l1 = sorted([random.random() for i in range(NITEMS)])
l2 = sorted([random.random() for i in range(NITEMS)])
```

For various sizes of list, I get the following timings (repeating 100 times; a reproducible harness is sketched at the end of this answer):

```
# items:  1000   10000   100000   1000000
merge  :  0.079  0.798    9.763   109.044
sort   :  0.020  0.217    5.948   106.882
```

So in fact, it looks like dbr is right: just using `sorted()` is preferable unless you're expecting very large lists, even though it has worse algorithmic complexity. The break-even point is at around a million items in each source list (2 million total).

One advantage of the merge approach, though, is that it is trivial to rewrite as a generator, which will use substantially less memory (no need for an intermediate list); a sketch of that rewrite is also included at the end of this answer.

**[Edit]** I've retried this with a situation closer to the question: using a list of objects containing a field `date`, which is a datetime object. The above algorithm was changed to compare against `.date` instead, and the sort method was changed to:

```python
return sorted(l1 + l2, key=operator.attrgetter('date'))
```

This does change things a bit. The comparison being more expensive means that the number of comparisons we perform becomes more important, relative to the constant-time speed of the implementation. This means merge makes up lost ground, surpassing the `sort()` method at 100,000 items instead. Comparing based on an even more complex object (large strings or lists, for instance) would likely shift this balance even more.

```
# items:  1000   10000   100000   1000000[1]
merge  :  0.161  2.034   23.370   253.68
sort   :  0.111  1.523   25.223   313.20
```

[1]: Note: I actually only did 10 repeats for 1,000,000 items and scaled up accordingly, as it was pretty slow.
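
For anyone wanting to reproduce the first table, a harness along these lines should give comparable numbers. This is my reconstruction of the setup described above, not the exact script used for the timings; it assumes the `merge` function from earlier in this answer is in scope, and `NITEMS` matches the variable in the generation snippet:

```python
import random
import timeit

NITEMS = 10000
l1 = sorted(random.random() for i in range(NITEMS))
l2 = sorted(random.random() for i in range(NITEMS))

# 100 repeats, matching the "repeating 100 times" note above.
print("merge:", timeit.timeit(lambda: merge(l1, l2), number=100))
print("sort :", timeit.timeit(lambda: sorted(l1 + l2), number=100))
```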
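
And as a follow-up to the generator point: here's a minimal sketch of what that rewrite might look like (`merge_gen` is a name I'm introducing for illustration, not part of the benchmarked code):

```python
def merge_gen(l1, l2):
    # Lazily merge two sorted iterables, yielding items in order.
    # No intermediate list is built, so extra memory stays O(1).
    it1, it2 = iter(l1), iter(l2)
    sentinel = object()  # marks an exhausted iterator
    x, y = next(it1, sentinel), next(it2, sentinel)
    while x is not sentinel and y is not sentinel:
        if y < x:
            yield y
            y = next(it2, sentinel)
        else:
            yield x
            x = next(it1, sentinel)
    # One side is exhausted; drain whichever remains.
    while x is not sentinel:
        yield x
        x = next(it1, sentinel)
    while y is not sentinel:
        yield y
        y = next(it2, sentinel)
```

For the record, the standard library's `heapq.merge` already provides exactly this kind of lazy merge of sorted inputs.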