StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

POEfficient reordering of coordinate pairs (2-tuples) in a list of pairs in Python
text
Body
copied!<p>I am wanting to zip up a list of entities with a new entity to generate a list of coordinates (2-tuples), but I want to assure that for (i, j) that i < j is always true.</p> <p>However, I am not extremely pleased with my current solutions:</p> <pre><code>from itertools import repeat mems = range(1, 10, 2) mem = 8 def ij(i, j): if i < j: return (i, j) else: return (j, i) def zipij(m=mem, ms=mems, f=ij): return map(lambda i: f(i, m), ms) def zipij2(m=mem, ms=mems): return map(lambda i: tuple(sorted([i, m])), ms) def zipij3(m=mem, ms=mems): return [tuple(sorted([i, m])) for i in ms] def zipij4(m=mem, ms=mems): mems = zip(ms, repeat(m)) half1 = [(i, j) for i, j in mems if i < j] half2 = [(j, i) for i, j in mems[len(half1):]] return half1 + half2 def zipij5(m=mem, ms=mems): mems = zip(ms, repeat(m)) return [(i, j) for i, j in mems if i < j] + [(j, i) for i, j in mems if i > j] </code></pre> <p>Output for above:</p> <pre><code>>>> print zipij() # or zipij{2-5} [(1, 8), (3, 8), (5, 8), (7, 8), (8, 9)] </code></pre> <p>Instead of normally: </p> <pre><code>>>> print zip(mems, repeat(mem)) [(1, 8), (3, 8), (5, 8), (7, 8), (9, 8)] </code></pre> <p>Timings: <strong><em>snipped (no longer relevant, see much faster results in answers below)</em></strong></p> <p>For <code>len(mems) == 5</code>, there is no real issue with any solution, but for zipij5() for instance, the second list comprehension is needlessly going back over the first four values when <code>i > j</code> was already evaluated to be <code>True</code> for those in the first comprehension.</p> <p>For my purposes, I'm positive that <code>len(mems)</code> will never exceed ~10000, if that helps form any answers for what solution is best. To explain my use case a bit (I find it interesting), I will be storing a sparse, upper-triangular, similarity matrix of sorts, and so I need the coordinate <code>(i, j)</code> to not be duplicated at <code>(j, i)</code>. I say <em>of sorts</em> because I will be utilizing the new <code>Counter()</code> object in 2.7 to perform quasi matrix-matrix and matrix-vector addition. I then simply feed <code>counter_obj.update()</code> a list of 2-tuples and it increments those coordinates how many times they occur. SciPy sparse matrices ran about 50x slower, to my dismay, for my use cases... so I quickly ditched those.</p> <p>So anyway, I was surprised by my results... The first methods I came up with were <code>zipij4</code> and <code>zipij5</code>, and yet they are still the fastest, despite building a normal <code>zip()</code> and then generating a new zip after changing the values. I'm still rather new to Python, relatively speaking (Alex Martelli, can you hear me?), so here are my naive conclusions:</p> <ul> <li><code>tuple(sorted([i, j]))</code> is extremely expensive (Why is that?)</li> <li><code>map(lambda ...)</code> seems to always do worse than a list comp (I think I've read this and it makes sense)</li> <li>Somehow <code>zipij5()</code> isn't much slower despite going over the list twice to check for i-j inequality. (Why is this?)</li> </ul> <p>And lastly, I would like to know which is considered most efficient... or if there are any other fast and memory-inexpensive ways that I haven't yet thought of. Thank you.</p> <hr> <h2>Current Best Solutions</h2> <pre><code>## Most BRIEF, Quickest with UNSORTED input list: ## truppo's def zipij9(m=mem, ms=mems): return [(i, m) if i < m else (m, i) for i in ms] ## Quickest with pre-SORTED input list: ## Michal's def zipij10(m=mem, ms=mems): i = binsearch(m, ms) ## See Michal's answer for binsearch() return zip(ms[:i], repeat(m)) + zip(repeat(m), ms[i:]) </code></pre> <h2>Timings</h2> <pre><code># Michal's Presorted - 410µs per loop Unsorted - 2.09ms per loop ## Due solely to the expensive sorted() # truppo's Presorted - 880µs per loop Unsorted - 896µs per loop ## No sorted() needed </code></pre> <p>Timings were using <code>mems = range(1, 10000, 2)</code>, which is only ~5000 in length. <code>sorted()</code> will probably become worse at higher values, and with lists that are more shuffled. <code>random.shuffle()</code> was used for the "Unsorted" timings.</p>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload