<p>Combinatorial problems are notorious for being easy to state but possibly difficult to solve. For this one, I wouldn't use <code>itertools</code> at all, but instead write a custom generator. For example,</p>

<pre><code>def combs(elt2color, combination_size=4, max_colors=3):
    def inner(needed, index):
        if needed == 0:
            yield result
            return
        if n - index &lt; needed:
            # not enough elements remain to reach
            # combination_size
            return
        # first all results that don't contain elts[index]
        for _ in inner(needed, index + 1):
            yield result
        # and then all results that do contain elts[index]
        needed -= 1
        elt = elts[index]
        color = elt2color[elt]
        color_added = color not in colors_seen
        colors_seen.add(color)
        if len(colors_seen) &lt;= max_colors:
            result[needed] = elt
            for _ in inner(needed, index + 1):
                yield result
        if color_added:
            colors_seen.remove(color)

    elts = tuple(elt2color)
    n = len(elts)
    colors_seen = set()
    result = [None] * combination_size
    for _ in inner(combination_size, 0):
        yield tuple(result)
</code></pre>

<p>Then:</p>

<pre><code>elt2color = dict([('A', 'red'), ('B', 'red'), ('C', 'blue'),
                  ('D', 'blue'), ('E', 'green'), ('F', 'green'),
                  ('G', 'green'), ('H', 'yellow'), ('I', 'white'),
                  ('J', 'white'), ('K', 'black')])

for c in combs(elt2color):
    for element in c:
        print("%s-%s" % (element, elt2color[element]))
    print("\n")
</code></pre>

<p>produces the same 188 combinations as your post-processing code, but internally abandons a partial combination as soon as it would span more than <code>max_colors</code> colors. There's no way to change what <code>itertools</code> functions do internally, so when you <em>want</em> control over that, you need to roll your own.</p>

<p><strong>Using itertools</strong></p>

<p>Here's another approach, generating first all solutions with exactly 1 color, then exactly 2 colors, and so on. <code>itertools</code> can be used directly for much of this, but at the lowest level still needs a custom generator. I find this harder to understand than a fully custom generator, but it may be clearer to you:</p>

<pre><code>def combs2(elt2color, combination_size=4, max_colors=3):
    from collections import defaultdict
    from itertools import combinations

    color2elts = defaultdict(list)
    for elt, color in elt2color.items():
        color2elts[color].append(elt)

    def at_least_one_from_each(iterables, n):
        if n &lt; len(iterables):
            return  # impossible
        if not n or not iterables:
            if not n and not iterables:
                yield ()
            return
        # Must have n - num_from_first &gt;= len(iterables) - 1,
        # so num_from_first &lt;= n - len(iterables) + 1
        for num_from_first in range(1, min(len(iterables[0]) + 1,
                                           n - len(iterables) + 2)):
            for from_first in combinations(iterables[0], num_from_first):
                for rest in at_least_one_from_each(iterables[1:],
                                                   n - num_from_first):
                    yield from_first + rest

    for numcolors in range(1, max_colors + 1):
        for colors in combinations(color2elts, numcolors):
            # Now this gets tricky.  We need to pick
            # combination_size elements across all the colors, but
            # must pick at least one from each color.
            for elements in at_least_one_from_each(
                    [color2elts[color] for color in colors],
                    combination_size):
                yield elements
</code></pre>
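<p>As a quick sanity check (a sketch, assuming the <code>elt2color</code> mapping shown above), both generators should produce the same combinations; the order of elements inside each combination can differ, so compare them as sets of frozensets:</p>

<pre><code># sanity check: combs() and combs2() should agree on the set of results
seen1 = {frozenset(c) for c in combs(elt2color)}
seen2 = {frozenset(c) for c in combs2(elt2color)}
assert seen1 == seen2
print(len(seen1))  # expect 188, matching the count above
</code></pre>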
<p>I haven't timed these, because I don't care ;-) The fully custom generator's single <code>result</code> list is reused for building each output, which slashes the rate of dynamic memory turnover. The second way creates a lot of memory churn by pasting together multiple levels of <code>from_first</code> and <code>rest</code> tuples - and that's mostly unavoidable because it uses <code>itertools</code> to generate the <code>from_first</code> tuples at each level.</p>

<p>Internally, <code>itertools</code> functions almost always work in a way more similar to the first code sample, and for the same reasons, reusing an internal buffer as much as possible.</p>

<p><strong>AND ONE MORE</strong></p>

<p>This is more to illustrate some subtleties. I thought about what I'd do if I were to implement this functionality in C as an <code>itertools</code> function. All the <code>itertools</code> functions were first prototyped in Python, but in a semi-low-level way, reduced to working with vectors of little integers (no "inner loop" usage of sets, dicts, sequence slicing, or pasting together partial result sequences - sticking as far as possible to <code>O(1)</code> worst-case time operations on dirt simple native C types after initialization).</p>

<p>At a higher level, an <code>itertools</code> function for this would accept any iterable as its primary argument, and almost certainly guarantee to return combinations from that in lexicographic index order. So here's code that does all that. In addition to the <code>iterable</code> argument, it also requires an <code>elt2ec</code> mapping, which maps each element from the iterable to its equivalence class (for you, those are strings naming colors, but any objects usable as dict keys <em>could</em> be used as equivalence classes):</p>

<pre><code>def combs3(iterable, elt2ec, k, maxec):
    # Generate all k-combinations from `iterable` spanning no
    # more than `maxec` equivalence classes.
    elts = tuple(iterable)
    n = len(elts)
    ec = [None] * n  # ec[i] is equiv class ordinal of elts[i]
    ec2j = {}        # map equiv class to its ordinal
    for i, elt in enumerate(elts):
        thisec = elt2ec[elt]
        j = ec2j.get(thisec)
        if j is None:
            j = len(ec2j)
            ec2j[thisec] = j
        ec[i] = j
    countec = [0] * len(ec2j)
    del ec2j

    def inner(i, j, totalec):
        if i == k:
            yield result
            return
        for j in range(j, jbound[i]):
            thisec = ec[j]
            thiscount = countec[thisec]
            newtotalec = totalec + (thiscount == 0)
            if newtotalec &lt;= maxec:
                countec[thisec] = thiscount + 1
                result[i] = j
                yield from inner(i+1, j+1, newtotalec)
                countec[thisec] = thiscount

    jbound = list(range(n-k+1, n+1))
    result = [None] * k
    for _ in inner(0, 0, 0):
        yield (elts[i] for i in result)
</code></pre>

<p>(Note that this is Python 3 code.) As advertised, nothing in <code>inner()</code> is fancier than indexing a vector with a little integer. The only thing remaining to make it directly translatable to C is removing the recursive generation. That's tedious, and since it wouldn't illustrate anything particularly interesting here I'm going to ignore that.</p>

<p>Anyway, the interesting thing is timing it. As noted in a comment, timing results are strongly influenced by the test cases you use. <code>combs3()</code> here is sometimes fastest, but not often! It's almost always faster than my original <code>combs()</code>, but usually slower than my <code>combs2()</code> or @GarethRees's lovely <code>constrained_combinations()</code>.</p>
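<p>If you want to measure this yourself, a rough harness along these lines works (a sketch with a made-up random input; swap in your own data, and add <code>constrained_combinations()</code> to the list the same way if you have it):</p>

<pre><code>import random
import string
import timeit

random.seed(42)
colors = ['red', 'blue', 'green', 'yellow', 'white', 'black']
# made-up test case: 26 elements with random colors
test_map = {c: random.choice(colors) for c in string.ascii_uppercase}

def exhaust(gen):
    # run a generator to completion, discarding results
    for _ in gen:
        pass

cases = [('combs',  lambda: exhaust(combs(test_map, 6, 3))),
         ('combs2', lambda: exhaust(combs2(test_map, 6, 3))),
         ('combs3', lambda: exhaust(combs3(test_map, test_map, 6, 3)))]
for name, func in cases:
    best = min(timeit.repeat(func, number=3, repeat=3))
    print('%-7s %.3f seconds' % (name, best))
</code></pre>

<p>Rankings really do shift as you vary the element count, combination size, and color limit, so try several shapes of input before drawing conclusions.</p>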
<p>So how can that be when <code>combs3()</code> has been optimized "almost all the way down to mindless ;-) C-level operations"? Easy! It's still written in Python. <code>combs2()</code> and <code>constrained_combinations()</code> use the C-coded <code>itertools.combinations()</code> to do much of their work, and that makes a world of difference. <code>combs3()</code> would run circles around them <em>if</em> it were coded in C.</p>

<p>Of course any of these can run unboundedly faster than the <code>allowed_combinations()</code> in the original post - but that one can be fastest too (for example, pick just about any inputs where <code>max_colors</code> is so large that no combinations are excluded - then <code>allowed_combinations()</code> wastes little effort, while all these others add substantial extra overhead to "optimize" pruning that never occurs).</p>
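<p>To see that degenerate case concretely (a sketch, again assuming the <code>elt2color</code> mapping from earlier): once <code>max_colors</code> is at least as large as <code>combination_size</code>, no 4-combination can span too many colors, so the constrained generator reproduces plain <code>itertools.combinations()</code> while still paying for the bookkeeping:</p>

<pre><code>from itertools import combinations

unconstrained = {frozenset(c) for c in combinations(elt2color, 4)}
constrained = {frozenset(c) for c in combs(elt2color, 4, max_colors=4)}
assert constrained == unconstrained
print(len(constrained))  # C(11, 4) = 330: nothing was pruned
</code></pre>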
 
