Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    text
    copied!<p>What a great question! To be consistent with terminology, I will refer to the lines in your matrix as "input <a href="http://en.wikipedia.org/wiki/Multiset" rel="nofollow">bags</a>" and the items in the input bags as "objects". A bag (or multiset) is a container that allow members to appear more than once but whose members do not have an inherent order (unlike lists).</p> <p>We are looking for a function with the following signature:</p> <pre><code>set of valid combinations = generate_combinations(list of input bags, number of objects in valid combination) </code></pre> <p>Since it is possible that the set of valid combinations exceeds the memory available to Python, <code>generate_combinations</code> should return a generator rather than an explicit list.</p> <p>A valid result must satisfy the following requirements:</p> <ol> <li>At least 1 object from each input bag</li> <li>Will have n objects total</li> </ol> <p>I am assuming the following:</p> <ol> <li>The order of the objects in a result does not matter</li> <li>An input bag can contain duplicate objects</li> <li>Two input bags can contain duplicate objects (in the degenerate case, two input bags can be identical)</li> <li>A valid result pulls objects without replacement</li> </ol> <p>Requirement #1 and Assumption #4 imply <code>number of input bags &lt;= n &lt;= total number of objects in all bags</code></p> <p><strong>Data Structures</strong></p> <ul> <li>Since input bags are allowed to contain duplicate values (per Assumption #2), we cannot use set/frozenset to store these. Python lists are suitable for this task. An alternative container could be <a href="http://docs.python.org/2/library/collections.html#collections.Counter" rel="nofollow">collections.Counter</a>, which has constant-time membership test and better spatial efficiency for lists with many duplicates.</li> <li>Each valid combination is also a bag</li> <li>Order does not matter for the list of input bags, so this could be generalized as a bag of input bags. For sanity, I'll leave it as is.</li> </ul> <p>I will use Python's built-in <code>itertools</code> module to generate combinations, which was introduced in v2.6. If you are running an older version of Python, use a recipe. For these code-examples, I have implicitly converted generators into lists for clarity.</p> <pre><code>&gt;&gt;&gt; import itertools, collections &gt;&gt;&gt; input_bags = [Bag([1,2,2,3,5]), Bag([1,4,5,9]), Bag([12])] &gt;&gt;&gt; output_bag_size = 5 &gt;&gt;&gt; combos = generate_combinations(input_bags, output_bag_size) &gt;&gt;&gt; combos.next() #an arbitrary example Bag([1,1,2,4,12]) </code></pre> <p>Realize that the problem as stated above can be reduced to one that is immediately solvable by Python's built-in itertools module, which generates combinations of sequences. The only modification we need to do is eliminate Requirement #1. To solve the reduced problems, combine the members of the bags into a single bag.</p> <pre><code>&gt;&gt;&gt; all_objects = itertools.chain.from_iterable(input_bags) &gt;&gt;&gt; all_objects generator that returns [1, 2, 2, 3, 5, 1, 4, 5, 9, 12] &gt;&gt;&gt; combos = itertools.combinations(all_objects, output_bag_size) &gt;&gt;&gt; combos generator that returns [(1,2,2,3,5), (1,2,2,3,1), (1,2,2,3,4), ...] </code></pre> <p>To reinstate requirement #1, each valid combination (output bag) needs to contain 1 element from each input bag plus additional elements from any of the bags until the output bag size reaches the user-specified value. If the overlap between [1 element from each input bag] and [additional elements from any of the bags] is ignored, the solution is just the cartesian product of the possible combinations of the first and the possible combinations of the second.</p> <p>To handle the overlap, remove the elements from [1 element from each input bag] from the [additional elements from any of the bags], and for-loop away.</p> <pre><code>&gt;&gt;&gt; combos_with_one_element_from_each_bag = itertools.product(*input_bags) &gt;&gt;&gt; for base_combo in combos_with_one_element_from_each_bag: &gt;&gt;&gt; all_objects = list(itertools.chain.from_iterable(input_bags)) &gt;&gt;&gt; for seen in base_combo: &gt;&gt;&gt; all_objects.remove(seen) &gt;&gt;&gt; all_objects_minus_base_combo = all_objects &gt;&gt;&gt; for remaining_combo in itertools.combinations(all_objects_minus_base_combo, output_bag_size-len(base_combo)): &gt;&gt;&gt; yield itertools.chain(base_combo, remaining_combo) </code></pre> <p>The remove operation is supported on lists but isn't supported on generators. To avoid storing all_objects in memory as a list, create a function that skips over the elements in base_combo.</p> <pre><code>&gt;&gt;&gt; def remove_elements(iterable, elements_to_remove): &gt;&gt;&gt; elements_to_remove_copy = Bag(elements_to_remove) #create a soft copy &gt;&gt;&gt; for item in iterable: &gt;&gt;&gt; if item not in elements_to_remove_copy: &gt;&gt;&gt; yield item &gt;&gt;&gt; else: &gt;&gt;&gt; elements_to_remove_copy.remove(item) </code></pre> <p>An implementation of the Bag class might look like this:</p> <pre><code>&gt;&gt;&gt; class Bag(collections.Counter): &gt;&gt;&gt; def __iter__(self): &gt;&gt;&gt; return self.elements() &gt;&gt;&gt; def remove(self, item): &gt;&gt;&gt; self[item] -= 1 &gt;&gt;&gt; if self[item] &lt;= 0: &gt;&gt;&gt; del self[item] </code></pre> <p><strong>Complete code</strong></p> <pre><code>import itertools, collections class Bag(collections.Counter): def __iter__(self): return self.elements() def remove(self, item): self[item] -= 1 if self[item] &lt;= 0: del self[item] def __repr__(self): return 'Bag(%s)' % sorted(self.elements()) def remove_elements(iterable, elements_to_remove): elements_to_remove_copy = Bag(elements_to_remove) #create a soft copy for item in iterable: if item not in elements_to_remove_copy: yield item else: elements_to_remove_copy.remove(item) def generate_combinations(input_bags, output_bag_size): combos_with_one_element_from_each_bag = itertools.product(*input_bags) for base_combo in combos_with_one_element_from_each_bag: all_objects_minus_base_combo = remove_elements(itertools.chain.from_iterable(input_bags), base_combo) for remaining_combo in itertools.combinations(all_objects_minus_base_combo, output_bag_size-len(base_combo)): yield Bag(itertools.chain(base_combo, remaining_combo)) input_bags = [Bag([1,2,2,3,5]), Bag([1,4,5,9]), Bag([12])] output_bag_size = 5 combos = generate_combinations(input_bags, output_bag_size) list(combos) </code></pre> <p>Finish this off by adding in error-checking code (such as output_bag_size >= len(input_bags).</p> <p>The benefits of this approach are: 1. Less code to maintain (namely itertools) 2. No recursion. Python performance degrades with call stack height, so minimizing recursion is beneficial. 3. Minimum memory consumption. This algorithm requires constant-space memory beyond what's consumed by the list of input bags.</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload