Joining multiple iterators by a key
<p>Given: n iterators, and for each of them a function that returns the key of an item.</p>
<p>Assuming:</p>
<ul> <li>The iterators provide the items sorted by the key</li> <li>The keys from any iterator are unique</li> </ul>
<p>I want to iterate through them joined by the keys. E.g., given the following 2 lists:</p>
<pre><code>[('a', {type:'x', mtime:Datetime()}), ('b', {type:'y', mtime:Datetime()})]
[('b', Datetime()), ('c', Datetime())]
</code></pre>
<p>Using the first item in each tuple as the key, I want to get:</p>
<pre><code>(('a', {type:'x', mtime:Datetime()}), None)
(('b', {type:'y', mtime:Datetime()}), ('b', Datetime()),)
(None, ('c', Datetime()),)
</code></pre>
<p>So I hacked up this method:</p>
<pre><code>def iter_join(*iterables_and_key_funcs):
    iterables_len = len(iterables_and_key_funcs)
    keys_funcs = tuple(key_func for iterable, key_func in iterables_and_key_funcs)
    iters = tuple(iter(iterable) for iterable, key_func in iterables_and_key_funcs)
    current_values = [None] * iterables_len
    current_keys = [None] * iterables_len
    iters_stoped = [False] * iterables_len

    def get_nexts(iters_needing_fetch):
        # Advance every requested iterator that has not been exhausted yet.
        for i, fetch in enumerate(iters_needing_fetch):
            if fetch and not iters_stoped[i]:
                try:
                    current_values[i] = next(iters[i])
                    current_keys[i] = keys_funcs[i](current_values[i])
                except StopIteration:
                    iters_stoped[i] = True
                    current_values[i] = None
                    current_keys[i] = None

    get_nexts([True] * iterables_len)
    while not all(iters_stoped):
        min_key = min(key for key, iter_stoped in zip(current_keys, iters_stoped) if not iter_stoped)
        keys_equal_to_min = tuple(key == min_key for key in current_keys)
        yield tuple(value if key_eq_min else None for key_eq_min, value in zip(keys_equal_to_min, current_values))
        get_nexts(keys_equal_to_min)
</code></pre>
<p>and tested it:</p>
<pre><code>import pprint

key_is_value = lambda v: v
a = (   2, 3, 4,  )
b = (1,           )
c = (          5, )
d = (1,    3,   5,)
l = list(iter_join(
    (a, key_is_value),
    (b, key_is_value),
    (c, key_is_value),
    (d, key_is_value),
))
pprint.pprint(l)
</code></pre>
<p>which outputs:</p>
<pre><code>[(None, 1, None, 1),
 (2, None, None, None),
 (3, None, None, 3),
 (4, None, None, None),
 (None, None, 5, 5)]
</code></pre>
<p>Is there an existing method to do this? I checked itertools, but could not find anything.</p>
<p>Are there any ways to improve my method? Make it simpler, faster, etc.</p>
<h2>Update: Solution used</h2>
<p>I decided to simplify the contract for this function by requiring the iterators to yield tuple(key, value) or tuple(key, *values). Using agf's answer as a starting point, I came up with this:</p>
<pre><code>def join_items(*iterables):
    iters = tuple(iter(iterable) for iterable in iterables)
    current_items = [next(itr, None) for itr in iters]
    while True:
        try:
            # min() raises ValueError once every iterator is exhausted.
            key = min(item[0] for item in current_items if item is not None)
        except ValueError:
            break
        yield tuple(item if item is not None and item[0] == key else None for item in current_items)
        for i, (item, itr) in enumerate(zip(current_items, iters)):
            if item is not None and item[0] == key:
                current_items[i] = next(itr, None)

a = (       (2,), (3,), (4,),      )
b = ((1,),                         )
c = (                         (5,),)
d = ((1,),        (3,),       (5,),)
e = (                              )

import pprint
pprint.pprint(list(join_items(a, b, c, d, e)))

# Output:
[(None, (1,), None, (1,), None),
 ((2,), None, None, None, None),
 ((3,), None, None, (3,), None),
 ((4,), None, None, None, None),
 (None, None, (5,), (5,), None)]
</code></pre>
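<p>On the question of whether an existing method does this: the standard library has no single function for it, but the same join can be sketched by combining <code>heapq.merge</code> with <code>itertools.groupby</code>. The following is a minimal sketch, not the approach used above. It assumes Python 3.5+ (for the <code>key</code> argument of <code>heapq.merge</code>) and the simplified contract where each item is a tuple whose first element is the key. The names <code>join_sorted</code> and <code>tag</code> are made up for illustration.</p>

```python
import heapq
from itertools import groupby
from operator import itemgetter


def join_sorted(*iterables):
    """Join sorted iterables of (key, ...) tuples on their first element.

    Hypothetical helper: assumes each iterable is sorted by key and that
    keys are unique within a single iterable.
    """
    n = len(iterables)

    def tag(index, iterable):
        # Wrap each item as (key, source index, item) so the source
        # position survives the merge.
        for item in iterable:
            yield item[0], index, item

    tagged = [tag(i, iterable) for i, iterable in enumerate(iterables)]
    # Lazily merge all tagged streams, ordered by key only.
    merged = heapq.merge(*tagged, key=itemgetter(0))
    # Consecutive entries with the same key form one output row.
    for _, group in groupby(merged, key=itemgetter(0)):
        row = [None] * n
        for _, index, item in group:
            row[index] = item
        yield tuple(row)


a = ((2,), (3,), (4,))
b = ((1,),)
c = ((5,),)
d = ((1,), (3,), (5,))
e = ()
rows = list(join_sorted(a, b, c, d, e))
```

<p>For the sample data this yields the same rows as <code>join_items</code>. Whether it is faster depends on the number of iterators: the heap avoids rescanning every current item to find the minimum on each step, at the cost of wrapping each item in an extra tuple.</p>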