Optimizing the equality and inequality operators

I have some structures that are very expensive to compare. (They are actually trees with distinct branches.) Computing hash values for them is also expensive.

I want to create a decorator for the `__eq__` operator that will cache some results to speed things up. This is somewhat similar to memoization.

In particular, I want something like this to happen. Suppose we have 3 objects: A, B and C. We compare A with B. The `__eq__` operator gets called, returns True, and the result gets stored. We compare B with C. The `__eq__` operator gets called as before. Now we compare A with C. The algorithm should detect that A is equal to B and B is equal to C, so it should report that A is equal to C without invoking the costly `__eq__` operator.

I wanted to use the union-find algorithm, but it only allows caching **equalities**; it does not allow caching **inequalities**.

Suppose we have 2 objects equal to each other: A and B. Suppose also that we have another pair of equal objects: C and D. The union-find algorithm will correctly group them into two categories, (A, B) and (C, D). Now suppose A **is not** equal to C. My algorithm should cache this somehow and prevent the `__eq__` operator from running again on the pairs (A, C), (B, C), (A, D) and (B, D), since we can deduce that all of these pairs are unequal. Union-find does not allow that. It only saves the positive equalities, and it fails miserably when we have to compare many unequal objects.

My current solution is something like this:

```python
def optimize(original_eq):
    def optimized_eq(first, second):
        if first is second:
            return True
        if hash(first) != hash(second):
            return False
        if cache.find(first) == cache.find(second):
            return True
        result = original_eq(first, second)
        if result:
            cache.union(first, second)
        else:
            pass  # no idea how to save the negative result
        return result
    return optimized_eq
```

This solution would be fine if the hash function were cheap to compute, but it isn't. Since the hash check filters out most unequal objects, we would be invoking `cache.find` only on objects that are highly likely to be equal, so we would rarely need to call the original equality operator. But, as I said, the hash function is very slow on my trees (it basically has to traverse the whole tree, comparing branches at each node to remove duplicates), so I want to remove it and cache the negative results instead.

Does anyone know a good solution to this problem? I need to cache not only the positive comparison results, but also the negative ones.
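For reference, `cache` in the snippet above is not shown; it is assumed to be an ordinary union-find (disjoint-set) structure. The following is just a minimal illustrative sketch of what it could look like, keyed on `id()` so it never has to call the objects' expensive `__hash__` (it also assumes the compared objects stay alive for the lifetime of the cache):

```python
class UnionFindCache:
    """Minimal illustrative union-find keyed by object identity."""

    def __init__(self):
        self._parent = {}  # maps id(obj) -> parent object

    def find(self, obj):
        # Follow parent links to the representative, compressing the path.
        parent = self._parent.get(id(obj), obj)
        if parent is obj:
            return obj
        root = self.find(parent)
        self._parent[id(obj)] = root
        return root

    def union(self, first, second):
        # Merge the two equivalence classes.
        first_root = self.find(first)
        second_root = self.find(second)
        if first_root is not second_root:
            self._parent[id(first_root)] = second_root


cache = UnionFindCache()
```

Note that `find` returns a representative object, so the two representatives in `optimized_eq` could also be compared with `is` rather than `==` to avoid another deep comparison.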
UPDATE:

My current solution, which works for me, follows:

```python
def memoize_hash_and_eq(cls):
    "This decorator should be applied to the class."

    def union(key1, key2):
        # Merge the equivalence classes of key1 and key2, keeping key2's
        # leader and moving any cached splits over to it.
        nonlocal union_find
        if key1 is not key2:
            key1_leader = union_find(key1)
            key2_leader = union_find(key2)
            key1_leader._memoize_hash_and_eq__leader = key2_leader
            try:
                key2_leader._memoize_hash_and_eq__splits = key1_leader._memoize_hash_and_eq__splits
                del key1_leader._memoize_hash_and_eq__splits
            except AttributeError:
                pass

    def union_find(key):
        # Find the leader of key's equivalence class, compressing the path
        # and moving any cached splits up to the leader.
        leader = key
        while True:
            try:
                leader = leader._memoize_hash_and_eq__leader
            except AttributeError:
                break
        if leader is not key:
            key._memoize_hash_and_eq__leader = leader
            try:
                leader._memoize_hash_and_eq__splits = key._memoize_hash_and_eq__splits
                del key._memoize_hash_and_eq__splits
            except AttributeError:
                pass
        return leader

    def split(key1, key2):
        # Record that the classes of key1 and key2 are known to be unequal.
        nonlocal union_find
        key1_leader = union_find(key1)
        key2_leader = union_find(key2)
        try:
            key1_leader._memoize_hash_and_eq__splits.add(key2_leader)
        except AttributeError:
            try:
                key2_leader._memoize_hash_and_eq__splits.add(key1_leader)
            except AttributeError:
                try:
                    key1_leader._memoize_hash_and_eq__splits = set()
                    key1_leader._memoize_hash_and_eq__splits.add(key2_leader)
                except (AttributeError, TypeError):
                    pass

    def split_find(key1, key2):
        # Return True if the classes of key1 and key2 are already known
        # to be unequal.
        nonlocal union_find
        key1_leader = union_find(key1)
        key2_leader = union_find(key2)
        try:
            split_leaders = key2_leader._memoize_hash_and_eq__splits
            for k in [_k for _k in split_leaders]:
                split_leaders.add(union_find(k))  # refresh stale leaders
            if key1_leader in split_leaders:
                return True
        except (AttributeError, TypeError):
            pass
        try:
            split_leaders = key1_leader._memoize_hash_and_eq__splits
            for k in [_k for _k in split_leaders]:
                split_leaders.add(union_find(k))  # refresh stale leaders
            if key2_leader in split_leaders:
                return True
        except (AttributeError, TypeError):
            pass
        return False

    def memoized_hash(self):
        # Hash only the leader of self's equivalence class.
        return original_hash(union_find(self))

    original_hash = cls.__hash__
    cls.__hash__ = memoized_hash

    def memoized_equivalence(self, other):
        if self is other:
            return True
        if union_find(self) is union_find(other):
            return True
        if split_find(self, other):
            return False
        result = original_equivalence(self, other)
        if result is NotImplemented:
            return result
        elif result:
            union(self, other)
        else:
            split(self, other)
        return result

    original_equivalence = cls.__eq__
    cls.__eq__ = memoized_equivalence

    return cls
```

This speeds up both `__eq__` and `__hash__`.
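For illustration only, here is how the decorator might be applied. `Tree` and its `__eq__`/`__hash__` below are hypothetical stand-ins for the real, expensive tree structures:

```python
@memoize_hash_and_eq
class Tree:
    def __init__(self, label, children=()):
        self.label = label
        self.children = tuple(children)

    def __eq__(self, other):
        # Stand-in for the real, expensive structural comparison.
        if not isinstance(other, Tree):
            return NotImplemented
        return self.label == other.label and self.children == other.children

    def __hash__(self):
        # Stand-in for the real, expensive structural hash.
        return hash((self.label, self.children))


a, b, c = (Tree("x", [Tree("y")]) for _ in range(3))

assert a == b   # runs the original __eq__, then records a ~ b
assert b == c   # runs the original __eq__, then records b ~ c
assert a == c   # answered from the cached equivalences, no deep comparison
```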