Note that there are some explanatory texts on larger screens.

plurals
  1. POPython: fast dictionary of big int keys
    primarykey
    data
    text
    <p>I have got a list of >10.000 int items. The values of the items can be very high, up to 10^27. Now I want to create all pairs of the items and calculate their sum. Then I want to look for different pairs with the same sum.</p> <p>For example:</p> <pre><code>l[0] = 4 l[1] = 3 l[2] = 6 l[3] = 1 ... pairs[10] = [(0,2)] # 10 is the sum of the values of l[0] and l[2] pairs[7] = [(0,1), (2,3)] # 7 is the sum of the values of l[0] and l[1] or l[2] and l[3] pairs[5] = [(0,3)] pairs[9] = [(1,2)] ... </code></pre> <p>The contents of <code>pairs[7]</code> is what I am looking for. It gives me two pairs with the same value sum.</p> <p>I have implemented it as follows - and I wonder if it can be done faster. Currently, for 10.000 items it takes >6 hours on a fast machine. (As I said, the values of <code>l</code> and so the keys of <code>pairs</code> are ints up to 10^27.)</p> <pre><code>l = [4,3,6,1] pairs = {} for i in range( len( l ) ): for j in range(i+1, len( l ) ): s = l[i] + l[j] if not s in pairs: pairs[s] = [] pairs[s].append((i,j)) # pairs = {9: [(1, 2)], 10: [(0, 2)], 4: [(1, 3)], 5: [(0, 3)], 7: [(0, 1), (2, 3)]} </code></pre> <hr> <p><strong>Edit:</strong> I want to add some background, as asked by Simon Stelling.</p> <p>The goal is to find Formal Analogies like</p> <pre><code>lays : laid :: says : said </code></pre> <p>within a list of words like</p> <pre><code>[ lays, lay, laid, says, said, foo, bar ... ] </code></pre> <p>I already have a function <code>analogy(a,b,c,d)</code> giving <code>True</code> if <code>a : b :: c : d</code>. However, I would need to check all possible quadruples created from the list, which would be a complexity of around O((n^4)/2).</p> <p>As a pre-filter, I want to use the char-count property. It says that every char has the same count in (a,d) and in (b,c). For instance, in "layssaid" we have got 2 a's, and so we do in "laidsays"</p> <p>So the idea until now was</p> <ul> <li>for every word to create a "char count vector" and represent it as an integer (the items in the list <code>l</code>)</li> <li>create all pairings in <code>pairs</code> and see if there are "pair clusters", i.e. more than one pair for a particular char count vector sum.</li> </ul> <p>And it works, it's just slow. The complexity is down to around O((n^2)/2) but this is still a lot, and especially the dictionary lookup and insert is done that often.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload