Note that there are some explanatory texts on larger screens.

plurals
  1. POBest way to sort 1M records in Python
    text
    copied!<p>I have a service that runs that takes a list of about 1,000,000 dictionaries and does the following</p> <pre><code>myHashTable = {} myLists = { 'hits':{}, 'misses':{}, 'total':{} } sorted = { 'hits':[], 'misses':[], 'total':[] } for item in myList: id = item.pop('id') myHashTable[id] = item for k, v in item.iteritems(): myLists[k][id] = v </code></pre> <p>So, if I had the following list of dictionaries:</p> <pre><code>[ {'id':'id1', 'hits':200, 'misses':300, 'total':400}, {'id':'id2', 'hits':300, 'misses':100, 'total':500}, {'id':'id3', 'hits':100, 'misses':400, 'total':600} ] </code></pre> <p>I end up with</p> <pre><code>myHashTable = { 'id1': {'hits':200, 'misses':300, 'total':400}, 'id2': {'hits':300, 'misses':100, 'total':500}, 'id3': {'hits':100, 'misses':400, 'total':600} } </code></pre> <p>and</p> <pre><code>myLists = { 'hits': {'id1':200, 'id2':300, 'id3':100}, 'misses': {'id1':300, 'id2':100, 'id3':400}, 'total': {'id1':400, 'id2':500, 'id3':600} } </code></pre> <p>I then need to sort all of the data in each of the myLists dictionaries.</p> <p>What I doing currently is something like the following:</p> <pre><code>def doSort(key): sorted[key] = sorted(myLists[key].items(), key=operator.itemgetter(1), reverse=True) which would yield, in the case of misses: [('id3', 400), ('id1', 300), ('id2', 200)] </code></pre> <p>This works great when I have up to 100,000 records or so, but with 1,000,000 it is taking at least 5 - 10 minutes to sort each with a total of 16 (my original list of dictionaries actually has 17 fields including id which is popped)</p> <blockquote> <p><strong>* EDIT *</strong> This service is a ThreadingTCPServer which has a method allowing a client to connect and add new data. The new data may include new records (meaning dictionaries with unique 'id's to what is already in memory) or modified records (meaning the same 'id' with different data for the other key value pairs</p> <p>So, once this is running I would pass in </p> <pre><code>[ {'id':'id1', 'hits':205, 'misses':305, 'total':480}, {'id':'id4', 'hits':30, 'misses':40, 'total':60}, {'id':'id5', 'hits':50, 'misses':90, 'total':20 ] </code></pre> <p>I have been using dictionaries to store the data so that I don't end up with duplicates. After the dictionaries are updated with the new/modified data I resort each of them.</p> <p><strong>* END EDIT *</strong></p> </blockquote> <p>So, what is the best way for me to sort these? Is there a better method?</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload