Python: Zope's BTree OOSet, IISet, etc... Effective for this requirement?
<p>I asked another question: <a href="https://stackoverflow.com/questions/1180240/best-way-to-sort-1m-records-in-python" title="StackOverflow Python Sorting question">https://stackoverflow.com/questions/1180240/best-way-to-sort-1m-records-in-python</a> where I was trying to determine the best approach for sorting 1 million records. In my case I need to be able to add additional items to the collection and have them re-sorted. It was suggested that I try using Zope's BTrees for this task. After doing some reading I am a little stumped as to what data I would put in a set.</p>
<p>Basically, for each record I have two pieces of data: 1. a unique ID which maps to a user, and 2. a value of interest to sort on.</p>
<p>I see that I can add the items to an OOSet as tuples, where the value to sort on is at index 0. So, <code>(200, 'id1'),(120, 'id2'),(400, 'id3')</code> and the resulting set would be sorted with <code>id2, id1 and id3</code> in order.</p>
<p>However, part of the requirement is that each id appears only once in the set. I will be adding additional data to the set periodically, and the new data may or may not include duplicated ids. If an id is duplicated, I want to update its value and not add an additional entry. So, based on the tuples above, I might add <code>(405, 'id1'),(10, 'id4')</code> to the set and would want the output to have <code>id4, id2, id3, id1</code> in order.</p>
<p>Any suggestions on how to accomplish this? Sorry for my newbness on the subject.</p>
<p><strong>* EDIT - additional info *</strong></p>
<p>Here is some actual code from the project:</p>
<pre><code>for field in lb_fields:
    t = time.time()
    self.data[field] = [ (v[field], k) for k, v in self.foreign_keys.iteritems() ]
    self.data[field].sort(reverse=True)
    print "Added %s: %03.5f seconds" %(field, (time.time() - t))
</code></pre>
<p><code>foreign_keys</code> is the original data in a dictionary, with each id as the key and a dictionary of the additional data as the value. <code>data</code> is a dictionary containing the lists of sorted data.</p>
<p>As a side note, as each iteration of the <code>for field in lb_fields</code> loop runs, the time to sort increases - not by much... but it is noticeable. After 1 million records have been sorted for each of the 16 fields, it is using about 4 GB of RAM. Eventually this will run on a machine with 48 GB.</p>
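<p>For illustration, here is a minimal sketch of the "one entry per id, update instead of duplicate" requirement described above. It does not use Zope's BTrees; it swaps in a plain dictionary plus a sorted list maintained with the standard <code>bisect</code> module. The class name <code>SortedUniqueScores</code> and its methods are hypothetical and not from the project. The idea is to remember each id's current value so the stale <code>(value, id)</code> tuple can be located and removed before the new one is inserted.</p>

```python
import bisect


class SortedUniqueScores(object):
    """Hypothetical helper: one (value, id) entry per id, kept in ascending order."""

    def __init__(self):
        self._value_by_id = {}  # id -> current value, used to find the stale tuple
        self._entries = []      # sorted list of (value, id) tuples

    def upsert(self, value, id_):
        """Insert a new id, or replace the value of an id that is already present."""
        if id_ in self._value_by_id:
            stale = (self._value_by_id[id_], id_)
            # The stale tuple is guaranteed to be in the list, so bisect_left
            # lands exactly on it.
            index = bisect.bisect_left(self._entries, stale)
            del self._entries[index]
        self._value_by_id[id_] = value
        bisect.insort(self._entries, (value, id_))

    def ids_in_order(self):
        """Return the ids sorted by their current value, smallest first."""
        return [id_ for _, id_ in self._entries]


scores = SortedUniqueScores()
for value, id_ in [(200, 'id1'), (120, 'id2'), (400, 'id3')]:
    scores.upsert(value, id_)
first_order = scores.ids_in_order()   # id2, id1, id3

for value, id_ in [(405, 'id1'), (10, 'id4')]:
    scores.upsert(value, id_)
second_order = scores.ids_in_order()  # id4, id2, id3, id1
```

<p>Finding the position is a binary search, but inserting into or deleting from a Python list still shifts elements, so each update is linear in the worst case. The same bookkeeping (a mapping from id to current value, plus removing the old tuple before adding the new one) would presumably carry over to a tree-based sorted set, though that is an assumption and is not tested here.</p>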