Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>Obviously you need to run each line from Numbers.txt against each line from Ranges.txt.</p> <p>You could just iterate over Numbers.txt, and, for each line, iterate over Ranges.txt. But this will take forever, reading the whole Ranges.txt file millions of times.</p> <p>You could read both of them into memory, but that will take a lot of storage, and it means you won't be able to do any processing until you've finished reading and preprocessing both files.</p> <p>So, what you want to do is read Ranges.txt into memory once and store it as, say, a list of pairs of ints instead, but read Numbers.txt lazily, iterating over the list for each number.</p> <p>This kind of thing comes up all the time. In general, you want to make the bigger collection into the outer loop, and make it as lazy as possible, while the smaller collection goes into the inner loop, and is pre-processed to make it as fast as possible. But if the bigger collection can be preprocessed more efficiently (and you have enough memory to store it!), reverse that.</p> <hr> <p>And speaking of preprocessing, you can do a lot better than just reading into a list of pairs of ints. If you sorted Ranges.txt, you could find the closest range without going over by bisecting then just check that (18 steps), instead of checking each range exhaustively (100000 steps).</p> <p>This is a bit of a pain with the stdlib, because it's easy to make off-by-one errors when using <a href="http://docs.python.org/3/library/bisect.html" rel="nofollow"><code>bisect</code></a>, but there are plenty of ActiveState recipes to make it easier (including one <a href="http://code.activestate.com/recipes/577197-sortedcollection/" rel="nofollow">linked from the official docs</a>), not to mention third-party modules like <a href="https://pypi.python.org/pypi/blist/" rel="nofollow"><code>blist</code></a> or <a href="https://pypi.python.org/pypi/bintrees/" rel="nofollow"><code>bintrees</code></a> that give you a sorted collection in a simple OO interface.</p> <hr> <p>So, something like this pseudocode:</p> <pre><code>with open('ranges.txt') as f: ranges = sorted([map(int, line.split()) for line in f]) range_values = {} with open('numbers.txt') as f: rows = (map(int, line.split()) for line in f) for number, value in rows: use the sorted ranges to find the appropriate range (if any) range_values.setdefault(range, []).append(value) with open('output.txt') as f: for r, values in range_values.items(): mean = sum(values) / len(values) f.write('{} {} {}\n'.format(r[0], r[1], mean)) </code></pre> <hr> <p>By the way, if the parsing turns out to be any more complicated than just calling <code>split</code> on each line, I'd suggest using the <code>csv</code> module… but it looks like that won't be a problem here.</p> <hr> <p>What if you can't fit Ranges.txt into memory, but can fit Numbers.txt? Well, you can sort that, then iterate over Ranges.txt, find all of the matches in the sorted numbers, and write the results out for that range.</p> <p>This is a bit more complicated, because it you have to bisect_left and bisect_right and iterate everything in between. But that's the only way in which it's any harder. (And here, a third-party class will help even more. For example, with a <code>bintrees.FastRBTree</code> as your sorted collection, it's just <code>sorted_number_tree[low:high]</code>.)</p> <hr> <p>If the ranges can overlap, you need to be a bit smarter—you have to find the closest range without going over the start, and the closest range without going under the end, and check everything in between. But the main trick there is the exact same one used for the last version. The only other trick is to keep two copies of ranges, one sorted by the start value and one by the end, and you'll need to have one of them be a map to indices in the other instead of just a plain list.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload