Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>It looks like your data in both the files are already sorted. If not, first sort them by an external tool or using Python.</p> <p>Then, you can go through the two files in parallel. You read a number from the <code>Numbers.txt</code> file, and see if it is in a range in <code>Ranges.txt</code> file, reading as many lines from that file as needed to answer that question. Then read the next number from <code>Numbers.txt</code>, and repeat. The idea is similar to merging two sorted arrays, and should run in <code>O(n+m)</code> time, <code>n</code> and <code>m</code> are the sizes of the files. If you need to sort the files, the run time is <code>O(n lg(n) + m lg(m))</code>. Here is a quick program I wrote to implement this:</p> <pre><code>import sys from collections import Counter class Gen(object): __slots__ = ('rfp', 'nfp', 'mn', 'mx', 'num', 'd', 'n') def __init__(self, ranges_filename, numbers_filename): self.d = Counter() # sum of elements keyed by range self.n = Counter() # number of elements keyed by range self.rfp = open(ranges_filename) self.nfp = open(numbers_filename) # Read the first number and the first range values self.num = float(self.nfp.next()) # Current number self.mn, self.mx = [int(x) for x in self.rfp.next().split()] # Current range def go(self): while True: if self.mx &lt; self.num: try: self.mn, self.mx = [int(x) for x in self.rfp.next().split()] except StopIteration: break else: if self.mn &lt;= self.num &lt;= self.mx: self.d[(self.mn, self.mx)] += self.num self.n[(self.mn, self.mx)] += 1 try: self.num = float(self.nfp.next()) except StopIteration: break self.nfp.close() self.rfp.close() return self.d, self.n def run(ranges_filename, numbers_filename): r = Gen(ranges_filename, numbers_filename) d, n = r.go() for mn, mx in sorted(d): s, N = d[(mn, mx)], n[(mn, mx)] if s: av = s/N else: av = 0 sys.stdout.write('%d %d %.3f\n' % (mn, mx, av)) </code></pre> <p>On files with 10,000,000 numbers in each of the files, the above runs in about 1.5 minute on my computer without the output part.</p>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload