<p>Some Python to the rescue.</p>

<p>I created two text files using the following snippet:</p>

<pre><code>#!/usr/bin/env python
import random
import string

N = 2000000  # number of lines to generate
for i in range(N):
    # 40 random uppercase letters/digits as the key, followed by fixed fields
    s = ''.join(random.choice(string.ascii_uppercase + string.digits) for x in range(40))
    print(s + '|4|2|20705|9550|21419')
</code></pre>

<p>I generated one file with 300k lines and one with 2M lines, which gives me the following files:</p>

<pre><code>$ ll
-rwxr-xr-x 1  210 Jun 11 22:29 gen_random_string.py*
-rw-rw-r-- 1 119M Jun 11 22:31 large.txt
-rw-rw-r-- 1  18M Jun 11 22:29 small.txt
</code></pre>

<p>Then I appended a line from small.txt to the end of large.txt so that I had a matching pattern.</p>

<p>Then some more Python:</p>

<pre><code>#!/usr/bin/env python
target = {}

# Index every line of the large file by its first field
with open("large.txt") as fd:
    for line in fd:
        target[line.split('|')[0]] = line.strip()

# Print the large-file line for every key that also occurs in the small file
with open("small.txt") as fd:
    for line in fd:
        if line.split('|')[0] in target:
            print(target[line.split('|')[0]])
</code></pre>

<p>Some timings:</p>

<pre><code>$ time ./comp.py
3A8DW2UUJO3FYTE8C5ESE25IC9GWAEJLJS2N9CBL|4|2|20705|9550|21419

real    0m2.574s
user    0m2.400s
sys     0m0.168s

$ time awk -F"|" 'NR==FNR{a[$1]=$2;next}{if (a[$1]) print}' small.txt large.txt
3A8DW2UUJO3FYTE8C5ESE25IC9GWAEJLJS2N9CBL|4|2|20705|9550|21419

real    0m4.380s
user    0m4.248s
sys     0m0.124s
</code></pre>

<p>Update:</p>

<p>To conserve memory, do the dictionary lookup the other way: load the small file into the dictionary and stream through the large one.</p>

<pre><code>#!/usr/bin/env python
target = {}

# Index only the small file, so the dictionary stays small
with open("small.txt") as fd:
    for line in fd:
        target[line.split('|')[0]] = line.strip()

# Stream the large file and print each line whose key is in the index
with open("large.txt") as fd:
    for line in fd:
        if line.split('|')[0] in target:
            print(line.strip())
</code></pre>
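<p>The memory-saving variant never reads the values it stores in the dictionary, so a set of keys would probably do. Below is a minimal sketch of that idea for Python 3. The function name <code>matching_lines</code> and its parameters are hypothetical and not part of the scripts above, and I have not timed this version.</p>

```python
#!/usr/bin/env python3

def matching_lines(small_path, large_path, sep="|"):
    """Yield lines of large_path whose first field also occurs in small_path.

    Hypothetical helper: only the keys of the small file are held in memory.
    """
    # Collect the first field of every line in the small file
    with open(small_path) as fd:
        keys = {line.rstrip("\n").split(sep, 1)[0] for line in fd}

    # Stream the large file line by line and keep only the matching lines
    with open(large_path) as fd:
        for line in fd:
            if line.rstrip("\n").split(sep, 1)[0] in keys:
                yield line.rstrip("\n")


if __name__ == "__main__":
    import os
    # Only run against the files from the answer if they are present
    if os.path.exists("small.txt") and os.path.exists("large.txt"):
        for match in matching_lines("small.txt", "large.txt"):
            print(match)
```

<p>Because the function is a generator, matches are printed as the large file is read rather than being collected first.</p>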