<p>You can match on the last three fields with an awk one-liner. Here's a demonstration:</p>

<pre><code>[ghoti@pc ~]$ cat file1
chr10 1000 1001 DEL 2.4807 chr10 7443 8978
chr10 1005 1008 DEL 1.2799 chr10 7321 8778
[ghoti@pc ~]$ cat file2
chr13 3456 6746 chr10 7443 8978
chr13 6453 8767 chr10 7321 8778
[ghoti@pc ~]$ awk 'NR == FNR { what[$(NF-2),$(NF-1),$(NF)] = $0; next; } { printf("%s %s\n", what[$(NF-2),$(NF-1),$(NF)], $0); }' file1 file2
chr10 1000 1001 DEL 2.4807 chr10 7443 8978 chr13 3456 6746 chr10 7443 8978
chr10 1005 1008 DEL 1.2799 chr10 7321 8778 chr13 6453 8767 chr10 7321 8778
[ghoti@pc ~]$
</code></pre>

<p>If you want the files in the other order, just swap <code>$0</code> and <code>what[]</code> in the printf().</p>

<p>Note that this assumes you're okay with loading the entire contents of the first file into an array in memory. It probably shouldn't be used for files with millions of lines, but that depends entirely on the system you're running it on.</p>

<p><strong>How does this work?</strong></p>

<p>The awk script has two main sections, each in curly braces. The first section runs ONLY if NR (the number of records read so far across all files) equals FNR (the record number within the current file). In other words, it acts only on the first file. The first file gets loaded into memory in an associative array whose subscript is the last three fields of the line, and <code>next</code> then skips the second section for those lines.</p>

<p>The second section acts on every file after the first. It simply prints the current line, prefixed with the array entry (stored in the first section) whose subscript matches the last three fields of the current line.</p>
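<p>To make the two sections easier to see, here is the same idea spread over several lines with comments. Treat it as a sketch rather than a drop-in replacement: the scratch directory and the <code>joined</code> output file are names made up for illustration, and the <code>in what</code> test is an assumption added here (the one-liner above prints unmatched lines with an empty prefix, while this version skips them).</p>

```shell
# Work in a scratch directory so the sample files don't clobber anything.
workdir=$(mktemp -d)

# Sample data copied from the answer above.
cat > "$workdir/file1" <<'EOF'
chr10 1000 1001 DEL 2.4807 chr10 7443 8978
chr10 1005 1008 DEL 1.2799 chr10 7321 8778
EOF

cat > "$workdir/file2" <<'EOF'
chr13 3456 6746 chr10 7443 8978
chr13 6453 8767 chr10 7321 8778
EOF

awk '
    # Section 1: NR == FNR holds only while awk reads the first file.
    # Store each line under a key built from its last three fields.
    NR == FNR {
        what[$(NF-2), $(NF-1), $NF] = $0
        next    # skip section 2 for first-file lines
    }

    # Section 2: runs for every line of the later file(s).
    # Assumption (not in the one-liner): skip lines with no match
    # instead of printing them with an empty prefix.
    ($(NF-2), $(NF-1), $NF) in what {
        print what[$(NF-2), $(NF-1), $NF], $0
    }
' "$workdir/file1" "$workdir/file2" > "$workdir/joined"

cat "$workdir/joined"
```

<p>With the sample files this should produce the same two joined lines as the one-liner. Dropping the <code>in what</code> pattern from the second section restores the original behaviour for unmatched lines.</p>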
 
