
Unix join produces inconsistent results on Windows 7
<p>I have a data set in which the largest file contains about 130,000 records.</p>
<p>Here is a subset of the first file, 1.txt:</p>
<pre><code>CID|UID|Key|sis_URL
1|D000108|RDHQFKQIGNGIED|http://sis.gov/regno=0000870779
1|D000108|RDHQFKQIGNGIED|http://sis.gov/regno=0014992622
1|D000644|RDHQFKQIGNGIED|http://sis.gov/regno=0000870779
1|D000644|RDHQFKQIGNGIED|http://sis.gov/regno=0014992622
1|D002331|RDHQFKQIGNGIED|http://sis.gov/regno=0000870779
1|D002331|RDHQFKQIGNGIED|http://sis.gov/regno=0014992622
11|C024565|WSLDOOZREJYCGB|http://sis.gov/regno=0000107062
13|C009947|PBKONEOXTCPAFI|http://sis.gov/regno=0000120821
13|C009947|PBKONEOXTCPAFI|http://sis.gov/regno=0063697187
</code></pre>
<p>Here is a subset of the second file, 2.txt:</p>
<pre><code>CID|bro_URL
11|http://bro.gov/nmbr=0149
13|http://bro.gov/nmbr=0119
</code></pre>
<p>I am running GnuWin32 under Windows 7 (64-bit, 8 GB RAM), so I have to use double quotes instead of single quotes in the Windows command prompt. The <code>join</code> command is:</p>
<pre><code>join -t"|" -1 1 -2 1 -a1 -a2 -e "NULL" -o "0,1.2,1.3,1.4,2.2" 1.txt 2.txt &gt; 3_.txt
</code></pre>
<p>Here is the output file, 3.txt:</p>
<pre><code>CID|UID|Key|sis_URL|bro_URL
1|D000108|RDHQFKQIGNGIED|http://sis.gov/regno=0000870779|NULL
1|D000108|RDHQFKQIGNGIED|http://sis.gov/regno=0014992622|NULL
1|D000644|RDHQFKQIGNGIED|http://sis.gov/regno=0000870779|NULL
1|D000644|RDHQFKQIGNGIED|http://sis.gov/regno=0014992622|NULL
1|D002331|RDHQFKQIGNGIED|http://sis.gov/regno=0000870779|NULL
1|D002331|RDHQFKQIGNGIED|http://sis.gov/regno=0014992622|NULL
11|NULL|NULL|NULL|http://bro.gov/nmbr=0149
13|NULL|NULL|NULL|http://bro.gov/nmbr=0119
11|C024565|WSLDOOZREJYCGB|http://sis.gov/regno=0000107062|NULL
13|C009947|PBKONEOXTCPAFI|http://sis.gov/regno=0000120821|NULL
13|C009947|PBKONEOXTCPAFI|http://sis.gov/regno=0063697187|NULL
</code></pre>
<p>For CID:11 and CID:13, I expected:</p>
<pre><code>11|C024565|WSLDOOZREJYCGB|http://sis.gov/regno=0000107062|http://bro.gov/nmbr=0149
13|C009947|PBKONEOXTCPAFI|http://sis.gov/regno=0000120821|http://bro.gov/nmbr=0119
13|C009947|PBKONEOXTCPAFI|http://sis.gov/regno=0063697187|http://bro.gov/nmbr=0119
</code></pre>
<p>Why does the <code>join</code> fail for CID:11 and CID:13?</p>
<p>Note: before posting this question, I ran <code>join</code> on the subset above and it produced the correct results. When I run it on the complete data set, I get the incorrect result (the relevant rows are shown above).</p>
<p>Any idea why? Is there a recommended alternative?</p>
<p>Once the <code>join</code> process is complete, my final table will be 15 columns wide, but I'm already stuck at column 4.</p>
<p>Is there a work-around, for example using <code>awk</code>?</p>
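One plausible cause, offered as a hypothesis rather than a diagnosis: join requires both inputs to be sorted on the join field in the same collating order that sort uses, and depending on the version it may not warn when they are not. The 9-row subset happens to be in byte order (1, 11, 13), which would explain why it works. If the complete 1.txt is ordered differently, for example numerically (1, 2, ..., 10, 11, 13), join can move past the CID 11 and 13 rows of 2.txt before it reaches the matching rows of 1.txt, which is the pattern in the output above. The sketch below assumes a POSIX shell with GNU coreutils rather than cmd.exe with GnuWin32, and sorted_join is a hypothetical helper name. It drops each header, sorts both files byte-wise on field 1, runs the question's join options in the same C locale, and prepends a combined header.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sort both inputs on the join key, then full-outer-join them.
# Assumes GNU coreutils and a header line at the top of each file.
sorted_join() {
  local left=$1 right=$2 tmp_l tmp_r
  tmp_l=$(mktemp)
  tmp_r=$(mktemp)

  # Drop the header and sort byte-wise (LC_ALL=C) on field 1 only (-k1,1),
  # so sort and join agree on the collating order of the CID key.
  tail -n +2 "$left"  | LC_ALL=C sort -t'|' -k1,1 > "$tmp_l"
  tail -n +2 "$right" | LC_ALL=C sort -t'|' -k1,1 > "$tmp_r"

  # Rebuild the header: all left columns plus the second column of the right file.
  printf '%s|%s\n' "$(head -n 1 "$left")" "$(head -n 1 "$right" | cut -d'|' -f2)"

  # Same options as in the question, run in the same C locale as the sort.
  LC_ALL=C join -t'|' -1 1 -2 1 -a1 -a2 -e NULL -o 0,1.2,1.3,1.4,2.2 "$tmp_l" "$tmp_r"

  rm -f "$tmp_l" "$tmp_r"
}

# Usage:
#   sorted_join 1.txt 2.txt > 3.txt
```

Sorting on -k1,1 rather than on whole lines matters here: a whole-line byte-wise sort would place 11|... before 1|... because | sorts after the digits in the C locale, while join compares the bare keys 1 and 11.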
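On the awk question: a hash join sidesteps the sort-order requirement entirely. The sketch below is one possible approach, not a drop-in replacement for join. hash_join is a hypothetical name, and the sketch assumes a POSIX shell with a POSIX awk (gawk or mawk), a header line in both files, and at most one row per CID in 2.txt (true in the sample). It loads 2.txt into an associative array keyed on CID, streams 1.txt and appends the matching bro_URL or NULL, then emits CIDs that exist only in 2.txt with NULL in the 1.txt columns, mirroring -a1 -a2 -e "NULL".

```shell
#!/usr/bin/env bash
# Hypothetical helper: hash-based full outer join on column 1 (CID).
# Assumes each CID appears at most once in RIGHT and both files have a header line.
hash_join() {
  local left=$1 right=$2
  awk -F'|' -v OFS='|' '
    # Strip a trailing carriage return in case the files have Windows line endings.
    { sub(/\r$/, "") }

    # First file argument (RIGHT): build a CID -> bro_URL lookup table.
    FILENAME == ARGV[1] {
      if (FNR == 1) { rhead = $2; next }   # remember the right-hand column name
      url[$1] = $2
      next
    }

    # Header of LEFT: note its width and append the right-hand column name.
    FNR == 1 { nleft = NF; print $0, rhead; next }

    # Data rows of LEFT: append the matching URL, or NULL (like join -a1 -e NULL).
    {
      if ($1 in url) { print $0, url[$1]; matched[$1] = 1 }
      else           { print $0, "NULL" }
    }

    # CIDs found only in RIGHT (like join -a2): NULL-fill the LEFT columns.
    END {
      for (k in url) {
        if (k in matched) continue
        line = k
        for (i = 2; i <= nleft; i++) line = line OFS "NULL"
        print line, url[k]
      }
    }
  ' "$right" "$left"
}

# Usage:
#   hash_join 1.txt 2.txt > 3.txt
```

Rows keep the order of 1.txt, and the trailing 2.txt-only rows come out in unspecified order because awk's for-in iteration order is unspecified. Under cmd.exe it is easier to save the awk program to a file and run it with gawk -F"|" -v OFS="|" -f, since cmd.exe does not treat single quotes as quoting characters.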
 
