  1. Optimizing this script to match lines of one txt file with another
    <p>Okay, so I am at best a novice in bash scripting, but I wrote this very small script late last night to take the first 40 characters of each line of a fairly large text file (~300,000 lines), search through a much larger text file (~2.2 million lines) for matches, and then output all of the matching lines into a new text file.</p>
    <p>The script looks like this:</p>
<pre><code>#!/bin/bash
while read -r line
do
    match=${line:0:40}
    grep "$match" large_list.txt
done &lt;"small_list.txt"
</code></pre>
    <p>I call the script like so:</p>
<pre><code>$ bash my_script.sh &gt; outputfile.txt &amp;
</code></pre>
    <p>This gives me all the common elements between the two lists. This is all well and good, and it works, slowly. I am running it on an m1.small EC2 instance, and fair enough, the processing power on that is poor; I could spin up a larger instance to handle all this, or do it on my desktop and upload the file. However, I would rather learn a more efficient way of accomplishing the same task, and I can't quite seem to figure it out. Any tidbits on how best to go about this, or how to complete the task more efficiently, would be very much appreciated.</p>
    <p>To give you an idea of how slowly this is working: I started the script about 10 hours ago and I am about 10% of the way through all the matches.</p>
    <p>Also, I am not set on using bash, so scripts in other languages are fair game. I figure the pros on S.O. can easily improve my rock-for-a-hammer approach.</p>
    <p>Edit: adding input and output and more information about the data.</p>
<pre><code>input: (small text file)
8E636C0B21E42A3FC6AA3C412B31E3C61D8DD062|Vice S01E09 HDTV XviD-FUM[ettv]|Video TV|http://bitsnoop.com/vice-s01e09-hdtv-xvid-fum-ettv-q49614889.html|http://torrage.com/torrent/36A02E282D49EB7D94ACB798654829493CA929CB.torrent
3B9403AD73124A84AAE12E83A2DE446149516AC3|Sons of Guns S04E08 HDTV XviD-FUM[ettv]|Video TV|http://bitsnoop.com/sons-of-guns-s04e08-hdtv-xvid-fum-e-q49613491.html|http://torrage.com/torrent/3B9403AD73124A84AAE12E83A2DE446149516AC3.torrent
C4ADF747050D1CF64E9A626CA2563A0B8BD856E7|Save Me S01E06 HDTV XviD-FUM[ettv]|Video TV|http://bitsnoop.com/save-me-s01e06-hdtv-xvid-fum-ettv-q49515711.html|http://torrage.com/torrent/C4ADF747050D1CF64E9A626CA2563A0B8BD856E7.torrent
B71EFF95502E086F4235882F748FB5F2131F11CE|Da Vincis Demons S01E08 HDTV x264-EVOLVE|Video TV|http://bitsnoop.com/da-vincis-demons-s01e08-hdtv-x264-e-q49515709.html|http://torrage.com/torrent/B71EFF95502E086F4235882F748FB5F2131F11CE.torrent

match against (large text file)
86931940E7F7F9C1A9774EA2EA41AE59412F223B|0|0
8E636C0B21E42A3FC6AA3C412B31E3C61D8DD062|4|2|20705|9550|21419
ADFA5DD6F0923AE641F97A96D50D6736F81951B1|0|0
CF2349B5FC486E7E8F48591EC3D5F1B47B4E7567|1|0|429|428|22248
290DF9A8B6EC65EEE4EC4D2B029ACAEF46D40C1F|1|0|523|446|14276
C92DEBB9B290F0BB0AA291114C98D3FF310CF0C3|0|0|21448

Output:
8E636C0B21E42A3FC6AA3C412B31E3C61D8DD062|4|2|20705|9550|21419
</code></pre>
    <p>Additional clarifications: basically, there is a hash, which is the first 40 characters of each line of the input file (a file I have already reduced to about 15% of its original size). For each line in this file there is a hash in the larger text file (the one I am matching against) with some corresponding information. It is the line in the larger file that I would like to write to a new file, so that in the end I have a 1:1 ratio of everything in the smaller text file to my output_file.txt. In this case I am showing the first line of the input being matched (line 2 of the larger file) and then written to an output file.</p>
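One possible direction, offered as a rough sketch and not a definitive answer: the loop in the question starts one grep per line of the small file, so the 2.2-million-line file is scanned roughly 300,000 times. A single awk pass can instead load every 40-character prefix from the small file into an in-memory array and then test each line of the large file once. The function name match_by_prefix is hypothetical. The sketch assumes the hash always sits in the first 40 characters of the line in both files, as the sample data suggests. It also differs from the grep loop in two ways: the prefix is compared only at the start of each line, as a literal string, and the output follows the order of the large file, not the small one.

```shell
#!/bin/bash
# Sketch only: print every line of the large file whose first 40 characters
# equal the first 40 characters of some line in the small file.
# match_by_prefix is a hypothetical helper name, not from the original post.
match_by_prefix() {
    local small_file=$1 large_file=$2
    awk '
        # While reading the first file, store each 40-character prefix as a key.
        NR == FNR { keys[substr($0, 1, 40)]; next }
        # While reading the second file, print lines whose prefix is a stored key.
        (substr($0, 1, 40) in keys)
    ' "$small_file" "$large_file"
}
```

It could be called as match_by_prefix small_list.txt large_list.txt > output_file.txt. Both files are read once, so the run time should grow with the combined size of the two files instead of their product; the trade-off is that all prefixes from the small file are held in memory at once.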