Removing unwanted characters in each line of a file then matching what is left to another file in Python
    <p>I would like to write a Python script that addresses the following problem:</p>
    <p>I have two tab-separated files. The first has a single column containing a variety of words. The second has one column that contains similar words, plus additional columns of other information. However, some lines in the first file contain multiple words, separated by " /// ". The second file has a similar problem, but there the separator is "|".</p>
    <p><strong>File #1</strong></p>
    <pre><code>RED
BLUE /// GREEN
YELLOW /// PINK /// PURPLE
ORANGE
BROWN /// BLACK
</code></pre>
    <p><strong>File #2</strong> (which contains additional columns of other measurements)</p>
    <pre><code>RED|PINK
ORANGE
BROWN|BLACK|GREEN|PURPLE
YELLOW|MAGENTA
</code></pre>
    <p>I want to parse each file, match the words that are the same, and then append the columns of additional measurements. I want to ignore the <code>///</code> in the first file and the <code>|</code> in the second, so that each word is compared to the other list on its own. The output file should have one column containing every word that appears in both lists, followed by the appended additional information from file 2. Any help would be appreciated.</p>
    <hr>
    <p><strong>Additional info / update:</strong></p>
    <p>Here are 8 lines of File #1. I used color names above to keep things simple, but this is what the words really look like. These are the "symbols":</p>
    <pre><code>ANKRD38
ANKRD57
ANKRD57
ANXA8 /// ANXA8L1 /// ANXA8L2
AOF1
AOF2
AP1GBP1
APOBEC3F /// APOBEC3G
</code></pre>
    <p>What I need to do is take each symbol from file 1 and check whether it matches any one of the "synonyms" found in column 5 of file 2 (in the sample line below, the synonyms are A1B|ABG|GAB|HYST2477). If a symbol from file 1 matches ANY of the synonyms in column 5 of file 2, I need to append the additional information (the other columns of file 2) to that symbol and write everything to one big output file. Here is one line of file 2:</p>
    <pre><code>9606 '\t' 1 '\t' A1BG '\t' - '\t' A1B|ABG|GAB|HYST2477'\t' HGNC:5|MIM:138670|Ensembl:ENSG00000121410|HPRD:00726 '\t' 19 '\t' 19q13.4'\t' alpha-1-B glycoprotein '\t' protein-coding '\t' A1BG'\t' alpha-1-B glycoprotein'\t' O '\t' alpha-1B-glycoprotein '\t' 20120726 </code></pre>
    <p>File 2 is 22,000 KB; file 1 is much smaller. I have thought of creating a dict, much as has been suggested, but I keep getting held up by the different separators in each of the files. Thank you all for the questions and help thus far.</p>
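    <p>One possible way to handle the two separators is sketched below. This is a minimal sketch, not a definitive solution: the function names (<code>split_symbols</code>, <code>load_symbols</code>, <code>match_rows</code>, <code>merge_files</code>) and the file paths passed to <code>merge_files</code> are hypothetical, and it assumes the synonyms sit in column 5 (index 4) of file 2 as described above. Because file 1 is the smaller file, its symbols are loaded into a set and file 2 is streamed line by line.</p>

```python
def split_symbols(field, sep):
    """Split a multi-valued field on sep and strip whitespace from each piece."""
    return [part.strip() for part in field.split(sep) if part.strip()]


def load_symbols(lines):
    """Collect every symbol from file 1, treating '///' purely as a separator."""
    symbols = set()
    for line in lines:
        symbols.update(split_symbols(line, "///"))
    return symbols


def match_rows(symbols, rows, synonym_col=4):
    """Yield [symbol] + row for each file-2 row whose synonym column
    (assumed to be column 5, i.e. index 4) contains a file-1 symbol."""
    for row in rows:
        if len(row) <= synonym_col:
            continue  # skip blank or short lines
        for synonym in split_symbols(row[synonym_col], "|"):
            if synonym in symbols:
                yield [synonym] + row


def merge_files(path1, path2, out_path):
    """Hypothetical driver: read both files and write the matches, tab separated."""
    with open(path1) as f1:
        symbols = load_symbols(f1)
    with open(path2) as f2, open(out_path, "w") as out:
        # Split each file-2 line on tabs; strip only the trailing newline.
        rows = (line.rstrip("\n").split("\t") for line in f2)
        for matched in match_rows(symbols, rows):
            out.write("\t".join(matched) + "\n")
```

    <p>Calling <code>merge_files("file1.txt", "file2.txt", "matched.txt")</code> (placeholder paths) would write one line per match, with the matching symbol first and the original file 2 columns after it. A row is written once for each of its synonyms that matches, so deduplicate afterwards if that is not wanted.</p>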