StackOverflow2013

plurals

PORemoving unwanted characters in each line of a file then matching what is left to another file in Python
primarykey
Id
14288669
data
AcceptedAnswerId
14288877
AnswerCount
3
ClosedDate
CommentCount
2
CommunityOwnedDate
CreationDate
2013-01-12T00:07:43.850
FavoriteCount
0
LastActivityDate
2013-02-15T00:09:51.363
LastEditDate
2013-02-15T00:09:51.363
LastEditorUserId
1971382
OwnerUserId
1971382
ParentId
0
PostTypeId
1
Score
0
ViewCount
384
LastEditorDisplayName
text
Body
I would like to write a Python script that addresses the following problem: I have two tab-separated files. One has just one column containing a variety of words. The other file has one column that contains similar words, as well as columns of other information. However, within the first file, some lines contain multiple words, separated by " /// ". The other file has a similar problem, but the separator is "|".

File #1
<pre><code>RED
BLUE /// GREEN
YELLOW /// PINK /// PURPLE
ORANGE
BROWN /// BLACK
</code></pre>

File #2 (which contains additional columns of other measurements)
<pre><code>RED|PINK
ORANGE
BROWN|BLACK|GREEN|PURPLE
YELLOW|MAGENTA
</code></pre>

I want to parse each file and match the words that are the same, and then append the columns of additional measurements too. But I want to ignore the <code>///</code> in the first file and the <code>|</code> in the second, so that each word is compared to the other list on its own. The output file should have just one column of the words that appear in both lists, followed by the appended additional information from file 2. Any help?

<hr>

Additional info / update:

Here are 8 lines of File #1. I used color names above to keep things simple, but this is what the words really are. These are the "symbols":
<pre><code>ANKRD38
ANKRD57
ANKRD57
ANXA8 /// ANXA8L1 /// ANXA8L2
AOF1
AOF2
AP1GBP1
APOBEC3F /// APOBEC3G
</code></pre>

What I need to do is take each symbol from file 1 and see if it matches any one of the "synonyms" found in column 5 of file 2 (in the example line below, the synonyms are A1B|ABG|GAB|HYST2477). If a symbol from file 1 matches ANY of the synonyms in column 5 of file 2, then I need to append the additional information (the other columns in file 2) onto the symbol from file 1 and create one big output file.

Here is one line of file #2:
<pre><code>9606 '\t' 1 '\t' A1BG '\t' - '\t' A1B|ABG|GAB|HYST2477'\t' HGNC:5|MIM:138670|Ensembl:ENSG00000121410|HPRD:00726 '\t' 19 '\t' 19q13.4'\t' alpha-1-B glycoprotein '\t' protein-coding '\t' A1BG'\t' alpha-1-B glycoprotein'\t' O '\t' alpha-1B-glycoprotein '\t' 20120726 </code></pre>

File 2 is 22,000 KB; file 1 is much smaller. I have thought of creating a dict, much as has been suggested, but I keep getting held up by the different separators in each of the files. Thank you all for the questions and help thus far.
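The dict idea mentioned in the question can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the accepted answer's code: the function names `build_synonym_index` and `match_symbols` are made up for this example, the synonyms are assumed to sit in the fifth tab-separated column of file 2, a lone "-" in that column is assumed to mean "no synonyms", and every matching file 2 row is kept when a symbol matches more than one.

```python
from collections import defaultdict

SYNONYM_COLUMN = 4  # zero-based index of column 5 in file 2 (per the question)


def build_synonym_index(file2_lines, synonym_column=SYNONYM_COLUMN):
    """Map each "|"-separated synonym to every file 2 row that lists it."""
    index = defaultdict(list)
    for line in file2_lines:
        row = line.rstrip("\r\n").split("\t")
        if len(row) <= synonym_column:
            continue  # blank or short line: no synonym column to read
        for synonym in row[synonym_column].split("|"):
            synonym = synonym.strip()
            # Assumption: "-" is a placeholder for "no synonyms", so skip it.
            if synonym and synonym != "-":
                index[synonym].append(row)
    return index


def match_symbols(file1_lines, index):
    """Yield (symbol, file 2 row) for each file 1 symbol found in the index."""
    for line in file1_lines:
        # Splitting on "///" and stripping handles both " /// " and plain lines.
        for symbol in line.split("///"):
            symbol = symbol.strip()
            for row in index.get(symbol, []):
                yield symbol, row


if __name__ == "__main__":
    # First fields of the file 2 line from the question, joined by real tabs.
    file2 = ["\t".join(["9606", "1", "A1BG", "-", "A1B|ABG|GAB|HYST2477"]) + "\n"]
    # Hypothetical file 1 content, chosen so that one symbol matches.
    file1 = ["ANKRD38\n", "GAB /// ANXA8L1\n"]
    for symbol, row in match_symbols(file1, build_synonym_index(file2)):
        print("\t".join([symbol] + row))
```

Building the index from the large file once keeps each lookup from the small file a constant-time dict access. With real files, the two lists in the demo would be replaced by open file objects, since both functions only iterate over lines.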
Tags
<python><matching>
Title
Removing unwanted characters in each line of a file then matching what is left to another file in Python
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USC9r1y
UserOwnerUserId
1. USC9r1y
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
3. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. This table or related slice is empty.
CommentsPostId
1. COYour problem seems like a general programming puzzle. At least show us some ideas/code you have. We will guide you the right way.
 singulars
 PostPostId
 PORemoving unwanted characters in each line of a file then matching what is left to another file in Python
 UserUserId
 USsupertopi
2. COThis isn't all that tough, but you need to consider things like what happens when a word in file 1 matches several lines in file 2 - do you append the additional columns? Merge in some way? Just use the latest one found? I'm sure there are plenty of people here who can give you basic tips, but if you're after complete code there are some more details needed. Two (small) example files and the output from them would be a start.
 singulars
 PostPostId
 PORemoving unwanted characters in each line of a file then matching what is left to another file in Python
 UserUserId
 USCartroo
