StackOverflow2013

plurals

POScript for parsing a biological sequence from a public database in Python
primarykey
Id
5614180
data
AcceptedAnswerId
5614579
AnswerCount
3
ClosedDate
CommentCount
2
CommunityOwnedDate
CreationDate
2011-04-10T19:33:06.023
FavoriteCount
0
LastActivityDate
2013-11-06T09:18:42.230
LastEditDate
2011-04-14T00:15:31.377
LastEditorUserId
610084
OwnerUserId
610084
ParentId
0
PostTypeId
1
Score
2
ViewCount
862
LastEditorDisplayName
text
Body
Greetings to the Stack Overflow community. I am currently following a bioinformatics module as part of a biomedical degree (I am basically a Python newbie), and the following task is required as part of a Python programming assignment: extract motif sequences (amino acid sequences, so basically strings in programmatic-speak, that have been excised by algorithms implementing a multiple sequence alignment followed by iterative database scanning to generate the best conserved sequences. The ultimate idea is to infer functional significance from such "motifs").

These motifs are stored on a public database in files which have multiple data fields corresponding to each protein (UniProt ID, accession number, the alignment itself stored in a hyperlinked .seq file), only one of which is currently of interest here. The data field is called "extracted motif sets". My question is how to go about writing a script that will essentially parse the "motif strings" and output them to a file.

I have now coded the script so that it looks as follows (I don't write the results to files yet):

<pre><code>import os, re, sys, string

printsdb = open('/users/spyros/folder1/python/PRINTSmotifs/prints41_1.kdat', 'r')

protname = None
final_motifs = []

for line in printsdb.readlines():
    if line.startswith('gc;'):
        protname = line.lstrip()
        #string.lower(name)  # convert to lowercase
        break

def extract_final_motifs(protname):
    """Extracts the sequences of the 'final motifs sets' for a PRINTS entry.
    Sequences are on lines starting 'fd;' A simple regex is used for retrieval"""
    for line in printsdb.readlines():
        if line.startswith('fd;'):
            final_motifs = re.compile('^\s+([A-Z]+)\s+<')
            final_motifs = final_motifs.match(line)
            #print(final_motifs.groups()[0])
            motif_dict = {protname : final_motifs}
            break
    return motif_dict

motif_dict = extract_final_motifs('ADENOSINER')
print(motif_dict)
</code></pre>

The problem now is that, while my code loops over a raw database file (prints41_1.kdat) instead of connecting to the public database using the urllib module, as suggested by Simon Cockell below, the output of the script is simply "none" on the Python shell, whereas it should be creating a list such as [AAYIGIEVLI, AAYIGIEVLI, AAYIGIEVLI, etc.].

Does anybody have any idea where the logic error is? Any input appreciated! I apologize for the extensive text; I just hope to be as clear as possible. Thanks in advance for any help!
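For reference, here is a minimal sketch of the single-pass approach the task seems to call for. It is not the accepted answer, only an illustration under stated assumptions: that a 'gc;' line opens an entry and carries its name, and that each 'fd;' line begins with the motif sequence followed by whitespace and further columns. The function takes any iterable of lines, so an open file handle can be passed directly. The sample rows and the name OTHERENTRY are invented for the demonstration and are not taken from the real prints41_1.kdat file.

```python
import re
from collections import defaultdict

# Assumed layout (not verified against the real file):
#   "gc; NAME"          opens an entry
#   "fd; SEQUENCE ..."  holds one final motif sequence for that entry
MOTIF_RE = re.compile(r'^fd;\s+([A-Z]+)\s')


def extract_final_motifs(lines):
    """Map each entry name (from 'gc;' lines) to its list of 'fd;' motifs."""
    motifs = defaultdict(list)
    protname = None
    for line in lines:  # one pass, so the line iterator is never exhausted early
        if line.startswith('gc;'):
            protname = line[3:].strip()
        elif protname is not None:
            match = MOTIF_RE.match(line)
            if match:
                motifs[protname].append(match.group(1))
    return dict(motifs)


# Hypothetical sample rows, made up for illustration only.
sample = [
    "gc; ADENOSINER\n",
    "fd; AAYIGIEVLI  AA1R_HUMAN  10  10\n",
    "fd; AAYIGIEVLI  AA1R_RAT  10  10\n",
    "gc; OTHERENTRY\n",
    "fd; MKLVV  XYZ_HUMAN  3  3\n",
]
result = extract_final_motifs(sample)
print(result["ADENOSINER"])
```

Reading the file once and switching on the line prefix avoids calling readlines() a second time on a handle that has already been consumed, and anchoring the pattern on 'fd;' keeps the prefix test and the regex from contradicting each other.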
Tags
<python><text-processing><bioinformatics><biopython>
Title
Script for parsing a biological sequence from a public database in Python
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USSpyros
UserOwnerUserId
1. USSpyros
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
3. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POScript for parsing a biological sequence from a public database in Python
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POScript for parsing a biological sequence from a public database in Python
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. COMaybe cross-post this on Biostars.org, a bioinformatics-oriented Stack Exchange site: http://biostar.stackexchange.com/
 singulars
 PostPostId
 POScript for parsing a biological sequence from a public database in Python
 UserUserId
 USTim
2. COThank you for the advice, will try biostars then.
 singulars
 PostPostId
 POScript for parsing a biological sequence from a public database in Python
 UserUserId
 USSpyros
