Script for parsing a biological sequence from a public database in Python
<p>Greetings to the Stack Overflow community,</p>
<p>I am currently following a bioinformatics module as part of a biomedical degree (I am basically a Python newbie), and the following task is required as part of a Python programming assignment:</p>
<p>Extract motif sequences: amino acid sequences (so basically strings, in programming terms) that have been excised by algorithms implementing a multiple sequence alignment followed by iterative database scanning to generate the best-conserved sequences. The ultimate idea is to infer functional significance from such "motifs".</p>
<p>These motifs are stored on a public database in files which have multiple data fields for each protein (UniProt ID, accession number, the alignment itself stored in a hyperlinked .seq file), only one of which is of interest here. That data field is called "extracted motif sets".</p>
<p>My question is how to go about writing a script that will essentially parse the "motif strings" and output them to a file. I have now coded the script so that it looks as follows (I don't write the results to files yet):</p>
<pre><code>import os, re, sys, string

printsdb = open('/users/spyros/folder1/python/PRINTSmotifs/prints41_1.kdat', 'r')

protname = None
final_motifs = []

for line in printsdb.readlines():
    if line.startswith('gc;'):
        protname = line.lstrip()  #string.lower(name) # convert to lowercase
        break

def extract_final_motifs(protname):
    """Extracts the sequences of the 'final motifs sets' for a PRINTS entry.
    Sequences are on lines starting 'fd;' A simple regex is used for retrieval"""
    for line in printsdb.readlines():
        if line.startswith('fd;'):
            final_motifs = re.compile('^\s+([A-Z]+)\s+&lt;')
            final_motifs = final_motifs.match(line)
            #print(final_motifs.groups()[0])
            motif_dict = {protname : final_motifs}
            break
    return

motif_dict = extract_final_motifs('ADENOSINER')
print(motif_dict)
</code></pre>
<p>The problem now is that, while my code loops over a raw database file (prints41_1.kdat) instead of connecting to the public database using the urllib module, as suggested by Simon Cockell below, the output of the script is simply "None" in the Python shell, whereas it should be creating a list such as [AAYIGIEVLI, AAYIGIEVLI, AAYIGIEVLI, etc.].</p>
<p>Does anybody have any idea where the logic error is? Any input appreciated! I apologize for the extensive text; I just hope to be as clear as possible. Thanks in advance for any help!</p>
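<p>For reference, a few things in the script above would each lead to the "None" output on their own: the bare <code>return</code> hands back <code>None</code>; the first <code>readlines()</code> call consumes the whole file, so the second call inside the function gets an empty list; <code>re.match</code> with <code>^\s+</code> cannot match a line that begins with <code>fd;</code>; and the <code>break</code> stops after the first hit. The following is a minimal sketch of one way to collect every motif, not a definitive implementation. It assumes each <code>fd;</code> line carries the motif as the first run of uppercase letters after the tag; that layout, the sample lines, and the <code>FD_PATTERN</code> name are illustrative assumptions that have not been checked against a real PRINTS file.</p>

```python
import re

# Assumed line layout (hypothetical, not verified against a real PRINTS file):
#   "fd; <MOTIF> <other fields...>"
FD_PATTERN = re.compile(r'^fd;\s*([A-Z]+)\b')


def extract_final_motifs(lines, protname):
    """Collect the motif strings found on 'fd;' lines into a list.

    `lines` can be any iterable of strings, including an open file object.
    """
    motifs = []
    for line in lines:
        match = FD_PATTERN.match(line)
        if match:
            # Keep going after a hit so that every 'fd;' line is collected.
            motifs.append(match.group(1))
    # Return the result explicitly; a bare `return` would give back None.
    return {protname: motifs}


# Made-up sample data standing in for the contents of the .kdat file.
sample_lines = [
    "gc; ADENOSINER\n",
    "fd; AAYIGIEVLI   EXAMPLE_ID   10   10\n",
    "fd; AAYIGIEVLV   EXAMPLE_ID2  12   12\n",
    "bb;\n",
]

motif_dict = extract_final_motifs(sample_lines, "ADENOSINER")
print(motif_dict)
```

<p>With a real file, the same function could be called as <code>extract_final_motifs(handle, 'ADENOSINER')</code> inside a <code>with open(path) as handle:</code> block, so the file is read only once and closed afterwards. Note that this sketch gathers every <code>fd;</code> line in the file under one name; restricting it to a single entry would need extra logic keyed on the <code>gc;</code> lines.</p>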