Parse a FASTA sequence file to retrieve title and sequence in Python
<p>I have to write a generic parser for FASTA files in Python.</p>

<p>The format looks like this:</p>

<pre><code>&gt;gi|348686675|gb|JH159151.1| Phytophthora sojae unplaced genomic scaffold PHYSOscaffold_1, whole genome shotgun sequence
TACGAGAATAATTTCTCATCATCCAGCTTTAACACAAAATTCGCA
&gt;gi|348686675|gb|JH159151.1| Phytophthora sojae unplaced genomic scaffold PHYSOscaffold_2, whole genome shotgun sequence
CAGTTTTCGTTAAGAGAACTTAACATTTTCTTATGACGTAAATGA
AGTTTATATATAAATTTCCTTTTTATTGGA
&gt;gi|348686675|gb|JH159151.1| Phytophthora sojae unplaced genomic scaffold PHYSOscaffold_3, whole genome shotgun sequence
GAACTTAACATTTTCTTATGACGTAAATGAAGTTTATATATAAATTTCCTTTTTATTGGA
TAATATGCCTATGCCGCATAATTTTTATATCTTTCTCCTAACAAAACATTCGCTTGTAAA
</code></pre>

<p>I have to retrieve each title and sequence separately, for example:</p>

<pre><code>title1 = PHYSOscaffold_1
sequence1 = TACGAGAATAATTTCTCATCATCCAGCTTTAACACAAAATTCGCA

title2 = PHYSOscaffold_2
sequence2 = CAGTTTTCGTTAAGAGAACTTAACATTTTCTTATGACGTAAATGA
AGTTTATATATAAATTTCCTTTTTATTGGA
</code></pre>

<p>and so on. I then insert these values into a table in a MySQL database I created.</p>

<p>The output of my parser should look like this:</p>

<pre><code>name1 \t sequence1 \t length_of_sequence \t a_count \t t_count \t g_count \t c_count
name2 \t sequence2 \t length_of_sequence \t a_count \t t_count \t g_count \t c_count
</code></pre>

<p>So far, I have written a very basic script like this:</p>

<pre><code>infile = open("simple.fasta")
line = infile.readline()
if not line.startswith("&gt;"):
    raise TypeError("Not a FASTA file: %r" % line)
title = line
sequence_lines = []
while 1:
    line = infile.readline().rstrip()
    if line == "":
        break
    sequence_lines.append(line)
</code></pre>

<p>With this I only get the first title and sequence.</p>

<p>I am a novice and would appreciate expert help.</p>
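The script above reads one title and then treats everything up to the first empty line as a single sequence, so it never starts a new record at the next `>` line. One possible approach, sketched below under a few assumptions, is to loop over every line and begin a new record whenever a line starts with `>`. The names `read_fasta`, `short_name`, and `format_row` are hypothetical helpers introduced for this illustration, and `short_name` assumes the wanted name is the last word before the first comma of the title, as in the sample headers. Inserting the rows into MySQL is left out of the sketch.

```python
import io
from collections import Counter


def read_fasta(handle):
    """Yield (title, sequence) pairs from an open FASTA file handle."""
    title = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between or inside records
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if title is not None:
                yield title, "".join(chunks)
            title = line[1:]
            chunks = []
        elif title is None:
            raise ValueError("Not a FASTA file: %r" % line)
        else:
            chunks.append(line)
    # Emit the last record, which has no following header to close it.
    if title is not None:
        yield title, "".join(chunks)


def short_name(title):
    # Assumption: the name is the last word before the first comma,
    # e.g. "... scaffold PHYSOscaffold_1, whole genome ..." -> "PHYSOscaffold_1".
    return title.split(",")[0].split()[-1]


def format_row(title, sequence):
    """Build one tab-separated row: name, sequence, length, A, T, G, C counts."""
    counts = Counter(sequence.upper())
    fields = [short_name(title), sequence, len(sequence),
              counts["A"], counts["T"], counts["G"], counts["C"]]
    return "\t".join(str(field) for field in fields)


# Small in-memory demo; a real file would be opened with open("simple.fasta").
demo = io.StringIO(
    ">gi|1| Phytophthora sojae unplaced genomic scaffold PHYSOscaffold_1, wgs\n"
    "TACGAGAATAATTTC\n"
    ">gi|2| Phytophthora sojae unplaced genomic scaffold PHYSOscaffold_2, wgs\n"
    "CAGTTTTCG\n"
    "AGTTTATAT\n"
)
for record_title, record_sequence in read_fasta(demo):
    print(format_row(record_title, record_sequence))
```

Because `read_fasta` is a generator, records are produced one at a time, so each `(title, sequence)` pair can be handed to a database insert without loading the whole file into memory.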