Extracting nucleotide sequence from a non-standardly formatted text file
<p>I have been given some DNA sequences by collaborators in a Word document that I'd like to convert into a series of FASTA sequences in one file.</p>
<p>I've converted it to a text file and figured I could use regular expressions to extract the gene name and the sequence:</p>
<pre><code>use warnings;
use strict;

die "usage: make_fasta.pl &lt;sequence file&gt;" unless (@ARGV == 1);

my $seq_filename = shift;
my $fasta_db_name = $seq_filename . "_db.fa";

open(my $seq_file, '&lt;', $seq_filename) or die "can't open file $seq_filename, $!";
open(my $fasta_file, '&gt;', $fasta_db_name) or die "can't open file $fasta_db_name, $!";

while (my $line = &lt;$seq_file&gt;) {
    chomp $line;
    if ($line =~ /^[ATCG]+$/) {    # if the line is entirely DNA sequence
        print $fasta_file "$line\n";
    } elsif ($line =~ /Full-length (\w+) cDNA/) {    # if the line has gene info
        print $fasta_file "&gt;$1\n";
    } else {
        next;
    }
}
</code></pre>
<p>But that just gave me the name of the first gene. Clearly I've done something wrong with the DNA regular expression, but I can't for the life of me work out what. To my eyes it's exactly the same as other suggested DNA tests I've found on this site and others.</p>
<p>The file I'm trying to parse is laid out like so:</p>
<pre><code>Collaborators name
title of gene set
Full-length clock cDNA coding sequence
ATGGTAGGATGTGTAATGCGTACGTGATCGT
Full-length per cDNA coding sequence
ATGCTAGCTACGTACGTAGCTACGTAGTACG
</code></pre>
<p>I want the output to be a FASTA file like so:</p>
<pre><code>&gt;clock
ATGGTAGGATGTGTAATGCGTACGTGATCGT
&gt;per
ATGCTAGCTACGTACGTAGCTACGTAGTACG
</code></pre>
<p>The first few lines of the actual input file are:</p>
<pre><code>Dr Lin Zhang (Leicester University 10/2012)
Canonical clock genes
Full-length per cDNA coding seq (3693bp)
ATGGACACAGGAACACCCCATGAAGATGTGCCCTCAGAGGACCACACCTTGGAAGAAGGGGACAGCAAGAACCCCTCGTGCCAGCAAGAGTCAGCCTACGGCTCCCTCGAGTCATCCTCCAATGGACAGTCTCAGAAAAGTTTCGGAGGAAGTGGAAGCAAAAGCTTAAATAGTGGTTCGAGTCACAGCAGCGGCTTTGGGGACCAAAATGATTTCAAGGGTATCCATCTTCACGAAGCGAAACACATAGCGTTGAAGAAGAAGAAAACTGGGAAAGGAGGTGAAAAGGTAGCAGAAATCCCCTTTCAAACTGCCTCTGAGGCAGAACTGTCCTCCAAAGGAAACGAAACAGAAAAGGAGAAAGAAACAAGCCTCGAGGAGTCTCCTGCTGCAAAAGAGGAAGCAATTATCGAAAAGGAGTCTCGTTACATCCACCCGAGGAACT
</code></pre>
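<p>One thing that may be worth ruling out is the line endings. This is only a guess and has not been confirmed for this file, but if the text exported from Word uses bare carriage returns, the whole file would be read as a single line, which would fit getting only the first gene name. Below is a minimal sketch that slurps the file and splits on any line-ending style. The helper name <code>to_fasta</code> is hypothetical, and the sketch prints to STDOUT rather than to a <code>_db.fa</code> file.</p>

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): turn the raw text of
# the sequence file into FASTA, tolerating CR, LF, or CRLF line endings.
sub to_fasta {
    my ($text) = @_;
    my $fasta = '';

    # Split on any line-ending style instead of relying on $/ and chomp.
    for my $line (split /\r\n|\r|\n/, $text) {
        $line =~ s/^\s+|\s+$//g;    # drop stray leading/trailing whitespace

        if ($line =~ /^[ATCG]+$/) {    # line is entirely DNA sequence
            $fasta .= "$line\n";
        }
        elsif ($line =~ /Full-length (\w+) cDNA/) {    # line has gene info
            $fasta .= ">$1\n";
        }
        # Any other line (names, titles, blanks) is skipped.
    }
    return $fasta;
}

# Usage: perl make_fasta.pl <sequence file>   (FASTA is printed to STDOUT)
if (@ARGV == 1) {
    my $seq_filename = shift;
    open(my $seq_file, '<', $seq_filename)
        or die "can't open file $seq_filename, $!";
    local $/;                       # slurp mode: read the whole file at once
    my $text = <$seq_file>;
    close $seq_file;
    print to_fasta($text);
}
```

<p>If this version finds every gene while the line-by-line loop does not, the regular expressions were probably fine and the line endings are the likely culprit.</p>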