Rearrange sequences in FASTA format file by length?
<p>What sort of algorithm should be used to rearrange the FASTA sequences into length order (shortest first)? It needs to sort the sequences by length while keeping all the information for each entry displayed, not just the lengths.</p>
<p>I can get the length of each sequence using <code>Bio::FastaFormat#length</code>, put the lengths into an array, then sort:</p>
<pre><code>require 'rubygems'
require 'bio'

file = Bio::FastaFormat.open(ARGV.shift)
seqarray = []
file.each do |seq|
  a = seq.length
  seqarray.push a
end
puts seqarray.sort
</code></pre>
<p>This displays the sequence lengths in order, but what I need to see is the original FASTA format, in length order.</p>
<p>I can't add <code>seq.length</code> (the length of each sequence) to <code>seq.entry</code> (the entire FASTA entry) and then sort, because <code>seq.length</code> is an integer and <code>seq.entry</code> is a string. I tried converting with <code>seq.length.to_s</code>, prepending this to <code>seq.entry</code>, then sorting. This is the closest I've got. Unfortunately the lengths are now strings, so they sort as text (<code>1,11,111</code>) instead of numerically (<code>1,2,3</code>):</p>
<pre><code>require 'rubygems'
require 'bio'

file = Bio::FastaFormat.open(ARGV.shift)
seqarray = []
file.each do |seq|
  a = (seq.length).to_s + ' = length' + seq.entry
  seqarray.push a
end
puts seqarray.sort
</code></pre>
<p>After this I tried the same thing using the <code>sequence_id</code> instead of the entire entry, without converting the length to a string, but the <code>id</code> has letters in it, so I can't add it to the integer length without getting an error message.</p>
<p>Any suggestions?</p>
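One possible direction, sketched here in plain Ruby rather than with BioRuby: keep each full entry together with its integer length and sort on the integer alone, so the length never has to be turned into a string. The helpers `parse_fasta` and `sort_fasta_by_length` below are hypothetical names written for this illustration and are not part of BioRuby. With BioRuby records the same idea would presumably be to sort the records by `seq.length` and then print each `seq.entry`, though that is an assumption and is not tested here.

```ruby
# Minimal sketch: order FASTA records by sequence length (shortest first)
# while keeping every full entry intact.
# parse_fasta and sort_fasta_by_length are hypothetical helpers, not BioRuby API.

# Split FASTA text into records, each holding the full entry and its length.
def parse_fasta(text)
  records = []
  text.each_line do |raw|
    line = raw.chomp + "\n"            # normalise the line ending
    if line.start_with?('>')
      # A header line starts a new record.
      records << { entry: line, length: 0 }
    elsif records.any?
      # Sequence line: append to the current entry and count its residues.
      records.last[:entry] << line
      records.last[:length] += line.strip.length
    end
  end
  records
end

# Sort on the integer length, then return only the original entries.
def sort_fasta_by_length(text)
  parse_fasta(text)
    .sort_by { |rec| rec[:length] }    # numeric comparison, so 2 sorts before 11
    .map { |rec| rec[:entry] }
    .join
end
```

To use it on a file, something like `puts sort_fasta_by_length(File.read(ARGV.shift))` should work. Note that `sort_by` is not guaranteed to be stable, so records of equal length may come out in any relative order.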
 
