
    <p>I think that @j_random_hacker and @Ashalynd are on the right track regarding using this algorithm in most Perl implementations. The datatypes you're using are going to use more memory than is absolutely needed for the calculations.</p>
    <p>So this is "normal" in that you should expect to see this kind of memory usage for how you've written this algorithm in Perl. You may have other problems in surrounding code that are using a lot of memory, but this algorithm will hit your memory hard with large sequences.</p>
    <p>You can address some of the memory issues by changing the datatypes that you're using, as @Ashalynd suggests. You could try changing the hash which holds the score and pointer into an array, and changing the string pointers into integer values. Something like this might get you some benefit while still maintaining readability:</p>
    <pre><code>use strict;
use warnings;

# define constants for array positions and pointer values
# so the code is still readable.
# (If you have the "Readonly" CPAN module you may want to use it for constants
# instead although none of the downsides of the "constant" pragma apply in this code.)
use constant {
    SCORE    =&gt; 0,
    POINTER  =&gt; 1,
    DIAGONAL =&gt; 0,
    LEFT     =&gt; 1,
    UP       =&gt; 2,
    NONE     =&gt; 3,
};

...

sub semiGlobal2 {
    my ( $seq1, $seq2, $MATCH, $MISMATCH, $GAP ) = @_;

    # initialization: first row to 0
    my @matrix;

    # score and pointer are now stored in an array
    # using the defined constants as indices
    $matrix[0][0][SCORE] = 0;

    # pointer value is now a constant integer
    $matrix[0][0][POINTER] = NONE;

    for ( my $j = 1 ; $j &lt;= length($seq1) ; $j++ ) {
        $matrix[0][$j][SCORE]   = 0;
        $matrix[0][$j][POINTER] = NONE;
    }

    for ( my $i = 1 ; $i &lt;= length($seq2) ; $i++ ) {
        $matrix[$i][0][SCORE]   = $GAP * $i;
        $matrix[$i][0][POINTER] = UP;
    }

    ...
    # continue to make the appropriate changes throughout the code
</code></pre>
    <p>However, when I tested this I didn't get a huge benefit when attempting to align a 3600-character string against a 5500-character string of random data. I programmed my code to abort when it consumed more than 2GB of memory. The original code aborted after 23 seconds, while the one using constants and an array instead of a hash aborted after 32 seconds.</p>
    <p>If you really want to use this specific algorithm, I'd check out the performance of <a href="http://search.cpan.org/dist/Algorithm-NeedlemanWunsch" rel="nofollow">Algorithm::NeedlemanWunsch</a>. It doesn't look like it's very mature, but it may have addressed your performance issues. Otherwise, look into writing an <a href="http://search.cpan.org/dist/Inline" rel="nofollow">Inline</a> or <a href="http://perldoc.perl.org/perlxs.html" rel="nofollow">Perl XS</a> wrapper around a C implementation.</p>
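To make "continue to make the appropriate changes" a little more concrete, here is a minimal sketch of what the fill step might look like with the array layout. It is not the original poster's code: the function name `fill_matrix`, the tie-breaking order, and the plain Needleman-Wunsch style recurrence are all assumptions made for illustration. Only the constants and the row/column initialization mirror the snippet above.

```perl
use strict;
use warnings;

# Same constants as in the answer: array slots and integer pointer values.
use constant {
    SCORE    => 0,
    POINTER  => 1,
    DIAGONAL => 0,
    LEFT     => 1,
    UP       => 2,
    NONE     => 3,
};

# Hypothetical helper (name and recurrence are assumptions, not the
# original code): fills the score/pointer matrix using [score, pointer]
# array cells instead of { score => ..., pointer => ... } hashes.
sub fill_matrix {
    my ( $seq1, $seq2, $MATCH, $MISMATCH, $GAP ) = @_;
    my @matrix;

    # Initialization as in the answer: first row is 0, first column is GAP * i.
    $matrix[0][0] = [ 0, NONE ];
    $matrix[0][$_] = [ 0, NONE ] for 1 .. length $seq1;
    $matrix[$_][0] = [ $GAP * $_, UP ] for 1 .. length $seq2;

    for my $i ( 1 .. length $seq2 ) {
        for my $j ( 1 .. length $seq1 ) {
            my $same = substr( $seq1, $j - 1, 1 ) eq substr( $seq2, $i - 1, 1 );

            my $diag = $matrix[ $i - 1 ][ $j - 1 ][SCORE]
                     + ( $same ? $MATCH : $MISMATCH );
            my $up   = $matrix[ $i - 1 ][$j][SCORE] + $GAP;
            my $left = $matrix[$i][ $j - 1 ][SCORE] + $GAP;

            # Assumed tie-breaking: prefer DIAGONAL, then UP, then LEFT.
            my ( $best, $ptr ) = ( $diag, DIAGONAL );
            ( $best, $ptr ) = ( $up,   UP )   if $up > $best;
            ( $best, $ptr ) = ( $left, LEFT ) if $left > $best;

            $matrix[$i][$j] = [ $best, $ptr ];
        }
    }
    return \@matrix;
}
```

A call such as `my $m = fill_matrix( $seq1, $seq2, 1, -1, -2 );` returns a reference to the matrix, and `$m->[$i][$j][POINTER]` can then be compared against `DIAGONAL`, `LEFT`, `UP`, or `NONE` during traceback. Each cell is still a separate Perl array, so this is unlikely to change the memory picture much, which is consistent with the timings reported above.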
 
