<p>Rather than splitting the sequence into three parts, the way I see this working is to find all occurrences of <code>$pattern</code> in the complete sequence and determine in which third each match starts.</p>
<p>The built-in variable <code>$-[0]</code> contains the offset of the start of the most recent successful match.</p>
<p>The code below does what I think you want. It works by accumulating each sequence (which ends either when a new sequence ID is found or when the end of the file is reached) and passing it to the <code>process_seq</code> subroutine.</p>
<p>The subroutine takes the length of the sequence and calculates the offset of the end of each third of the string. The idiom <code>sprintf '%.0f', $value</code> is used to round fractional values to the nearest character position.</p>
<p>The <code>@counts</code> array is adjusted for each occurrence of <code>$regex</code> in the sequence. The element of <code>@counts</code> to be incremented is established by comparing the starting position of the match in <code>$-[0]</code> with the end offset of each of the three segments of the sequence.</p>
<p>Once each sequence has been processed, the values in <code>@counts</code> are accumulated into <code>@totals</code> to give overall figures for all sequences.</p>
<p>The output of the program when using your sample data is shown below.
The grand total is <code>(9, 1, 6)</code>.</p>
<pre><code>use strict;
use warnings;

my $gpat = '[G]{3,5}';
my $npat = '[A-Z]{1,25}';
my $pattern = $gpat.$npat.$gpat.$npat.$gpat.$npat.$gpat;
my $regex = qr/$pattern/i;

open my $fh, '&lt;', 'sequences.txt' or die $!;

my ($id, $seq);
my @totals = (0, 0, 0);

while (&lt;$fh&gt;) {
  chomp;
  if (/^&gt;(\w+)/) {
    process_seq($seq) if $id;
    $id = $1;
    $seq = '';
    print "$id\n";
  }
  elsif ($id) {
    $seq .= $_;
    process_seq($seq) if eof;
  }
}

print "Total: @totals\n";

sub process_seq {
  my $sequence = shift;
  my $length = length $sequence;
  my @offsets = map {sprintf '%.0f', $length * $_ / 3} 1..3;
  my @counts = (0, 0, 0);

  while ($sequence =~ /$regex/g) {
    my $place = $-[0];
    for my $i (0..2) {
      next if $place &gt;= $offsets[$i];
      $counts[$i]++;
      last;
    }
  }

  print "@counts\n\n";
  $totals[$_] += $counts[$_] for 0..2;
}
</code></pre>
<p><strong>output</strong></p>
<pre><code>NR_037701
0 0 1

NM_198399
1 0 0

NR_026816
1 0 1

NR_027917
0 0 0

NR_002777
0 0 0

NR_033769
1 0 0

NM_016326
1 0 1

NM_181641
1 0 1

NM_001144931
0 0 0

NR_029429
0 1 0

NR_026551
1 0 0

NM_181640
1 0 1

NM_016951
1 0 1

NR_002773
1 0 0

NR_037806
0 0 0

Total: 9 1 6
</code></pre>
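<p>To see how a match offset in <code>$-[0]</code> is mapped to a third, here is a minimal sketch of the same bucketing logic run on a toy string. The subroutine name <code>count_by_third</code>, the 12-character input, and the simplified pattern <code>G{3}</code> are assumptions chosen for illustration; they are not part of the program above.</p>

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original program): returns the number of
# matches that start in each third of $sequence.
sub count_by_third {
    my ($sequence, $regex) = @_;
    my $length = length $sequence;

    # End offset of each third, rounded to the nearest character position.
    my @offsets = map { sprintf '%.0f', $length * $_ / 3 } 1 .. 3;

    my @counts = (0, 0, 0);
    while ($sequence =~ /$regex/g) {
        my $start = $-[0];    # offset where the current match began
        for my $i (0 .. 2) {
            next if $start >= $offsets[$i];    # match starts past this third
            $counts[$i]++;
            last;
        }
    }
    return @counts;
}

# Toy input: 12 characters, so the thirds end at offsets 4, 8 and 12.
# 'GGG' matches at offsets 0 and 9, i.e. in the first and last thirds.
my @counts = count_by_third('GGGAAAAAAGGG', qr/G{3}/);
print "@counts\n";    # prints "1 0 1"
```

<p>Returning the counts instead of printing them inside the subroutine is only a convenience for this sketch; it makes the bucketing easy to check on small inputs before running it on real sequence data.</p>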
 
