Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    text
    copied!<p>I thought this was an interesting problem, so I put together a program based on considering 'foldings', which scans outward for possible symmetrical matches from different 'fold points'. If N is the number of nucleotides and M is 'maxInterval-minInterval', you should have running time O(N*M). I may have missed some boundary cases, so use the code with care, but it does work for the example provided. Note that I've used a padded intermediate buffer to store the genome, as this reduces the number of comparisons for boundary cases required in the inner loops; this trades off additional memory allocation for better speed. Feel free to edit the post if you make any corrections or improvements.</p> <pre><code>class Program { public sealed class Pairing { public int Index { get; private set; } public int Length { get; private set; } public int Offset { get; private set; } public Pairing(int index, int length, int offset) { Index = index; Length = length; Offset = offset; } } public static IEnumerable&lt;Pairing&gt; FindPairings(string genome, int minLen, int maxLen, int intervalMinLen, int intervalMaxLen) { int n = genome.Length; var padding = new string((char)0, maxLen); var padded = string.Concat(padding, genome, padding); int start = (intervalMinLen + minLen)/2 + maxLen; int end = n - (intervalMinLen + minLen)/2 + maxLen; //Consider 'fold locations' along the genome for (int i=start; i&lt;end; i++) { //Consider 'odd' folding (centered on index) about index i int k = (intervalMinLen+2)/2; int maxK = (intervalMaxLen + 2)/2; while (k&lt;=maxK) { int matchLength = 0; while (IsPaired(padded[i - k], padded[i + k]) &amp;&amp; (k &lt;= (maxK+maxLen))) { matchLength++; if (matchLength &gt;= minLen &amp;&amp; matchLength &lt;= maxLen) { yield return new Pairing(i-k - maxLen, matchLength, 2*k - (matchLength-1)); } k++; } k++; } //Consider 'even' folding (centered before index) about index i k = (intervalMinLen+1)/2; while (k &lt;= maxK) { int matchLength = 0; while (IsPaired(padded[i - (k+1)], padded[i + k]) &amp;&amp; (k&lt;=maxK+maxLen)) { matchLength++; if (matchLength &gt;= minLen &amp;&amp; matchLength &lt;= maxLen) { yield return new Pairing(i - (k+1) - maxLen, matchLength, 2*k + 1 - (matchLength-1)); } k++; } k++; } } } private const int SumAT = 'A' + 'T'; private const int SumGC = 'G' + 'C'; private static bool IsPaired(char a, char b) { return (a + b) == SumAT || (a + b) == SumGC; } static void Main(string[] args) { string genome = "ATCAGGACCATACGCCTGAT"; foreach (var pairing in FindPairings(genome, 4, 5, 9, 10)) { Console.WriteLine("'{0}' pair with '{1}'", genome.Substring(pairing.Index, pairing.Length), genome.Substring(pairing.Index + pairing.Offset, pairing.Length)); } Console.ReadKey(); } } </code></pre>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload