Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>Your problem is recursi ... you know what, forget about recursion! =p it wouldn't really work well in PHP and the algorithm is pretty clear without it as well.</p> <pre><code> function find_repeating_sequences($s) { $res = array(); while ($s) { $i = 1; $pat = $s[0]; while (false !== strpos($s, $pat, $i)) { $res[$pat] = 1; // expand pattern and try again $pat .= $s[$i++]; } // move the string forward $s = substr($s, 1); } return array_keys($res); } </code></pre> <p>Out of interest, I wrote <a href="https://stackoverflow.com/a/13874927/1338292">Tim's answer</a> in PHP as well:</p> <pre><code>function find_repeating_sequences_re($s) { $res = array(); preg_match_all('/(?=(.+).*\1)/', $s, $matches); foreach ($matches[1] as $match) { $length = strlen($match); if ($length &gt; 1) { for ($i = 0; $i &lt; $length; ++$i) { for ($j = $i; $j &lt; $length; ++$j) { $res[substr($match, $i, $j - $i + 1)] = 1; } } } else { $res[$match] = 1; } } return array_keys($res); } </code></pre> <p>I've let them fight it out in a small benchmark of 800 bytes of random data:</p> <pre><code>$data = base64_encode(openssl_random_pseudo_bytes(600)); </code></pre> <p>Each code is run for 10 rounds and the execution time is measured. The results?</p> <pre><code>Pure PHP - 0.014s (10 runs) PCRE - 40.86s &lt;-- ouch! </code></pre> <p>It gets weirder when you look at 24k bytes (or anything above 1k really):</p> <pre><code>Pure PHP - 4.565s (10 runs) PCRE - 0.232s &lt;-- WAT?! </code></pre> <p>It turns out that the regular expression broke down after 1k characters and so the <code>$matches</code> array was empty. These are my .ini settings:</p> <pre><code>pcre.backtrack_limit =&gt; 1000000 =&gt; 1000000 pcre.recursion_limit =&gt; 100000 =&gt; 100000 </code></pre> <p>It's not clear to me how a backtrack or recursion limit would have been hit after only 1k of characters. But even if those settings are "fixed" somehow, the results are still obvious, PCRE doesn't seem to be the answer.</p> <p>I suppose writing this in C would speed it up somewhat, but I'm not sure to what degree.</p> <p><strong>Update</strong></p> <p>With some help from <a href="https://stackoverflow.com/a/14002655/1338292">hakre's answer</a> I put together an improved version that increases performance by ~18% after optimizing the following:</p> <ol> <li><p>Remove the <code>substr()</code> calls in the outer loop to advance the string pointer; this was a left over from my previous recursive incarnations.</p></li> <li><p>Use the partial results as a positive cache to skip <code>strpos()</code> calls inside the inner loop.</p></li> </ol> <p>And here it is, in all its glory (:</p> <pre><code>function find_repeating_sequences3($s) { $res = array(); $p = 0; $len = strlen($s); while ($p != $len) { $pat = $s[$p]; $i = ++$p; while ($i != $len) { if (!isset($res[$pat])) { if (false === strpos($s, $pat, $i)) { break; } $res[$pat] = 1; } // expand pattern and try again $pat .= $s[$i++]; } } return array_keys($res); } </code></pre>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload