Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    text
    copied!<p>I tried to implement the Suffix Tree with the approach given in jogojapan's answer, but it didn't work for some cases due to wording used for the rules. Moreover, I've mentioned that nobody managed to implement an absolutely correct suffix tree using this approach. Below I will write an "overview" of jogojapan's answer with some modifications to the rules. I will also describe the case when we forget to create <strong><em>important</em></strong> suffix links.</p> <p><strong>Additional variables used</strong></p> <ol> <li><strong>active point</strong> - a triple (active_node; active_edge; active_length), showing from where we must start inserting a new suffix.</li> <li><strong>remainder</strong> - shows the number of suffixes we must add <em>explicitly</em>. For instance, if our word is 'abcaabca', and remainder = 3, it means we must process 3 last suffixes: <strong>bca</strong>, <strong>ca</strong> and <strong>a</strong>.</li> </ol> <p>Let's use a concept of an <strong>internal node</strong> - all the nodes, except the <em>root</em> and the <em>leafs</em> are <strong>internal nodes</strong>.</p> <p><strong>Observation 1</strong></p> <p>When the final suffix we need to insert is found to exist in the tree already, the tree itself is not changed at all (we only update the <code>active point</code> and <code>remainder</code>).</p> <p><strong>Observation 2</strong></p> <p>If at some point <code>active_length</code> is greater or equal to the length of current edge (<code>edge_length</code>), we move our <code>active point</code> down until <code>edge_length</code> is strictly greater than <code>active_length</code>.</p> <p>Now, let's redefine the rules:</p> <p><strong>Rule 1</strong></p> <blockquote> <p>If after an insertion from the <em>active node</em> = <em>root</em>, the <em>active length</em> is greater than 0, then:</p> <ol> <li><em>active node</em> is not changed</li> <li><em>active length</em> is decremented</li> <li><em>active edge</em> is shifted right (to the first character of the next suffix we must insert)</li> </ol> </blockquote> <p><strong>Rule 2</strong></p> <blockquote> <p>If we create a new <em>internal node</em> <strong>OR</strong> make an inserter from an <em>internal node</em>, and this is not the first <strong>SUCH</strong> <em>internal node</em> at current step, then we link the previous <strong>SUCH</strong> node with <strong>THIS</strong> one through a <em>suffix link</em>.</p> </blockquote> <p>This definition of the <code>Rule 2</code> is different from jogojapan', as here we take into account not only the <em>newly created</em> internal nodes, but also the internal nodes, from which we make an insertion.</p> <p><strong>Rule 3</strong></p> <blockquote> <p>After an insert from the <em>active node</em> which is not the <em>root</em> node, we must follow the suffix link and set the <em>active node</em> to the node it points to. If there is no a suffix link, set the <em>active node</em> to the <em>root</em> node. Either way, <em>active edge</em> and <em>active length</em> stay unchanged.</p> </blockquote> <p>In this definition of <code>Rule 3</code> we also consider the inserts of leaf nodes (not only split-nodes).</p> <p><strong>And finally, Observation 3:</strong></p> <p>When the symbol we want to add to the tree is already on the edge, we, according to <code>Observation 1</code>, update only <code>active point</code> and <code>remainder</code>, leaving the tree unchanged. <strong>BUT</strong> if there is an <em>internal node</em> marked as <em>needing suffix link</em>, we must connect that node with our current <code>active node</code> through a suffix link.</p> <p>Let's look at the example of a suffix tree for <strong>cdddcdc</strong> if we add a suffix link in such case and if we don't:</p> <ol> <li><p>If we <strong>DON'T</strong> connect the nodes through a suffix link:</p> <ul> <li>before adding the last letter <strong>c</strong>:</li> </ul> <p><img src="https://i.stack.imgur.com/zPiF1.png" alt=""></p> <ul> <li>after adding the last letter <strong>c</strong>:</li> </ul> <p><img src="https://i.stack.imgur.com/5fsmd.png" alt=""></p></li> <li><p>If we <strong>DO</strong> connect the nodes through a suffix link:</p> <ul> <li>before adding the last letter <strong>c</strong>:</li> </ul> <p><img src="https://i.stack.imgur.com/lZrAF.png" alt=""></p> <ul> <li>after adding the last letter <strong>c</strong>:</li> </ul> <p><img src="https://i.stack.imgur.com/XUFjk.png" alt=""></p></li> </ol> <p>Seems like there is no significant difference: in the second case there are two more suffix links. But these suffix links are <em>correct</em>, and one of them - from the blue node to the red one - is very <strong>important</strong> for our approach with <strong>active point</strong>. The problem is that if we don't put a suffix link here, later, when we add some new letters to the tree, we might omit adding some nodes to the tree due to the <code>Rule 3</code>, because, according to it, if there's no a suffix link, then we must put the <code>active_node</code> to the root.</p> <p>When we were adding the last letter to the tree, the red node had <strong>already existed</strong> before we made an insert from the blue node (the edge labled <strong>'c'</strong>). As there was an insert from the blue node, we mark it as <em>needing a suffix link</em>. Then, relying on the <strong>active point</strong> approach, the <code>active node</code> was set to the red node. But we don't make an insert from the red node, as the letter <strong>'c'</strong> is already on the edge. Does it mean that the blue node must be left without a suffix link? No, we must connect the blue node with the red one through a suffix link. Why is it correct? Because the <strong>active point</strong> approach guarantees that we get to a right place, i.e., to the next place where we must process an insert of a <strong>shorter</strong> suffix.</p> <p>Finally, here are my implementations of the Suffix Tree:</p> <ol> <li><a href="https://gist.github.com/makagonov/22ab3675e3fc0031314e8535ffcbee2c" rel="noreferrer">Java</a></li> <li><a href="https://gist.github.com/makagonov/f7ed8ce729da72621b321f0ab547debb" rel="noreferrer">C++</a></li> </ol> <p>Hope that this "overview" combined with jogojapan's detailed answer will help somebody to implement his own Suffix Tree.</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload