How to find all substrings of a String with start and end indices
<p>I've recently written some Scala code which processes a String, finding all its sub-strings and retaining a list of those which are found in a dictionary. The start and end of the sub-strings within the overall string also have to be retained for later use, so the easiest way to do this seemed to be just to use nested for loops, something like this:</p>
<pre><code>for (i &lt;- 0 until word.length)
  for (j &lt;- i until word.length) {
    val sub = word.substring(i, j + 1)
    // lookup sub in dictionary here and add new match if found
  }
</code></pre>
<p>As an exercise, I decided to have a go at doing the same thing in Haskell. It seems straightforward enough without the need for the sub-string indices - I can use something like <a href="https://stackoverflow.com/a/5377754/241990">this approach</a> to get the sub-strings, then call a recursive function to accumulate the matches. But if I want the indices too it seems trickier.</p>
<p>How would I write a function which returns a list containing each continuous sub-string along with its start and end index within the "parent" string?</p>
<p>For example <code>tokens "blah"</code> would give <code>[("b",0,0), ("bl",0,1), ("bla",0,2), ...]</code></p>
<h3>Update</h3>
<p>A great selection of answers and plenty of new things to explore. After messing about a bit, I've gone for the first answer, with Daniel's suggestion to allow the use of <code>[0..]</code>.</p>
<pre><code>import Data.List (inits, tails)

data Token = Token String Int Int

continuousSubSeqs = filter (not . null) . concatMap tails . inits

tokenize xs = map (\(s, l) -&gt; Token s (head l) (last l)) $ zip s ind
    where s = continuousSubSeqs xs
          ind = continuousSubSeqs [0..]
</code></pre>
<p>This seemed relatively easy to understand, given my limited Haskell knowledge.</p>
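For comparison, here is a minimal sketch of one other way to get the tuple shape asked for in the question, in the same order as the `tokens "blah"` example. It is not one of the answers referred to above; the `Token` type alias and the body of `tokens` are assumptions made for illustration. The idea is to pair every suffix with its start index, then take each non-empty prefix of that suffix and derive the end index from its length.

```haskell
import Data.List (inits, tails)

-- | A substring with its inclusive start and end indices in the parent string.
-- (Hypothetical alias, chosen to match the tuple shape in the question.)
type Token = (String, Int, Int)

-- | All non-empty contiguous substrings of the input, each paired with
-- its inclusive start and end index.
tokens :: String -> [Token]
tokens word =
  [ (sub, start, start + length sub - 1)
  | (start, suffix) <- zip [0 ..] (tails word) -- each suffix and where it starts
  , sub <- drop 1 (inits suffix)               -- each non-empty prefix of that suffix
  ]
```

With this definition, `tokens "blah"` evaluates to `[("b",0,0),("bl",0,1),("bla",0,2),("blah",0,3),("l",1,1),("la",1,2),("lah",1,3),("a",2,2),("ah",2,3),("h",3,3)]`, and `tokens ""` is the empty list. The dictionary lookup from the Scala loop could then be a `filter` over this list.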
 
