
Looking for ideas: how to efficiently compute the LCP array of a lexicographically sorted suffix array built from many different strings
<p>I don't want a direct solution to the problem that prompted this question, but for reference it is this one: <a href="https://www.interviewstreet.com/challenges/dashboard/#problem/4efa210eb70ac" rel="nofollow">link</a>.</p>
<p>I take in the strings and add them to a suffix array, which is internally implemented as a sorted set; what I obtain is a lexicographically sorted list of the suffixes of the two given strings.</p>
<pre><code>S1 = "banana" S2 = "panama" SuffixArray.add S1, S2 </code></pre>
<p>To make searching for the <code>k-th</code> smallest substring efficient, I preprocess this sorted set to add information about the longest common prefix between each suffix and its predecessor, as well as keeping a cumulative substring count. That way I know that any <code>k</code> greater than the cumulative substring count of the last item is an invalid query.</p>
<p>This works really well for small inputs, as well as for random large inputs within the constraints given in the problem definition: at most 50 strings of length 2000. I was able to pass 4 out of 7 test cases and was pretty surprised I didn't get them all.</p>
<p>So I went searching for the bottleneck, and it hit me. Given large inputs like these</p>
<pre><code>anananananananana.....ananana bkbkbkbkbkbkbkbkb.....bkbkbkb </code></pre>
<p>the queries for the <code>k-th</code> smallest substring are still fast, as expected, <strong>but the way I preprocess the sorted set is not</strong>. The way I calculate the longest common prefix between adjacent elements of the set is naively linear, O(m); I did the most naïve thing, expecting it to be good enough:</p>
<pre><code>m = anananan n = anananana Start at 0 and find the point where `m[i] != n[i]` </code></pre>
<p>It is done this way because a suffix and its predecessor might not be related (i.e. they may come from different input strings), so I thought I had no choice but brute force.</p>
<p>Here is the question, then, and what I ended up reducing the problem to. Given a list of lexicographically sorted suffixes built as described above (from multiple strings):</p>
<p><strong>What is an efficient way of computing the longest common prefix array?</strong></p>
<p><strong>The subquestion would then be: am I completely off the mark in my approach? Please propose further avenues of investigation if that is the case.</strong></p>
<p><em>Footnote: I do not want to be shown an implemented algorithm, and I don't mind being told to go read such-and-such a book or resource on the subject, as that is what I do anyway while attempting these challenges.</em></p>
<p><em>The accepted answer will be something that guides me onto the right path or, failing that, something that teaches me how to solve this type of problem in a broader sense: a book or something.</em></p>
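For concreteness, the preprocessing described in the question can be sketched as follows. This is a minimal illustration in Python with names of my own choosing (the original code is not shown in the question); it reproduces both the sorted multi-string suffix list and the naive character-by-character LCP scan that the question identifies as the bottleneck:

```python
# Sketch of the preprocessing described above (hypothetical naming):
# collect the suffixes of every input string, sort them lexicographically,
# and compute each suffix's longest common prefix with its predecessor
# by naive character comparison.

def naive_lcp(a: str, b: str) -> int:
    """Scan from index 0 until the characters differ: O(min(|a|, |b|))."""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return i

def sorted_suffixes_with_lcp(strings):
    suffixes = sorted(s[i:] for s in strings for i in range(len(s)))
    lcp = [0] * len(suffixes)  # lcp[0] has no predecessor
    for i in range(1, len(suffixes)):
        lcp[i] = naive_lcp(suffixes[i - 1], suffixes[i])
    return suffixes, lcp

suffixes, lcp = sorted_suffixes_with_lcp(["banana", "panama"])

# Each suffix contributes len(suffix) - lcp new substrings, so the number
# of distinct substrings (the upper bound for a valid k) is:
distinct = sum(len(s) for s in suffixes) - sum(lcp)
```

On adversarial inputs such as <code>ananan...</code>, each call to <code>naive_lcp</code> scans nearly the whole suffix, which is exactly the O(m)-per-pair cost the question is trying to avoid.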
 
