
Looking for ideas: how to efficiently compute the LCP array of a lexicographically sorted suffix array built from many different strings
<p>I don't want a direct solution to the problem that prompted this question, but for reference it is this one: <a href="https://www.interviewstreet.com/challenges/dashboard/#problem/4efa210eb70ac" rel="nofollow">link</a>.</p>
<p>I take in the strings and add them to a suffix array, which is internally implemented as a sorted set; what I obtain is a lexicographically sorted list of the suffixes of the two given strings.</p>
<pre><code>S1 = "banana" S2 = "panama" SuffixArray.add S1, S2 </code></pre>
<p>To make searching for the <code>k-th</code> smallest substring efficient, I preprocess this sorted set to add information about the longest common prefix between each suffix and its predecessor, as well as keeping a cumulative substring count. That way I know that any <code>k</code> greater than the cumulative substring count of the last item is an invalid query.</p>
<p>This works really well for small inputs, as well as for random large inputs within the constraints given in the problem definition: at most 50 strings of length 2000. I was able to pass 4 out of 7 test cases and was pretty surprised I didn't get them all.</p>
<p>So I went searching for the bottleneck, and it hit me. Given large inputs like these</p>
<pre><code>anananananananana.....ananana bkbkbkbkbkbkbkbkb.....bkbkbkb </code></pre>
<p>the queries for the <code>k-th</code> smallest substring are still fast, as expected, <strong>but the way I preprocess the sorted set is not</strong>. The way I calculate the longest common prefix between adjacent elements of the set is naively linear, O(m); I did the most naïve thing, expecting it to be good enough:</p>
<pre><code>m = anananan n = anananana Start at 0 and find the point where `m[i] != n[i]` </code></pre>
<p>It is done this way because a suffix and its predecessor might not be related (i.e. they may come from different input strings), so I thought I had no choice but brute force.</p>
<p>Here is the question, then, and what I ended up reducing the problem to. Given a list of lexicographically sorted suffixes built as described above (from multiple strings):</p>
<p><strong>What is an efficient way of computing the longest common prefix array?</strong></p>
<p><strong>The subquestion would then be: am I completely off the mark in my approach? Please propose further avenues of investigation if that is the case.</strong></p>
<p><em>Footnote: I do not want to be shown an implemented algorithm, and I don't mind being told to go read such-and-such a book or resource on the subject, as that is what I do anyway while attempting these challenges.</em></p>
<p><em>The accepted answer will be something that guides me onto the right path or, failing that, something that teaches me how to solve this type of problem in a broader sense: a book or something.</em></p>
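For concreteness, the preprocessing described in the question can be sketched as follows. This is a minimal illustration in Python with names of my own choosing (the original code is not shown in the question); it reproduces both the sorted multi-string suffix list and the naive character-by-character LCP scan that the question identifies as the bottleneck:

```python
# Sketch of the preprocessing described above (hypothetical naming):
# collect the suffixes of every input string, sort them lexicographically,
# and compute each suffix's longest common prefix with its predecessor
# by naive character comparison.

def naive_lcp(a: str, b: str) -> int:
    """Scan from index 0 until the characters differ: O(min(|a|, |b|))."""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return i

def sorted_suffixes_with_lcp(strings):
    suffixes = sorted(s[i:] for s in strings for i in range(len(s)))
    lcp = [0] * len(suffixes)  # lcp[0] has no predecessor
    for i in range(1, len(suffixes)):
        lcp[i] = naive_lcp(suffixes[i - 1], suffixes[i])
    return suffixes, lcp

suffixes, lcp = sorted_suffixes_with_lcp(["banana", "panama"])

# Each suffix contributes len(suffix) - lcp new substrings, so the number
# of distinct substrings (the upper bound for a valid k) is:
distinct = sum(len(s) for s in suffixes) - sum(lcp)
```

On adversarial inputs such as <code>ananan...</code>, each call to <code>naive_lcp</code> scans nearly the whole suffix, which is exactly the O(m)-per-pair cost the question is trying to avoid.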
 
