How to correct the user input (kind of Google "Did you mean?")
<p>I have the following requirement:</p>
<p>I have many (say 1 million) values (names). The user will type a search string.</p>
<p>I don't expect the user to spell the names correctly.</p>
<p>So I want to build something like Google's "Did you mean?" that lists all the possible matching values from my datastore. There is a similar, but not identical, question <a href="https://stackoverflow.com/questions/135777/a-stringtoken-parser-which-gives-google-search-style-did-you-mean-suggestions">here</a>. It did not answer my question.</p>
<p>My questions:</p>
<p>1) I think it is not advisable to store this data in an RDBMS, because I would have no usable filter in the SQL queries and would have to do a full table scan. So, <strong>how should the data be stored in this situation?</strong></p>
<p>2) The second question is the same as <a href="https://stackoverflow.com/questions/135777/a-stringtoken-parser-which-gives-google-search-style-did-you-mean-suggestions">this</a> one, but for completeness: how do I search through the large data set? Suppose there is a name Franky in the dataset. If a user types Phranky, how do I match it to Franky? Do I have to loop through all the names?</p>
<p>I came across <a href="http://en.wikipedia.org/wiki/Levenshtein_distance" rel="nofollow noreferrer">Levenshtein Distance</a>, which looks like a good technique for finding the possible strings. But again, my question is: do I have to run it against all 1 million values in my data store?</p>
<p>3) I know Google does it by watching user behavior. But I want to do it without watching user behavior, e.g. by using distance algorithms (I don't know which yet), because the former method requires a large volume of searches to start with!</p>
<p>4) As <a href="https://stackoverflow.com/users/146077/kirk-broadhurst">Kirk Broadhurst</a> pointed out in an answer <a href="https://stackoverflow.com/questions/1284782/how-to-correct-the-user-input-kind-of-google-did-you-mean/1360275#1360275">below</a>, there are two possible scenarios:</p>
<ul> <li>Users mistyping a word (an edit distance algorithm)</li> <li>Users not knowing a word and guessing (a phonetic match algorithm)</li> </ul>
<p>I am interested in both of these. They are really two separate things; e.g. Sean and Shawn sound the same but have an edit distance of 2, which is a lot for such short names to count as a typo.</p>
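To make the two scenarios concrete, here is a minimal Python sketch, offered as one possible approach rather than a definitive answer. It assumes the names fit in memory and uses two techniques that are not named in the question: a BK-tree over Levenshtein distance, so that a lookup does not have to compare the query against every stored name, and American Soundex as a simple phonetic key. The names `BKTree`, `build_index` and `did_you_mean` are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict


def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def soundex(name):
    """American Soundex: first letter plus three digits (zero padded)."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit
    name = "".join(ch for ch in name.upper() if ch.isalpha())
    if not name:
        return ""
    result = name[0]
    last = codes.get(name[0], "")
    for ch in name[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = codes.get(ch, "")
        if digit and digit != last:
            result += digit
        last = digit  # a vowel clears `last`, so a repeated code counts again
    return (result + "000")[:4]


class BKTree:
    """Metric tree: each child edge is labelled with its distance to the parent."""

    def __init__(self, distance):
        self.distance = distance
        self.root = None  # a node is a (word, {edge_distance: child}) pair

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word is already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, query, max_distance):
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            word, children = stack.pop()
            d = self.distance(query, word)
            if d <= max_distance:
                results.append((d, word))
            # Triangle inequality: only these subtrees can contain matches.
            for edge, child in children.items():
                if d - max_distance <= edge <= d + max_distance:
                    stack.append(child)
        return sorted(results)


def build_index(names):
    """Build both indexes once, up front (hypothetical helper)."""
    tree = BKTree(levenshtein)
    by_sound = defaultdict(set)
    for name in names:
        tree.add(name.lower())
        by_sound[soundex(name)].add(name.lower())
    return tree, by_sound


def did_you_mean(query, tree, by_sound, max_distance=2):
    """Return (typo candidates, phonetic candidates) for the query."""
    typo_matches = [word for _, word in tree.search(query.lower(), max_distance)]
    sound_matches = sorted(by_sound.get(soundex(query), set()))
    return typo_matches, sound_matches
```

With Franky and Shawn among the indexed names, `did_you_mean("Phranky", tree, by_sound)` finds `franky` through the edit-distance tree but not through Soundex, because the two codes start with different letters. `did_you_mean("Sean", tree, by_sound, max_distance=1)` finds `shawn` only through the Soundex bucket. On the storage question, a phonetic key like this could also be kept in an ordinary indexed column, so a lookup on it would not need a full table scan; whether that is good enough for a given dataset is something to measure rather than assume.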