    <p>Search engines, whether file-based (Lucene, Sphinx, etc.) or database-based (Oracle Text, MSSQL Fulltext), solve this problem with a thesaurus. That is, they group words together on the basis of their being synonyms. The qualification for being a synonym is tighter than in Roget's book, but the principle is the same. Synonyms bundle up abbreviations, acronyms and common misspellings. So, for instance, a search thesaurus might identify <em>street</em> and <em>st</em> as being the same thing. Context is everything, though: in the string "St Pancras Road" <em>st</em> is a synonym for <em>saint</em>.</p>
    <p>So, does this help you at all? Up to a point. It suggests the sort of thing you want to implement:</p>
<pre><code>string      | canonical
------------+----------
street      |
st          | street
strete      | street
Chile       |
chilly      | Chile
chili       | Chile
</code></pre>
    <p>The unfortunate thing is that building and maintaining a thesaurus requires human ingenuity and effort. Building a taxonomy requires expertise; tracking new additions requires time. The other thing is that even with a thesaurus the matches remain probabilistic: <em>MoMA</em> might be the same as <em>Museum of Modern Art</em>, but is it the same as <em>SFMOMA</em> or <em>NYMOMA</em>? Not exactly, but maybe 90% the same?</p>
    <p>An alternative approach would be to do what SO does with tags. When you tagged your question a dropdown box appeared, suggesting available tags. As you typed more letters the list narrowed. This isn't foolproof (witness the presence of tags like <code>tsql</code> and <code>t-sql</code>), but it is pretty good. SO also has a backup, which is to provide the power users with a list of freshly minted tags so they can investigate these coinages and perhaps quash them. But that still remains a manual process.</p>
    <p>Alas, there is no algorithm that is going to be able to tell that <em>MoMA</em> is the same as <em>Museum of Modern Art</em>, let alone figure out whether it refers to the institution in New York or the one in San Francisco.</p>
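    <p>To make the two approaches concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the <code>THESAURUS</code> dictionary, <code>canonicalize</code> and <code>suggest_tags</code> are hypothetical names invented for this example, the entries simply mirror the table above (with each canonical word mapped to itself), and a real thesaurus would live in a hand-maintained table rather than in code.</p>

```python
# Hypothetical thesaurus: every known variant maps to its canonical form.
# The entries mirror the example table; a real one is curated by hand.
THESAURUS = {
    "street": "street",
    "st": "street",
    "strete": "street",
    "chile": "Chile",
    "chilly": "Chile",
    "chili": "Chile",
}


def canonicalize(term):
    """Return the canonical form of term, or None if there is no entry."""
    return THESAURUS.get(term.strip().lower())


def suggest_tags(prefix, known_tags):
    """Return the known tags that start with prefix (case-insensitive), sorted."""
    wanted = prefix.strip().lower()
    return sorted(tag for tag in known_tags if tag.lower().startswith(wanted))


if __name__ == "__main__":
    print(canonicalize("St"))      # street
    print(canonicalize("MoMA"))    # None: nobody has added it yet
    print(suggest_tags("t", ["tsql", "t-sql", "sql", "text"]))
```

    <p>Note what the sketch cannot do: the lookup is context-free, so it would still turn the <em>St</em> of "St Pancras Road" into <em>street</em>, and the suggestion list happily offers both <code>tsql</code> and <code>t-sql</code>. Somebody still has to curate the entries.</p>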