
    <p>My 2 cents. Given that translate.google.com is a statistical machine translation engine, and given "The Unreasonable Effectiveness of Data" by A. Halevy, P. Norvig (Director of Research at Google) &amp; F. Pereira, <em>I make the assumption</em> (bet) that this is a <strong>statistically driven spell checker</strong>.</p>
    <p>How it could work: you collect a very large corpus of the language you want to spell check. You store this corpus as phrase-tables in suitable data structures (<a href="http://en.wikipedia.org/wiki/Suffix_array" rel="noreferrer">suffix arrays</a>, for example, if you have to count subsets of <a href="http://en.wikipedia.org/wiki/N-gram" rel="noreferrer">n-grams</a>) that keep track of the count, and so an estimated probability, of each n-gram.</p>
    <p>For example, if your corpus consists only of:</p>
    <pre><code>I had bean soup last diner. </code></pre>
    <p>From this entry, you will generate the following bi-grams (sequences of 2 words):</p>
    <pre><code>I had, had bean, bean soup, soup last, last diner </code></pre>
    <p>and the tri-grams (sequences of 3 words):</p>
    <pre><code>I had bean, had bean soup, bean soup last, soup last diner </code></pre>
    <p>But they will be pruned by tests of statistical relevance. For example, we can assume that the tri-gram</p>
    <pre><code>I had bean </code></pre>
    <p>will disappear from the phrase-table.</p>
    <p>Now, spell checking only has to look in these big phrase-tables and check the "probabilities". (You need a good infrastructure to store these phrase-tables in an efficient data structure and in RAM. Google has it for translate.google.com, so why not for this? It's easier than statistical machine translation.)</p>
    <p>Ex: you type</p>
    <pre><code>I had been soup </code></pre>
    <p>and in the phrase-table there is a</p>
    <pre><code>had bean soup </code></pre>
    <p>tri-gram with a much higher probability than what you just typed! Indeed, you only need to change one word (this is a "not so distant" tri-gram) to get a tri-gram with a much higher probability. There should be an evaluation function dealing with the distance/probability trade-off. This distance could even be calculated in terms of characters: we are doing spell checking, not machine translation.</p>
    <p>This is only my hypothetical opinion. ;)</p>
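
To make the idea above concrete, here is a minimal Python sketch of the same guess, not a description of what Google actually runs. It counts n-grams from a toy corpus into an in-memory table, then looks for a stored tri-gram that is close to the typed one in character edit distance. The names `build_phrase_table`, `edit_distance` and `suggest`, the `max_distance` cutoff, and the score `probability / (1 + distance)` are all assumptions made for illustration; no pruning by statistical relevance is done, and a real system would not scan the whole table.

```python
import re
from collections import Counter


def ngrams(words, n):
    """Return every run of n consecutive words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_phrase_table(corpus, n):
    """Count n-grams over a list of sentences (toy stand-in for a phrase-table)."""
    table = Counter()
    for sentence in corpus:
        words = re.findall(r"[a-z']+", sentence.lower())
        table.update(ngrams(words, n))
    return table


def edit_distance(a, b):
    """Character-level Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]


def suggest(typed, table, max_distance=2):
    """Return the best-scoring stored n-gram near `typed`, or `typed` itself.

    Hypothetical trade-off: score = probability / (1 + distance).
    """
    total = sum(table.values())
    if total == 0:
        return typed
    best, best_score = typed, table[typed] / total  # Counter gives 0 if unseen
    for candidate, count in table.items():
        if len(candidate) != len(typed):
            continue
        # Sum of per-word character distances between the two n-grams.
        distance = sum(edit_distance(x, y) for x, y in zip(typed, candidate))
        if 0 < distance <= max_distance:
            score = (count / total) / (1 + distance)
            if score > best_score:
                best, best_score = candidate, score
    return best


corpus = ["I had bean soup last diner."]
trigrams = build_phrase_table(corpus, 3)
print(suggest(("had", "been", "soup"), trigrams))  # ('had', 'bean', 'soup')
```

With this one-sentence corpus the typed tri-gram "had been soup" has never been seen, while "had bean soup" is one character away and has a non-zero count, so it wins. On a realistic corpus the outcome would depend entirely on the counts and on how the distance penalty is weighted.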