StackOverflow2013

<p>Don't expect my idea here to be perfect or optimal, but it might be a good starting point if you decide to go this route. A genetic algorithm may not be the best choice for a spell checker, though.</p>
<p>For a genetic algorithm, you need a starting population, a way to pass genes to the "next generation" (crossover), a defined means of creating mutations, and a way of selecting which individuals are passed on to the next generation (based on a fitness function). Along with this you'll need, of course, a corpus. You can try the dictionary.com API if it's any good (I've never used it): <a href="http://www.programmableweb.com/api/dictionary.com" rel="nofollow">http://www.programmableweb.com/api/dictionary.com</a>.</p>
<p>For the starting population, you have the horrible issue that it will be thousands of copies of the exact same word (i.e. ['hello']*1000). Before evolving anything, you can check whether the input is already a word in the corpus, and if it is, just return True (because grammar checking there vs their vs they're will be a pain in the ass).</p>
<p>To start off, you'll need to rely entirely on mutations to gain diversity, so maybe make mutations more likely in earlier generations and decrease the chance of mutation once diversity grows. A mutation can be any of: insert a random letter somewhere, remove a letter somewhere, change a letter somewhere, or do more than one of these.</p>
<p>For your fitness function, your best bet will be to use a sequence alignment algorithm to score each candidate against the words in the corpus. See: <a href="http://en.wikipedia.org/wiki/Sequence_alignment" rel="nofollow">http://en.wikipedia.org/wiki/Sequence_alignment</a>. If you REALLY want to get advanced, try creating phonetic spellings for each word in your population, see if they match anything in the corpus, and increase the score based on that (e.g. tho and though would have the same pronunciation). I cannot claim to know anything about that. Bear in mind that all of this will slow down your application horribly. It might be best to limit your population to 1000-2000.</p>
<p>For your crossover, you should take a few of your samples (early on you may need to use roulette selection to pick which are the most fit, but later on you can use tournament selection for speed). Again you can use the sequence alignment between each pair of "parents", and then decide which letter to pull from each parent (e.g. soeed vs s_eeo can come out as soeed, seed, seeo, or soeeo).</p>
<p>Don't take this as an expert solution, since I only put a few minutes of thought into it, but it could be a good start if you decide to use a genetic algorithm.</p>
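<p>To make the pieces above concrete, here is a rough Python sketch of how they might fit together. Everything in it is an assumption for illustration, not a tested design: the function names (edit_distance, fitness, mutate, crossover, tournament, suggest) are made up, plain Levenshtein distance stands in for "a sequence alignment algorithm", difflib's SequenceMatcher stands in for the parent-to-parent alignment, only tournament selection is shown, and the population size, generation count, and mutation-rate schedule are arbitrary numbers. The corpus is assumed to be a non-empty set of lowercase words.</p>

```python
import random
import string
from difflib import SequenceMatcher


def edit_distance(a, b):
    """Levenshtein distance, used here as a simple alignment score (lower is closer)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def fitness(candidate, corpus):
    """Negated distance to the nearest corpus word, so 0 means an exact match."""
    return -min(edit_distance(candidate, w) for w in corpus)


def mutate(word, rng):
    """Apply one random edit: insert, remove, or change a letter."""
    op = rng.choice(["insert", "remove", "change"])
    letter = rng.choice(string.ascii_lowercase)
    if op == "insert" or not word:
        i = rng.randint(0, len(word))
        return word[:i] + letter + word[i:]
    i = rng.randrange(len(word))
    if op == "remove":
        return word[:i] + word[i + 1:]
    return word[:i] + letter + word[i + 1:]


def crossover(p1, p2, rng):
    """Align the parents, keep matching runs, pick one parent for each differing run."""
    matcher = SequenceMatcher(None, p1, p2, autojunk=False)
    child = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            child.append(p1[i1:i2])
        else:
            child.append(rng.choice([p1[i1:i2], p2[j1:j2]]))
    return "".join(child)


def tournament(pop, scores, rng, k=3):
    """Tournament selection: the fittest of k randomly drawn individuals."""
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=scores.__getitem__)]


def suggest(word, corpus, pop_size=200, generations=30, seed=0):
    """Return True if word is in the corpus, else the fittest candidate seen."""
    if word in corpus:
        return True
    rng = random.Random(seed)
    pop = [word] * pop_size                    # identical starting population
    best, best_score = word, fitness(word, corpus)
    for gen in range(generations):
        rate = max(0.1, 0.9 - 0.05 * gen)      # assumed schedule: mutate more early on
        pop = [mutate(w, rng) if rng.random() < rate else w for w in pop]
        scores = [fitness(w, corpus) for w in pop]
        top = max(range(pop_size), key=scores.__getitem__)
        if scores[top] > best_score:
            best, best_score = pop[top], scores[top]
        if best_score == 0:                    # found an actual corpus word
            break
        pop = [crossover(tournament(pop, scores, rng),
                         tournament(pop, scores, rng), rng)
               for _ in range(pop_size)]
    return best
```

<p>With this sketch, suggest("hello", {"hello"}) returns True straight away, and crossover("soeed", "seeo", rng) produces one of soeed, seed, seeo, or soeeo, matching the example above. Scoring every candidate against every corpus word in every generation is exactly the kind of cost that will slow things down, so a real corpus would need something smarter than the brute-force min() in fitness.</p>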
