Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <h3>1. Introduction</h3> <p>Here's a way to approach this problem systematically: if you have an algorithm that plays hangman well, then you can take the difficulty of each word to be the number of wrong guesses that your program would take if guessing that word.</p> <h3>2. Aside on hangman strategy</h3> <p>There's an idea that's implicit in some the other answers and comments, that the optimal strategy for the solver would be to base their decisions on the frequency of letters in English, or on the frequency of words in some corpus. This is a seductive idea, but it's not quite right. The solver does best if it <em>accurately models the distribution of words chosen by the setter</em>, and a human setter may well be choosing words based on their rarity or avoidance of frequently used letters. For example, although <code>E</code> is the most frequently used letter in English, if the setter always chooses from the words <code>JUGFUL</code>, <code>RHYTHM</code>, <code>SYZYGY</code>, and <code>ZYTHUM</code>, then a perfect solver does not start by guessing <code>E</code>!</p> <p>The best approach to modelling the setter depends on the context, but I guess that some kind of Bayesian inductive inference would work well in a context where the solver plays many games against the same setter, or against a group of similar setters.</p> <h3>3. A hangman algorithm</h3> <p>Here I'll outline a solver that is pretty good (but far from perfect). It models the setter as choosing words uniformly from a fixed dictionary. It's a <a href="http://en.wikipedia.org/wiki/Greedy_algorithm" rel="nofollow noreferrer">greedy algorithm</a>: at each stage it guesses the letter that minimizes the number of misses, that is, words that do not contain the guess. For example, if no guesses have been made so far, and the possible words are <code>DEED</code>, <code>DEAD</code> and <code>DARE</code>, then:</p> <ul> <li>if you guess <code>D</code> or <code>E</code>, there are no misses;</li> <li>if you guess <code>A</code>, there's one miss (<code>DEED</code>);</li> <li>if you guess <code>R</code>, there are two misses (<code>DEED</code> and <code>DEAD</code>);</li> <li>if you guess any other letter, there are three misses.</li> </ul> <p>So either <code>D</code> or <code>E</code> is a good guess in this situation.</p> <p>(Thanks to <a href="https://stackoverflow.com/questions/16223305/algorithm-for-classifying-words-for-hangman-difficulty-levels-as-easy-medium/16225534#comment23208206_16225534">Colonel Panic in comments</a> for pointing out that correct guesses are free in hangman—I totally forgot this in my first attempt!)</p> <h3>4. Implementation</h3> <p>Here's an implementation of this algorithm in Python:</p> <pre class="lang-py prettyprint-override"><code>from collections import defaultdict from string import ascii_lowercase def partition(guess, words): """Apply the single letter 'guess' to the sequence 'words' and return a dictionary mapping the pattern of occurrences of 'guess' in a word to the list of words with that pattern. &gt;&gt;&gt; words = 'deed even eyes mews peep star'.split() &gt;&gt;&gt; sorted(list(partition('e', words).items())) [(0, ['star']), (2, ['mews']), (5, ['even', 'eyes']), (6, ['deed', 'peep'])] """ result = defaultdict(list) for word in words: key = sum(1 &lt;&lt; i for i, letter in enumerate(word) if letter == guess) result[key].append(word) return result def guess_cost(guess, words): """Return the cost of a guess, namely the number of words that don't contain the guess. &gt;&gt;&gt; words = 'deed even eyes mews peep star'.split() &gt;&gt;&gt; guess_cost('e', words) 1 &gt;&gt;&gt; guess_cost('s', words) 3 """ return sum(guess not in word for word in words) def word_guesses(words, wrong = 0, letters = ''): """Given the collection 'words' that match all letters guessed so far, generate tuples (wrong, nguesses, word, guesses) where 'word' is the word that was guessed; 'guesses' is the sequence of letters guessed; 'wrong' is the number of these guesses that were wrong; 'nguesses' is len(guesses). &gt;&gt;&gt; words = 'deed even eyes heel mere peep star'.split() &gt;&gt;&gt; from pprint import pprint &gt;&gt;&gt; pprint(sorted(word_guesses(words))) [(0, 1, 'mere', 'e'), (0, 2, 'deed', 'ed'), (0, 2, 'even', 'en'), (1, 1, 'star', 'e'), (1, 2, 'eyes', 'en'), (1, 3, 'heel', 'edh'), (2, 3, 'peep', 'edh')] """ if len(words) == 1: yield wrong, len(letters), words[0], letters return best_guess = min((g for g in ascii_lowercase if g not in letters), key = lambda g:guess_cost(g, words)) best_partition = partition(best_guess, words) letters += best_guess for pattern, words in best_partition.items(): for guess in word_guesses(words, wrong + (pattern == 0), letters): yield guess </code></pre> <h3>5. Example results</h3> <p>Using this strategy it's possible to evaluate the difficulty of guessing each word in a collection. Here I consider the six-letter words in my system dictionary:</p> <pre class="lang-none prettyprint-override"><code>&gt;&gt;&gt; words = [w.strip() for w in open('/usr/share/dict/words') if w.lower() == w] &gt;&gt;&gt; six_letter_words = set(w for w in words if len(w) == 6) &gt;&gt;&gt; len(six_letter_words) 15066 &gt;&gt;&gt; results = sorted(word_guesses(six_letter_words)) </code></pre> <p>The easiest words to guess in this dictionary (together with the sequence of guesses needed for the solver to guess them) are as follows:</p> <pre class="lang-none prettyprint-override"><code>&gt;&gt;&gt; from pprint import pprint &gt;&gt;&gt; pprint(results[:10]) [(0, 1, 'eelery', 'e'), (0, 2, 'coneen', 'en'), (0, 2, 'earlet', 'er'), (0, 2, 'earner', 'er'), (0, 2, 'edgrew', 'er'), (0, 2, 'eerily', 'el'), (0, 2, 'egence', 'eg'), (0, 2, 'eleven', 'el'), (0, 2, 'enaena', 'en'), (0, 2, 'ennead', 'en')] </code></pre> <p>and the hardest words are these:</p> <pre class="lang-none prettyprint-override"><code>&gt;&gt;&gt; pprint(results[-10:]) [(12, 16, 'buzzer', 'eraoiutlnsmdbcfg'), (12, 16, 'cuffer', 'eraoiutlnsmdbpgc'), (12, 16, 'jugger', 'eraoiutlnsmdbpgh'), (12, 16, 'pugger', 'eraoiutlnsmdbpcf'), (12, 16, 'suddle', 'eaioulbrdcfghmnp'), (12, 16, 'yucker', 'eraoiutlnsmdbpgc'), (12, 16, 'zipper', 'eraoinltsdgcbpjk'), (12, 17, 'tuzzle', 'eaioulbrdcgszmnpt'), (13, 16, 'wuzzer', 'eraoiutlnsmdbpgc'), (13, 17, 'wuzzle', 'eaioulbrdcgszmnpt')] </code></pre> <p>The reason that these are hard is because after you've guessed <code>-UZZLE</code>, you still have seven possibilities left:</p> <pre class="lang-none prettyprint-override"><code>&gt;&gt;&gt; ' '.join(sorted(w for w in six_letter_words if w.endswith('uzzle'))) 'buzzle guzzle muzzle nuzzle puzzle tuzzle wuzzle' </code></pre> <h3>6. Choice of wordlist</h3> <p>Of course when preparing wordlists for your children you wouldn't start with your computer's system dictionary, you'd start with a list of words that you think they are likely to know. For example, you might have a look at <a href="http://en.wiktionary.org/wiki/Wiktionary:Frequency_lists#English" rel="nofollow noreferrer">Wiktionary's lists of the most frequently used words</a> in various English corpora.</p> <p>For example, among the 1,700 six-letter words in the <a href="http://en.wiktionary.org/wiki/Wiktionary:Frequency_lists/PG/2006/04/1-10000" rel="nofollow noreferrer">10,000 most common words in Project Gutenberg as of 2006</a>, the most difficult ten are these:</p> <pre class="lang-none prettyprint-override"><code>[(6, 10, 'losing', 'eaoignvwch'), (6, 10, 'monkey', 'erdstaoync'), (6, 10, 'pulled', 'erdaioupfh'), (6, 10, 'slaves', 'erdsacthkl'), (6, 10, 'supper', 'eriaoubsfm'), (6, 11, 'hunter', 'eriaoubshng'), (6, 11, 'nought', 'eaoiustghbf'), (6, 11, 'wounds', 'eaoiusdnhpr'), (6, 11, 'wright', 'eaoithglrbf'), (7, 10, 'soames', 'erdsacthkl')] </code></pre> <p>(Soames Forsyte is a character in the <a href="http://www.gutenberg.org/ebooks/4397" rel="nofollow noreferrer">Forsyte Saga by John Galsworthy</a>; the wordlist has been converted to lower-case so it wasn't possible for me to quickly remove proper names.)</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload