StackOverflow2013

<p>From what you are doing, I suspect that the following would suit you almost perfectly:</p>

<pre><code>from collections import defaultdict

text = ("cat", "dog", "rat", "bat", "rat", "dog", "man", "woman", "child", "child")

# Map each word to the list of positions where it occurs.
d = defaultdict(list)
for lineno, word in enumerate(text):
    d[word].append(lineno)

print d
</code></pre>

<p>This gives you an output of:</p>

<pre><code>defaultdict(&lt;type 'list'&gt;, {'bat': [3], 'woman': [7], 'dog': [1, 5], 'cat': [0], 'rat': [2, 4], 'child': [8, 9], 'man': [6]})
</code></pre>

<p>This sets up a default dictionary that creates an empty list the first time you access a key, so you don't need to worry about creating the entry yourself. It then enumerates its way over the list of words, so you don't need to keep track of the line number either.</p>

<p>As you don't have a list of correct spellings, this doesn't actually check whether the words are correctly spelled; it just builds a dictionary of <strong>all</strong> the words in the text file.</p>

<p>To convert the dictionary to a set of words, try:</p>

<pre><code>all_words = set(d.keys())
print all_words
</code></pre>

<p>Which produces:</p>

<pre><code>set(['bat', 'woman', 'dog', 'cat', 'rat', 'child', 'man'])
</code></pre>

<p>Or, just to print the words:</p>

<pre><code>for word in d.keys():
    print word
</code></pre>

<p><strong>Edit 3:</strong></p>

<p>I think this might be the final version. It's a (deliberately) very crude, but almost complete, spell checker.</p>

<pre><code>from collections import defaultdict

# Build a set of all the words we know, assuming they're one word per line.
good_words = set()  # Use a set, as this will have the fastest look-up time.
with open("words.txt", "rt") as f:
    for word in f:
        good_words.add(word.strip())

bad_words = defaultdict(list)

with open("text_to_check.txt", "rt") as f:
    # For every line of text, get the line number and the text.
    for line_no, line in enumerate(f):
        # Split into separate words - note there is an issue with punctuation,
        # case sensitivity, etc.
        for word in line.split():
            # If the word is not recognised, record the line where it occurred.
            if word not in good_words:
                bad_words[word].append(line_no)
</code></pre>

<p>At the end, <code>bad_words</code> will be a dictionary with the unrecognised words as the keys, and the list of line numbers where each word occurred as the matching value. Note that <code>enumerate</code> counts from zero, so the first line of the file is line 0.</p>
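<p>The punctuation and case-sensitivity issue mentioned in the comments could be handled roughly as follows. This is only a minimal sketch, not part of the version above: the helper names <code>normalise</code> and <code>find_unrecognised</code> and the sample data are made up for illustration, and it assumes that lower-casing each token and stripping punctuation from both ends is good enough for your text. It avoids the <code>print</code> statement so it runs unchanged on Python 2 and Python 3.</p>

```python
from collections import defaultdict
import string


def normalise(word):
    """Lower-case a token and strip punctuation from both ends.

    Hypothetical helper: inner characters such as the apostrophe in
    "don't" are deliberately left alone.
    """
    return word.strip(string.punctuation).lower()


def find_unrecognised(lines, good_words):
    """Map each unrecognised word to the line numbers where it occurs."""
    # Normalise the known words too, so both sides compare like with like.
    known = set(normalise(w) for w in good_words)
    bad_words = defaultdict(list)
    for line_no, line in enumerate(lines):
        for token in line.split():
            word = normalise(token)
            # Skip tokens that were nothing but punctuation.
            if word and word not in known:
                bad_words[word].append(line_no)
    return bad_words


# Made-up sample data standing in for words.txt and text_to_check.txt.
sample = ["The cat sat.", "A dgo barked, loudly!", "the DGO again"]
result = find_unrecognised(sample, ["the", "cat", "sat", "a", "barked", "again"])
```

<p>Because both the known words and the checked tokens go through the same <code>normalise</code> step, "The", "the" and "the." are all treated as one word. An open file object can be passed as <code>lines</code> in place of the sample list, since the function only iterates over it.</p>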
