Counting every word in a text file only once using Python
<p>I have a small Python script I am working on for a class homework assignment. The script reads a file and prints the 10 most frequent and 10 least frequent words along with their frequencies. For this assignment, a word is defined as 2 or more letters. I have the word frequencies working just fine; however, the third part of the assignment is to print the total number of <strong>unique</strong> words in the document. By unique words I mean counting every distinct word in the document <strong>only once.</strong></p>
<p>Without changing my current script too much, how can I count all the words in the document only one time?</p>
<p><strong>P.S. I am using Python 2.6, so please don't mention <code>collections.Counter</code>.</strong></p>
<pre><code>from string import punctuation
from collections import defaultdict
import re

number = 10
words = {}
total_unique = 0
words_only = re.compile(r'^[a-z]{2,}$')
counter = defaultdict(int)


"""Define words as 2+ letters"""
def count_unique(s):
    count = 0
    if word in line:
        if len(word) &gt;= 2:
            count += 1
    return count


"""Open text document, read it, strip it, then filter it"""
txt_file = open('charactermask.txt', 'r')

for line in txt_file:
    for word in line.strip().split():
        word = word.strip(punctuation).lower()
        if words_only.match(word):
            counter[word] += 1


# Most Frequent Words
top_words = sorted(counter.iteritems(),
                   key=lambda(word, count): (-count, word))[:number]
print "Most Frequent Words: "

for word, frequency in top_words:
    print "%s: %d" % (word, frequency)


# Least Frequent Words:
least_words = sorted(counter.iteritems(),
                     key=lambda (word, count): (count, word))[:number]

print " "
print "Least Frequent Words: "

for word, frequency in least_words:
    print "%s: %d" % (word, frequency)


# Total Unique Words:
print " "
print "Total Number of Unique Words: %s " % total_unique
</code></pre>
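A minimal sketch, assuming "unique" means distinct words as defined in the question (2+ letters after stripping punctuation). Because `counter` already stores each qualifying word exactly once as a key, `len(counter)` is the number of unique words, so the script needs no extra pass over the file. The helper `count_words` and the `sample` lines are hypothetical, added only so the snippet runs without `charactermask.txt`; the code sticks to calls available in both Python 2.6 and Python 3 (no `collections.Counter`, no `iteritems`).

```python
from string import punctuation
from collections import defaultdict
import re

# Same "word" definition as the question: 2+ lowercase letters.
words_only = re.compile(r'^[a-z]{2,}$')


def count_words(lines):
    """Hypothetical helper: repeat the question's filtering loop.

    Accepts any iterable of lines (an open file works too) and returns
    a dict mapping each qualifying word to its frequency.
    """
    counter = defaultdict(int)
    for line in lines:
        for word in line.strip().split():
            word = word.strip(punctuation).lower()
            if words_only.match(word):
                counter[word] += 1
    return counter


# Hypothetical sample input standing in for the text file.
sample = [
    "The cat sat on the mat.",
    "The dog sat, too!",
    "A cat is a cat.",
]
counter = count_words(sample)

# Each distinct word appears once as a key, so the key count is the
# number of unique words.
total_unique = len(counter)

# Equivalent alternative: the set of keys holds each word exactly once.
unique_words = set(counter)

print("Total Number of Unique Words: %d" % total_unique)
```

In the original script, the same idea amounts to assigning `total_unique = len(counter)` after the reading loop and before the final `print`.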