Retrieving total number of words with 2 or more letters in a document using Python
<p>I have a small Python script that calculates the 10 most frequent words, the 10 most infrequent words, and the total number of words in a .txt document. According to the assignment, a word is defined as 2 letters or more. The 10 most frequent and the 10 most infrequent words print fine. However, when I attempt to print the total number of words in the document, it prints the total of all the words, including single-letter words (such as "a"). How can I get the total to count ONLY the words that have 2 letters or more?</p>
<p>Here is my script:</p>
<pre><code>from string import *
from collections import defaultdict
from operator import itemgetter
import re

number = 10
words = {}
total_words = 0
words_only = re.compile(r'^[a-z]{2,}$')
counter = defaultdict(int)


"""Define function to count the total number of words"""
def count_words(s):
    unique_words = split(s)
    return len(unique_words)

"""Define words as 2 letters or more -- no single letter words such as "a" """
for word in words:
    if len(word) &gt;= 2:
        counter[word] += 1


"""Open text document, strip it, then filter it"""
txt_file = open('charactermask.txt', 'r')

for line in txt_file:
    total_words = total_words + count_words(line)
    for word in line.strip().split():
        word = word.strip(punctuation).lower()
        if words_only.match(word):
            counter[word] += 1


# Most Frequent Words
top_words = sorted(counter.iteritems(), key=lambda (word, count): (-count, word))[:number]

print "Most Frequent Words: "

for word, frequency in top_words:
    print "%s: %d" % (word, frequency)


# Least Frequent Words:
least_words = sorted(counter.iteritems(), key=lambda (word, count): (count, word))[:number]

print " "
print "Least Frequent Words: "

for word, frequency in least_words:
    print "%s: %d" % (word, frequency)


# Total Unique Words:
print " "
print "Total Number of Words: %s" % total_words
</code></pre>
<p>I am not an expert with Python; this is for a Python class I am currently taking. Neatness and proper formatting of the code count toward my grade in this assignment, so if possible, could someone also tell me whether the format of this code is considered "good practice"?</p>
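<p>One possible way to make the total respect the two-letter rule is to derive it from the same filtered counts, instead of calling <code>count_words</code>, which counts every whitespace-separated token. The following is only a minimal sketch, written for Python 3 (the script above is Python 2). The names <code>WORD_RE</code> and <code>count_filtered_words</code> and the sample lines are illustrative assumptions, not part of the original script.</p>

```python
import re
from collections import Counter
from string import punctuation

# Same rule as the question's words_only pattern: two or more lowercase letters.
WORD_RE = re.compile(r'^[a-z]{2,}$')


def count_filtered_words(lines):
    """Return a Counter of words with 2+ letters from an iterable of lines.

    Hypothetical helper name, used here for illustration only.
    """
    counter = Counter()
    for line in lines:
        for token in line.split():
            # Drop surrounding punctuation and normalise case before matching.
            word = token.strip(punctuation).lower()
            if WORD_RE.match(word):
                counter[word] += 1
    return counter


# Sample input standing in for the lines of the text file (assumption).
sample = ["A cat and a dog.", "I saw the cat!"]
counts = count_filtered_words(sample)

# The total is the sum of the filtered counts, so "a" and "I" are excluded.
total_words = sum(counts.values())
print("Total Number of Words: %d" % total_words)
```

<p>To run this on a file, an open file object can be passed in place of <code>sample</code>, since iterating over a file yields its lines. Because the frequency table and the total come from one filtered pass, they cannot disagree about what counts as a word.</p>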