Calculating vowel to word length ratio in a list of words
<p>Here is the code for my function:</p>
<pre><code>def calcVowelProportion(wordList):
    """
    Calculates the proportion of vowels in each word in wordList.
    """

    VOWELS = 'aeiou'
    ratios = []

    for word in wordList:
        numVowels = 0
        for char in word:
            if char in VOWELS:
                numVowels += 1
        # float() avoids Python 2 integer division
        ratios.append(numVowels/float(len(word)))

    return ratios
</code></pre>
<p>Right now I'm working with a list of over 87,000 words, and this algorithm is extremely slow.</p>
<p>Is there a faster way to do this?</p>
<p>EDIT:</p>
<p>I tested the algorithms @ExP suggested with the following class (Python 2):</p>
<pre><code>import time

class vowelProportions(object):
    """
    A series of methods that all calculate the vowel/word length ratio
    in a list of words.
    """

    WORDLIST_FILENAME = "words_short.txt"

    def __init__(self):
        self.wordList = self.buildWordList()
        print "Original: " + str(self.calcMeanTime(10000, self.cvpOriginal, self.wordList))
        print "Generator: " + str(self.calcMeanTime(10000, self.cvpGenerator, self.wordList))
        print "Count: " + str(self.calcMeanTime(10000, self.cvpCount, self.wordList))
        print "Translate: " + str(self.calcMeanTime(10000, self.cvpTranslate, self.wordList))

    def buildWordList(self):
        # One word per line, lower-cased; blank lines are skipped so that
        # len(word) is never zero in the ratio calculations
        wordList = []
        with open(self.WORDLIST_FILENAME, 'r') as inFile:
            for line in inFile:
                word = line.strip().lower()
                if word:
                    wordList.append(word)
        return wordList

    def cvpOriginal(self, wordList):
        """ My original, slow algorithm """
        VOWELS = 'aeiou'
        ratios = []

        for word in wordList:
            numVowels = 0
            for char in word:
                if char in VOWELS:
                    numVowels += 1
            ratios.append(numVowels/float(len(word)))

        return ratios

    def cvpGenerator(self, wordList):
        """ Using a generator expression """
        return [sum(char in 'aeiou' for char in word)/float(len(word)) for word in wordList]

    def cvpCount(self, wordList):
        """ Using str.count() """
        return [sum(word.count(char) for char in 'aeiou')/float(len(word)) for word in wordList]

    def cvpTranslate(self, wordList):
        """ Using str.translate() """
        # Delete all 21 consonants ('y' counts as a consonant here); whatever
        # remains is treated as a vowel, so words must contain only letters
        return [len(word.translate(None, 'bcdfghjklmnpqrstvwxyz'))/float(len(word)) for word in wordList]

    def timeFunc(self, func, *args):
        start = time.clock()
        func(*args)
        return time.clock() - start

    def calcMeanTime(self, numTrials, func, *args):
        times = [self.timeFunc(func, *args) for x in range(numTrials)]
        return sum(times)/len(times)

if __name__ == '__main__':
    vowelProportions()
</code></pre>
<p>The output (mean time in seconds per call over 10,000 trials, for a list of 200 words) was:</p>
<pre><code>Original: 0.0005613667
Generator: 0.0008402738
Count: 0.0012531976
Translate: 0.0003343548
</code></pre>
<p>Surprisingly, Generator and Count were even slower than the original (please let me know if my implementation is incorrect).</p>
<p>I would like to test @John's solution, but I don't know anything about trees.</p>
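The timing class is Python 2 code: `print` statements, `time.clock()` (removed in Python 3.8), and the two-argument `str.translate(None, deletechars)` form do not run on Python 3. A minimal sketch of a Python 3 port follows, assuming Python 3.6 or later and non-empty, lower-case words. The function names, the `mean_seconds` helper, and the sample words are hypothetical. It swaps in two changes: `timeit` replaces the hand-rolled timer, and the translate variant deletes the vowels and subtracts lengths instead of deleting a list of consonants, so digits or punctuation are never counted as vowels.

```python
import timeit

VOWELS = 'aeiou'
# Translation table that deletes every vowel: the third argument of
# str.maketrans lists characters to map to None
DELETE_VOWELS = str.maketrans('', '', VOWELS)


def cvp_original(words):
    """Explicit nested loops, as in the question."""
    ratios = []
    for word in words:
        num_vowels = 0
        for char in word:
            if char in VOWELS:
                num_vowels += 1
        ratios.append(num_vowels / len(word))  # true division in Python 3
    return ratios


def cvp_generator(words):
    """Sum booleans from a generator expression (True counts as 1)."""
    return [sum(char in VOWELS for char in word) / len(word) for word in words]


def cvp_count(words):
    """One str.count() call per vowel."""
    return [sum(word.count(v) for v in VOWELS) / len(word) for word in words]


def cvp_translate(words):
    """Delete the vowels; the drop in length is the vowel count."""
    return [(len(word) - len(word.translate(DELETE_VOWELS))) / len(word)
            for word in words]


def mean_seconds(func, words, number=1000):
    """Mean seconds per call, using timeit's default timer (time.perf_counter)."""
    return timeit.timeit(lambda: func(words), number=number) / number


if __name__ == '__main__':
    # Hypothetical sample data (200 words); substitute the real word list.
    # Every word must be non-empty, otherwise len(word) == 0 raises.
    sample = ['banana', 'rhythm', 'queue', 'strengths'] * 50
    for name, func in [('Original', cvp_original), ('Generator', cvp_generator),
                       ('Count', cvp_count), ('Translate', cvp_translate)]:
        print(f'{name}: {mean_seconds(func, sample):.10f}')
```

By default `timeit` also switches off garbage collection while timing, which removes one source of noise from a loop like `calcMeanTime`. Timings still depend on the machine, the interpreter version, and the word list, so the relative order may differ from the Python 2 results.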