<p>Another thing that may hurt performance is deleting from the dictionary. Rebuilding the dictionary may be much more efficient:</p>
<pre><code>from itertools import takewhile

# most_common() is sorted by count, so stop at the first entry below 5
word_dict = {key: count
             for key, count in takewhile(lambda key_count: key_count[1] &gt;= 5,
                                         main_dict.most_common())}
</code></pre>
<p>On the whole, I am a bit too lazy to go into all the details, but I think binding a few things to local names (as in the sketch below) may be more efficient. As far as I can see, you don't need the <em>doc_count</em> variable: it's redundant, and re-evaluating it costs you performance too. <em>main_dict.keys()</em> does the same job: it gives you the list of all words, each one once.</p>
<p>This is a sketch of what I have in mind. I cannot prove that it's more efficient, but it certainly looks more Pythonic:</p>
<pre><code>import itertools
from collections import Counter, defaultdict

from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

with open(inp, 'r') as plot_data:
    word_dict = Counter()
    file1, file2 = itertools.tee(plot_data, 2)
    line_one = itertools.islice(file1, 0, None, 4)  # movie names
    line_two = itertools.islice(file2, 2, None, 4)  # movie plots
    all_stop_words = set(stopwords.words('english'))  # set: fast membership tests
    movie_dict = defaultdict(Counter)
    # stem_word comes from older NLTK releases; newer ones may only offer .stem
    stemmer_func = PorterStemmer().stem_word
    # izip is Python 2; on Python 3 the built-in zip is already lazy
    for movie_name, movie_plot in itertools.izip(line_one, line_two):
        movie_plot = movie_plot.lower()
        words = &lt;see above - I am updating original post&gt;
        all_words = [stemmer_func(word) for word in words if word not in all_stop_words]
        current_word_counter = Counter(all_words)
        movie_dict[movie_name].update(current_word_counter)
        word_dict.update(current_word_counter)
</code></pre>
<p>Lastly, <em>dictionary</em> is not a good variable name: it does not tell you what it contains.</p>
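<p>To make the two ideas above easier to try in isolation, here is a small self-contained sketch that needs neither NLTK nor an input file. The helper names <em>keep_frequent</em> and <em>name_plot_pairs</em>, the <em>min_count</em> parameter, and the sample data are made up for illustration; the four-lines-per-record layout (name at offset 0, plot at offset 2) is taken from the islice calls in the sketch.</p>

```python
from collections import Counter
from itertools import islice, takewhile, tee


def keep_frequent(counts, min_count=5):
    """Build a new dict with only the entries counted at least min_count times.

    Hypothetical helper: most_common() yields (word, count) pairs sorted from
    most to least frequent, so takewhile can stop at the first pair that
    falls below the threshold instead of scanning the whole counter.
    """
    return {
        word: count
        for word, count in takewhile(
            lambda pair: pair[1] >= min_count, counts.most_common()
        )
    }


def name_plot_pairs(lines):
    """Lazily pair line 0 (name) with line 2 (plot) of every 4-line record.

    Hypothetical helper: tee gives two independent iterators over the same
    lines, and islice picks every fourth line starting at a different offset.
    """
    names_src, plots_src = tee(lines, 2)
    names = islice(names_src, 0, None, 4)
    plots = islice(plots_src, 2, None, 4)
    return zip(names, plots)


# Made-up sample data, only to exercise the helpers.
sample_counts = Counter({"movie": 7, "hero": 5, "villain": 4, "cameo": 1})
frequent = keep_frequent(sample_counts)

sample_lines = [
    "Movie A", "1999", "a hero saves the day", "",
    "Movie B", "2005", "a villain wins", "",
]
pairs = list(name_plot_pairs(sample_lines))
```

<p>Here <em>frequent</em> keeps only the words counted at least five times, and <em>pairs</em> holds one (name, plot) tuple per record. Whether rebuilding actually beats deleting depends on how many entries survive the cut, so it is worth timing both on real data.</p>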