Applying collections.Counter() to UTF-8 text
<p>I have a list of non-English words read from UTF-8 text. When I attempt to print a single word, it gives me the escaped form: u'\u0648\u0627\u0644\u0623\u0631\u0646\u0628'</p>
<p>To see the original word, I have to loop over the list and print each element; then it is output correctly.</p>
<p>I want to find the 5 most frequent words. When I store the words in a collections.Counter, they go in as these Unicode-escaped strings. How do I access the words inside the Counter so that I can print the top 5 most frequent words?</p>
<p>I have written the following code (txt being the content of my text file):</p>
<pre><code>&gt;&gt;&gt; words = [w for w in txt.split()]
</code></pre>
<p>The list then looks like this:</p>
<pre><code>[u'\ufeff\u0643\u0627\u0646', u'\u064a\u0627', u'\u0645\u0627', ...u'\u0643\u0627\u0646', u'\u0641\u064a', u'\u0642\u062f\u064a\u0645']
</code></pre>
<p>I therefore loop over it to get the desired output (I don't know why this works):</p>
<pre><code>&gt;&gt;&gt; for w in words: print w,
</code></pre>
<p>which prints</p>
<pre><code>كان يا ما كان
</code></pre>
<p>I use collections.Counter to find the most frequent words:</p>
<pre><code>&gt;&gt;&gt; count = collections.Counter(words)
&gt;&gt;&gt; print count.most_common(5)
</code></pre>
<p>which prints</p>
<pre><code>[(u'\u0627\u0644\u0633\u0644\u062d\u0641\u0627\u0629', 5), (u'\u0627\u0644\u0645\u063a\u0631\u0648\u0631', 3), (u'\u0627\u0644\u0623\u0631\u0646\u0628', 2), (u'\u060c', 2), (u'\u0648\u0627\u0644\u0623\u0631\u0646\u0628', 2)]
</code></pre>
<p>I want to access each word and print it out together with its frequency.</p>
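One possible way to get the desired output is sketched below. Printing a list or a list of tuples shows the `repr` of each element, which is why the escaped `u'\u...'` form appears, whereas printing a string on its own shows the actual characters. Unpacking each `(word, count)` pair returned by `most_common` and printing the word itself therefore avoids the escapes. The sketch is written for Python 3; the helper names `top_words` and `format_counts` and the sample text are hypothetical and not taken from the question. It also strips the leading `\ufeff` byte-order mark that is visible on the first word of the list, since otherwise that word would be counted separately from its other occurrences.

```python
import collections


def top_words(text, n=5):
    """Return the n most common whitespace-separated words as (word, count) pairs."""
    # Drop a leading byte-order mark so it does not stick to the first word.
    words = text.lstrip('\ufeff').split()
    return collections.Counter(words).most_common(n)


def format_counts(pairs):
    """Turn (word, count) pairs into printable lines, one word and count per line."""
    return ['{0} {1}'.format(word, count) for word, count in pairs]


if __name__ == '__main__':
    # Hypothetical sample text; the original file is not shown in the question.
    sample = '\ufeffكان يا ما كان في قديم'
    for line in format_counts(top_words(sample)):
        # Printing each string directly shows the word, not its escaped repr.
        print(line)
```

Under Python 2, as used in the question, the same idea applies: loop with `for word, n in count.most_common(5): print word, n`. Opening the file with the `utf-8-sig` codec is an alternative way to discard the byte-order mark at read time.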