Counting bigrams (pairs of two words) in a file using Python
<p>I want to count the number of occurrences of all bigrams (pairs of adjacent words) in a file using Python. I am dealing with very large files, so I am looking for an efficient way. I tried using the count method with the regex "\w+\s\w+" on the file contents, but it did not prove to be efficient.</p>
<p>For example, let's say I want to count the bigrams in a file a.txt, which has the following content:</p>
<pre><code>"the quick person did not realize his speed and the quick person bumped "
</code></pre>
<p>For the above file, the bigrams and their counts will be:</p>
<pre><code>(the,quick) = 2
(quick,person) = 2
(person,did) = 1
(did,not) = 1
(not,realize) = 1
(realize,his) = 1
(his,speed) = 1
(speed,and) = 1
(and,the) = 1
(person,bumped) = 1
</code></pre>
<p>I have come across an example of Counter objects in Python, which is used to count unigrams (single words). It also uses a regex approach.</p>
<p>The example goes like this:</p>
<pre><code>&gt;&gt;&gt; # Count the words in a.txt, most common first
&gt;&gt;&gt; import re
&gt;&gt;&gt; from collections import Counter
&gt;&gt;&gt; words = re.findall(r'\w+', open('a.txt').read())
&gt;&gt;&gt; print(Counter(words).most_common())
</code></pre>
<p>The output of the above code is:</p>
<pre><code>[('the', 2), ('quick', 2), ('person', 2), ('did', 1), ('not', 1), ('realize', 1), ('his', 1), ('speed', 1), ('and', 1), ('bumped', 1)]
</code></pre>
<p>I was wondering whether it is possible to use the Counter object to count bigrams. Any approach other than the Counter object or regex would also be appreciated.</p>
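To make the desired result concrete, here is a minimal sketch of one way the counts above could be produced with a Counter, pairing each word with its successor via zip. It is only an illustration under stated assumptions: the helper name count_bigrams is hypothetical, words are taken to be \w+ matches, and a pair is allowed to span a line break. It reads one line at a time, so the whole file never has to sit in memory.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"\w+")  # assumption: a "word" is a run of \w characters


def count_bigrams(lines):
    """Count adjacent word pairs across an iterable of text lines.

    Hypothetical helper for illustration; not taken from any library.
    """
    counts = Counter()
    prev = None  # last word seen so far, so a pair can span a line break
    for line in lines:
        words = WORD_RE.findall(line)
        if not words:
            continue  # blank line: keep the previous word as-is
        if prev is not None:
            counts[(prev, words[0])] += 1
        # Pair every word with the word that follows it on the same line.
        counts.update(zip(words, words[1:]))
        prev = words[-1]
    return counts


# The sample content of a.txt from the question, as a one-line "file".
sample = ["the quick person did not realize his speed and the quick person bumped "]
bigrams = count_bigrams(sample)
for pair, n in bigrams.most_common():
    print(pair, n)
```

For a real file, the same function can be fed the open file object, for example `with open('a.txt') as f: bigrams = count_bigrams(f)`, since iterating over a file yields its lines lazily.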