findAssocs for multiple terms in R
<p>In R I used the <code>tm</code> package to build a term-document matrix from a corpus of documents.</p>
<p>My goal is to extract word associations for <strong>all</strong> bigrams in the term-document matrix and return the top three or so for each. Therefore I'm looking for a variable that holds all row names of the matrix, so that the function <code>findAssocs()</code> can do its job.</p>
<p>This is my code so far:</p>
<pre><code>library(tm)
library(RWeka)

txtData &lt;- read.csv("file.csv", header = TRUE, sep = ",")
txtCorpus &lt;- Corpus(VectorSource(txtData$text))

# ...further preprocessing

# Tokenizer for n-grams, passed on to the term-document matrix constructor
BigramTokenizer &lt;- function(x) NGramTokenizer(x, Weka_control(min = 2, max = 2))
txtTdmBi &lt;- TermDocumentMatrix(txtCorpus, control = list(tokenize = BigramTokenizer))

# The term argument holds two words, since the BigramTokenizer extracted all pairs from txtCorpus
findAssocs(txtTdmBi, "cat shop", 0.5)
   cat cabi cat scratch ...
       0.96        0.91
</code></pre>
<p>I tried to define a variable with all the row names of <code>txtTdmBi</code> and feed it to the <code>findAssocs()</code> function. However, this gives the following result:</p>
<pre><code>allRows &lt;- c(row.names(txtTdmBi))
findAssocs(txtTdmBi, allRows, 0.5)
Error in which(x[term, ] &gt; corlimit) : subscript out of bounds
In addition: Warning message:
In term == Terms(x) :
  longer object length is not a multiple of shorter object length
</code></pre>
<p>Because extracting associations for a term spread over multiple term-document matrices is already explained <a href="https://stackoverflow.com/questions/16695866/r-finding-the-top-10-terms-associated-with-the-term-fraud-across-documents-i/16696053#16696053">here</a>, I guess it should be possible to find the associations for multiple terms in a single term-document matrix. But how?</p>
<p>I hope someone can explain how to solve this. Thanks in advance for any support.</p>
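One possible approach, sketched under assumptions rather than offered as the definitive answer: the error above suggests that this version of findAssocs() expects a single term, so the call can be repeated for each row name with lapply(), keeping only the first few associations of each result. The helper name topAssocs, its parameter n, and the toy corpus below are hypothetical and only for illustration; the toy corpus uses single words instead of bigrams so that the example runs without RWeka, but the same call should apply to txtTdmBi.

```r
library(tm)

# Hypothetical helper (not part of tm): returns the top `n` associations
# above `corlimit` for every term in `terms`, one findAssocs() call per term.
topAssocs <- function(tdm, terms = Terms(tdm), corlimit = 0.5, n = 3) {
  result <- lapply(terms, function(term) {
    assoc <- findAssocs(tdm, term, corlimit)
    # Depending on the tm version, a single-term call returns either a
    # named numeric vector or a list with one element; unwrap the list case.
    if (is.list(assoc)) assoc <- assoc[[1]]
    head(assoc, n)
  })
  names(result) <- terms
  result
}

# Toy corpus standing in for txtCorpus (assumption, for illustration only)
docs <- c("cat shop cat scratch",
          "cat shop sells toys",
          "dog park opens",
          "dog park closes early")
toyTdm <- TermDocumentMatrix(VCorpus(VectorSource(docs)))

# Named list: one entry per term, each holding at most three associations
res <- topAssocs(toyTdm, corlimit = 0.5, n = 3)
res[["cat"]]
```

For the matrix in the question, the equivalent call would be topAssocs(txtTdmBi, row.names(txtTdmBi), 0.5, 3). Note that this makes one findAssocs() call per term, so it may be slow on a large bigram matrix.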
 
