
Similarity Matrix from DocumentTermMatrix
<p>I need to create a similarity matrix from a document-term matrix in order to perform maximum capturing clustering on documents. So far I have only found a solution for a distance matrix. I tried the <code>dist</code> method, but it returns a distance matrix rather than the similarity matrix I need. Is there a way to create similarity matrices in R? I used the tm package for the following code, but I am not restricted to it; if there is any other good package, let me know. The code so far:</p>
<pre><code>install.packages("tm")
install.packages("rJava")
install.packages("SnowballC")  # the old "Snowball" package is no longer on CRAN
install.packages("RWeka")
install.packages("RWekajars")
install.packages("XML")
install.packages("openNLP")
install.packages("openNLPmodels.en")
Sys.setenv(NOAWT = TRUE)
library(XML)
library(rJava)
library(SnowballC)
library(RWeka)
library(tm)
library(openNLP)
library(openNLPmodels.en)

sample &lt;- c(
  "cc ee aa",
  "dd bb ee",
  "bb cc ee dd",
  "cc ee dd aa",
  "bb ee",
  "cc dd aa",
  "bb cc aa",
  "bb cc",
  "cc ee dd"
)
print(sample)

# Build and clean the corpus
corpus &lt;- Corpus(VectorSource(sample))
inspect(corpus)
corpus &lt;- tm_map(corpus, removeNumbers)
corpus &lt;- tm_map(corpus, removePunctuation)
corpus &lt;- tm_map(corpus, content_transformer(tolower))  # plain tolower breaks the corpus in tm &gt;= 0.6
corpus &lt;- tm_map(corpus, removeWords, stopwords("english"))
corpus &lt;- tm_map(corpus, stemDocument, language = "english")
corpus &lt;- tm_map(corpus, stripWhitespace)
# corpus &lt;- tm_map(corpus, tmTagPOS)  # tmTagPOS() is not available in current tm versions
inspect(corpus)

# Rows = documents, columns = terms
dtm &lt;- DocumentTermMatrix(corpus)
inspect(dtm)

# need to create similarity matrix here
dist(dtm, method = "manhattan", diag = FALSE, upper = FALSE)
</code></pre>
<p>The output for the given sample should look like this:</p>
<p><img src="https://i.stack.imgur.com/xwIYJ.jpg" alt="Similarity matrix"></p>
<p>The similarity matrix is defined as:</p>
<pre><code>if (i &lt; j) a[i][j] = sim[i][j] else a[i][j] = 0
</code></pre>
<p>That is, only the upper triangle holds similarity values; the diagonal and the lower triangle are zero.</p>
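A minimal base-R sketch of one way to build such a matrix, assuming the similarity between two documents is the number of distinct terms they share. The expected-output image is not reproduced here, so this definition is an assumption, and a Maximum Capturing implementation may use a different measure. The sketch builds a 0/1 document-term matrix directly from the question's `sample`, uses `tcrossprod` to count shared terms for every pair of documents, and then zeroes everything except the upper triangle, following the `i < j` rule in the question.

```r
# Sketch only: ASSUMES sim[i, j] = number of distinct terms shared by documents i and j.
sample <- c(
  "cc ee aa", "dd bb ee", "bb cc ee dd", "cc ee dd aa", "bb ee",
  "cc dd aa", "bb cc aa", "bb cc", "cc ee dd"
)

# Tokenize on whitespace and build a binary document-term matrix
# (rows = documents, columns = terms, 1 if the term occurs in the document).
tokens <- strsplit(sample, "\\s+")
terms <- sort(unique(unlist(tokens)))
dtm <- t(vapply(tokens, function(tk) as.integer(terms %in% tk),
                integer(length(terms))))
dimnames(dtm) <- list(seq_along(sample), terms)

# Shared-term counts for all pairs: sim[i, j] = sum over k of dtm[i, k] * dtm[j, k]
sim <- tcrossprod(dtm)

# Rule from the question: a[i][j] = sim[i][j] if i < j, else 0
a <- sim
a[!upper.tri(a)] <- 0
print(a)
```

With the tm object from the question, `as.matrix(dtm)` gives the same documents-by-terms layout with term frequencies, and `(as.matrix(dtm) > 0) * 1` turns it into the 0/1 form used here. If a normalized score is preferred, cosine similarity, `sim / sqrt(outer(diag(sim), diag(sim)))`, is a common alternative.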