  1. POR text mining - Combining paragraphs one after the other without sentences mixing up
<p>I am a beginner in R and text mining, currently using the tm package.</p>
<p>I am trying to join the texts of two different documents in a corpus. When I use a statement like</p>
<pre><code>c(corpus.doc[[1]], corpus.doc[[2]])
</code></pre>
<p>or the paste statement</p>
<pre><code>paste(corpus.doc[[1]], corpus.doc[[2]])
</code></pre>
<p>the texts are combined line by line rather than one document after the other.</p>
<p>For example, if</p>
<pre><code>&gt; corpus.doc[[1]]
He visits very often and sometimes more
&gt; corpus.doc[[2]]
She also stays
</code></pre>
<p>what I get with these statements is something like</p>
<pre><code>He visits very often She also and stays sometimes more
</code></pre>
<p>How can I prevent that and instead get</p>
<pre><code>He visits very often and sometimes more She also stays
</code></pre>
<p>Or is there an easy way to combine documents in the R tm package? Thank you in advance!</p>
<hr>
<p>Additional info</p>
<hr>
<p>When I use<br> <code>a &lt;- c(corpus.doc[[1]], corpus.doc[[2]], recursive=TRUE)</code></p>
<p>a becomes a corpus with two documents, so the texts of these documents are still not combined. I would like</p>
<pre><code>a[[1]]
</code></pre>
<p>to give me the combined text of corpus.doc[[1]] and corpus.doc[[2]].</p>
<pre><code>str(corpus.doc)
</code></pre>
<p>shows something like this:</p>
<pre><code>List of 4270
 $ CREC-2011-01-05-pt1-PgE1-2.htm :Classes 'PlainTextDocument', 'TextDocument', 'character' atomic [1:74] html head titlecongression record volume issue head ...
  .. ..- attr(*, "Author")= chr(0)
  .. ..- attr(*, "DateTimeStamp")= POSIXlt[1:1], format: "2009-01-17 15:45:25"
  .. ..- attr(*, "Description")= chr(0)
  .. ..- attr(*, "Heading")= chr(0)
  .. ..- attr(*, "ID")= chr "CREC-2011-01-05-pt1-PgE1-2.htm"
</code></pre>
<p>And it keeps going on...</p>
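The interleaving described above is consistent with how base R's paste() treats vectors: it pairs elements position by position and recycles the shorter one. The sketch below illustrates this with plain character vectors. The names doc1 and doc2 are hypothetical stand-ins for the text of corpus.doc[[1]] and corpus.doc[[2]], and the split into lines is assumed for illustration. With actual tm documents, the text would probably need to be extracted first (for example with as.character()), which is also an assumption and not something the post confirms.

```r
# Hypothetical stand-ins for the text of corpus.doc[[1]] and corpus.doc[[2]],
# held as plain character vectors (one element per line of text).
doc1 <- c("He visits very often", "and sometimes more")
doc2 <- c("She also stays")

# paste() is vectorised: it pairs elements position by position and recycles
# the shorter vector, so text from the two documents ends up mixed per line.
interleaved <- paste(doc1, doc2)

# Concatenating the vectors first keeps every line of doc1 ahead of doc2;
# collapse = " " then joins all lines into a single string.
combined <- paste(c(doc1, doc2), collapse = " ")
print(combined)
```

Using collapse on the concatenated vector yields one string, which could then be placed into a single document. Whether it can be assigned back into the corpus directly depends on the tm version in use.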
 
