StackOverflow2013

<p>You can get this into a long format, but going further required real data. <strong>EDITED</strong> after data was offered. I am still not sure about the overall structure of what is coming from MALLET, but at least the R functions are demonstrated. This approach has the "feature" that proportions are summed if the same topic appears more than once for a document. Depending on the data layout, that may or may not be an advantage.</p>
<pre><code>dat <- read.table(textConnection("
  V1       V2 V3        V4 V5        V6 V7        V8 V9        V10
1  0   10.txt 27 0.4560785 23 0.3040853 20 0.1315621 21 0.03632624
2  1 1001.txt 20 0.2660085 12 0.2099153  8 0.1699586 13 0.16922928
3  2 1002.txt 16 0.3341721  2 0.1747023 10 0.1360454 12 0.07507119
4  3 1003.txt 12 0.5366148  8 0.2255179 18 0.1388561  0 0.01867091
5  4 1005.txt 16 0.2363206  0 0.2214441 24 0.1914769  7 0.17760521
"), header=TRUE)
ldat <- reshape(dat, idvar=1:2,
                varying=list(topics=c("V3", "V5", "V7", "V9"),
                             props=c("V4", "V6", "V8", "V10")),
                direction="long")
####------------------####
> ldat
             V1       V2 time V3         V4
0.10.txt.1    0   10.txt    1 27 0.45607850
1.1001.txt.1  1 1001.txt    1 20 0.26600850
2.1002.txt.1  2 1002.txt    1 16 0.33417210
3.1003.txt.1  3 1003.txt    1 12 0.53661480
4.1005.txt.1  4 1005.txt    1 16 0.23632060
0.10.txt.2    0   10.txt    2 23 0.30408530
1.1001.txt.2  1 1001.txt    2 12 0.20991530
2.1002.txt.2  2 1002.txt    2  2 0.17470230
3.1003.txt.2  3 1003.txt    2  8 0.22551790
4.1005.txt.2  4 1005.txt    2  0 0.22144410
0.10.txt.3    0   10.txt    3 20 0.13156210
1.1001.txt.3  1 1001.txt    3  8 0.16995860
2.1002.txt.3  2 1002.txt    3 10 0.13604540
3.1003.txt.3  3 1003.txt    3 18 0.13885610
4.1005.txt.3  4 1005.txt    3 24 0.19147690
0.10.txt.4    0   10.txt    4 21 0.03632624
1.1001.txt.4  1 1001.txt    4 13 0.16922928
2.1002.txt.4  2 1002.txt    4 12 0.07507119
3.1003.txt.4  3 1003.txt    4  0 0.01867091
4.1005.txt.4  4 1005.txt    4  7 0.17760521
</code></pre>
<p>Now I can show you how to use xtabs(), since those "proportions" are numeric. Something like this may eventually be what you want. I was surprised that the topics were also integers, but perhaps there is a mapping from topic numbers to topic names?</p>
<pre><code>> xtabs(V4 ~ V3 + V2, data=ldat)
    V2
V3       10.txt   1001.txt   1002.txt   1003.txt   1005.txt
  0  0.00000000 0.00000000 0.00000000 0.01867091 0.22144410
  2  0.00000000 0.00000000 0.17470230 0.00000000 0.00000000
  7  0.00000000 0.00000000 0.00000000 0.00000000 0.17760521
  8  0.00000000 0.16995860 0.00000000 0.22551790 0.00000000
  10 0.00000000 0.00000000 0.13604540 0.00000000 0.00000000
  12 0.00000000 0.20991530 0.07507119 0.53661480 0.00000000
  13 0.00000000 0.16922928 0.00000000 0.00000000 0.00000000
  16 0.00000000 0.00000000 0.33417210 0.00000000 0.23632060
  18 0.00000000 0.00000000 0.00000000 0.13885610 0.00000000
  20 0.13156210 0.26600850 0.00000000 0.00000000 0.00000000
  21 0.03632624 0.00000000 0.00000000 0.00000000 0.00000000
  23 0.30408530 0.00000000 0.00000000 0.00000000 0.00000000
  24 0.00000000 0.00000000 0.00000000 0.00000000 0.19147690
  27 0.45607850 0.00000000 0.00000000 0.00000000 0.00000000
</code></pre>
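<p>To see the summing behaviour on its own, here is a minimal sketch with made-up toy data (the data frame <code>toy</code> and its columns <code>doc</code>, <code>topic</code> and <code>prop</code> are invented for illustration and are not MALLET output). Document <code>a.txt</code> lists topic 5 twice, so xtabs() should add the two proportions into a single cell:</p>

```r
# Toy data, invented for illustration: a.txt lists topic 5 twice.
toy <- data.frame(
  doc   = c("a.txt", "a.txt", "b.txt"),
  topic = c(5, 5, 7),
  prop  = c(0.25, 0.5, 0.125)
)

# xtabs() sums the left-hand side within each topic/doc combination.
tab <- xtabs(prop ~ topic + doc, data = toy)
print(tab)

# The duplicated topic is collapsed into one cell: 0.25 + 0.5
tab["5", "a.txt"]   # 0.75
# Combinations that never occur are filled with 0
tab["7", "a.txt"]   # 0
```

<p>If repeated topics within a document should stay separate rather than be added together, the long data frame is probably the better structure to keep working with.</p>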
