naivebayes Mahout 0.7
<p>I am working on sentiment analysis of tweets using Mahout's naive Bayes classifier. I created a directory "data" with three subdirectories named "positive", "negative" and "uncertain", and put 151 files (151 MB in total) in each of them. I then copied the data directory to HDFS. Below are the commands I ran to generate the model and the label index from it:</p>
<pre><code># Convert the text files into SequenceFiles (one record per file)
bin/mahout seqdirectory -i ${WORK_DIR}/data -o ${WORK_DIR}/data-seq
# Build log-normalized, named TF-IDF vectors
bin/mahout seq2sparse -i ${WORK_DIR}/data-seq -o ${WORK_DIR}/data-vectors -lnorm -nv -wt tfidf
# Randomly hold out 40% of the vectors as test data
bin/mahout split -i ${WORK_DIR}/data-vectors/tfidf-vectors --trainingOutput ${WORK_DIR}/data-train-vectors --testOutput ${WORK_DIR}/data-test-vectors --randomSelectionPct 40 --overwrite --sequenceFiles -xm sequential
# Train the naive Bayes model and write the label index
bin/mahout trainnb -i ${WORK_DIR}/data-train-vectors -el -o ${WORK_DIR}/model -li ${WORK_DIR}/labelindex -ow $c
</code></pre>
<p>Testing on the same data (the training vectors) with the <code>testnb</code> command below, I get this confusion matrix:</p>
<pre><code>bin/mahout testnb -i ${WORK_DIR}/data-train-vectors -m ${WORK_DIR}/model -l ${WORK_DIR}/labelindex -ow -o ${WORK_DIR}/data-testing $c

Confusion Matrix
-------------------------------------------------------
a       b       c       &lt;--Classified as
151     0       0        |  151   a = negative
0       151     0        |  151   b = positive
0       0       151      |  151   c = uncertain
</code></pre>
<p>Then I created another directory, "data2", in the same way and filled its positive, negative and uncertain subdirectories with a random subset of the training data: 30 files (30 MB in total) each. I vectorized it with these commands:</p>
<pre><code>bin/mahout seqdirectory -i ${WORK_DIR}/data2 -o ${WORK_DIR}/data2-seq
bin/mahout seq2sparse -i ${WORK_DIR}/data2-seq -o ${WORK_DIR}/data2-vectors -lnorm -nv -wt tfidf
</code></pre>
<p>When I run <code>testnb</code> on these vectors with the model and label index created from the first data set, using the command below:</p>
<pre><code>bin/mahout testnb -i ${WORK_DIR}/data2-vectors/tfidf-vectors/part-r-00000 -m ${WORK_DIR}/model -l ${WORK_DIR}/labelindex -ow -o ${WORK_DIR}/data2-testing $c
</code></pre>
<p>I get this confusion matrix, in which every document is classified as positive:</p>
<pre><code>Confusion Matrix
-------------------------------------------------------
a       b       c       &lt;--Classified as
0       30      0        |  30    a = negative
0       30      0        |  30    b = positive
0       30      0        |  30    c = uncertain
</code></pre>
<p>Can anyone tell me why this happens? Am I testing the model correctly, or is this a bug in Mahout 0.7? If this is not the correct way, please suggest an alternative.</p>
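To compare the two runs numerically, the short Python sketch below computes overall accuracy and per-class recall from the counts in the question's two confusion matrices. It assumes rows are the true labels and columns the predicted labels, as the "Classified as" header suggests; the `summarize` helper is hypothetical and not part of Mahout.

```python
# Counts from the question's confusion matrices. Rows are true labels and
# columns are predicted labels, in the order a = negative, b = positive,
# c = uncertain.
LABELS = ["negative", "positive", "uncertain"]

train_run = [[151, 0, 0], [0, 151, 0], [0, 0, 151]]  # testnb on data-train-vectors
data2_run = [[0, 30, 0], [0, 30, 0], [0, 30, 0]]     # testnb on data2-vectors


def summarize(matrix):
    """Return (overall accuracy, per-class recall) for a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    # Recall for class i: correctly classified rows of class i / all rows of class i.
    recall = {LABELS[i]: matrix[i][i] / sum(matrix[i]) for i in range(len(matrix))}
    return correct / total, recall


for name, matrix in [("data-train-vectors", train_run), ("data2-vectors", data2_run)]:
    accuracy, recall = summarize(matrix)
    print(f"{name}: accuracy={accuracy:.3f}, recall={recall}")
```

For the second matrix this gives an accuracy of 0.333 (30 of 90), exactly what always predicting a single class would score on three equally sized classes.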
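One factor worth ruling out, stated as an assumption rather than a confirmed diagnosis: each seq2sparse run builds its own term dictionary (the dictionary.file-0 file in its output directory) and its own document frequencies, so vectors from a separate run on data2 need not use the same index for a given word as the vectors the model was trained on. The toy Python sketch below illustrates only that index mismatch. `build_dictionary` and `to_sparse` are hypothetical helpers that number terms in sorted order; they are not Mahout's implementation, and the corpora are invented.

```python
from collections import Counter


def build_dictionary(corpus):
    """Hypothetical stand-in for a vectorizer's dictionary step: give every
    distinct term in the corpus an integer index, in sorted term order."""
    terms = sorted({term for doc in corpus for term in doc.split()})
    return {term: index for index, term in enumerate(terms)}


def to_sparse(doc, dictionary):
    """Term-frequency vector as {index: count}; unknown terms are dropped."""
    counts = Counter(term for term in doc.split() if term in dictionary)
    return {dictionary[term]: count for term, count in counts.items()}


# Invented toy corpora: a "training" corpus and a subset of it.
train_corpus = ["awful service", "great phone", "not sure yet"]
subset_corpus = ["great phone", "not sure yet"]

train_dict = build_dictionary(train_corpus)    # dictionary the model was trained with
subset_dict = build_dictionary(subset_corpus)  # dictionary from a separate run on the subset

doc = "great phone"
print(to_sparse(doc, train_dict))   # {1: 1, 3: 1}
print(to_sparse(doc, subset_dict))  # {0: 1, 2: 1}

# Reading the subset-run indices with the training dictionary
# turns "great phone" into different words entirely.
index_to_term = {index: term for term, index in train_dict.items()}
print([index_to_term[i] for i in to_sparse(doc, subset_dict)])  # ['awful', 'not']
```

Vectors that come out of the same seq2sparse run, such as the data-test-vectors produced by the split step, share the training dictionary and so avoid this particular mismatch.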