
Mahout: CSV to vector and running the program
<p>I'm analysing the k-means algorithm with Mahout. I'm going to run some tests, observe performance, and compute some statistics on the results I get.</p>
<p>I can't figure out how to run my own program within Mahout. However, the command-line interface might be enough.</p>
<p>To run the sample program I do</p>
<pre><code>$ mahout seqdirectory --input uscensus --output uscensus-seq
$ mahout seq2sparse -i uscensus-seq -o uscensus-vec
$ mahout kmeans -i uscensus-vec/tfidf-vectors -o uscensus-kmeans-clusters -c uscensus-kmeans-centroids -dm org.apache.mahout.common.distance.CosineDistanceMeasure -x 5 -ow -cl -k 25
</code></pre>
<p>The dataset is one large CSV file. Each line is a record. Features are comma-separated. The first field is an ID. Because of the input format I cannot use seqdirectory right away. I'm trying to implement the answer to this similar question, <a href="https://stackoverflow.com/questions/8785392/how-to-perform-k-means-clustering-in-mahout-with-vector-data-stored-as-csv">How to perform k-means clustering in mahout with vector data stored as CSV?</a>, but I still have two questions:</p>
<ol>
<li>How do I convert from CSV to SeqFile? I guess I can write my own program using Mahout to make this conversion and then use its output as input for seq2sparse. I guess I can use CSVIterator (<a href="https://cwiki.apache.org/confluence/display/MAHOUT/File+Format+Integrations" rel="nofollow noreferrer">https://cwiki.apache.org/confluence/display/MAHOUT/File+Format+Integrations</a>). What class should I use to read and write?</li>
<li>How do I build and run my new program? I couldn't figure it out from the book Mahout in Action or from other questions here.</li>
</ol>
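<p>To make the first question concrete, here is a minimal sketch of the parsing half of such a converter, using only the Java standard library. It assumes every field after the ID is numeric and that the file has no header row and no quoted fields; the class and method names (CsvToVectors, CsvRecord, parseLine, parseAll) are hypothetical. In a real converter, each parsed record would presumably be wrapped in a Mahout vector and appended to a Hadoop SequenceFile, as noted in the comments, but that part is left out so the sketch runs without the Mahout and Hadoop jars.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CsvToVectors {

    /** One CSV record: the ID from the first field plus the numeric features. */
    public static final class CsvRecord {
        public final String id;
        public final double[] features;

        CsvRecord(String id, double[] features) {
            this.id = id;
            this.features = features;
        }
    }

    /** Parses "id,f1,f2,..." into a CsvRecord. Assumes all features are numeric. */
    public static CsvRecord parseLine(String line) {
        String[] fields = line.trim().split(",");
        if (fields.length < 2) {
            throw new IllegalArgumentException("Expected an ID and at least one feature: " + line);
        }
        double[] features = new double[fields.length - 1];
        for (int i = 1; i < fields.length; i++) {
            features[i - 1] = Double.parseDouble(fields[i].trim());
        }
        return new CsvRecord(fields[0].trim(), features);
    }

    /** Parses every non-blank line. */
    public static List<CsvRecord> parseAll(Iterable<String> lines) {
        List<CsvRecord> records = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                records.add(parseLine(line));
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        List<String> sample = Arrays.asList("42,1.0,2.5,3", "", "43,0,0.5,7");
        for (CsvRecord r : parseAll(sample)) {
            // In a Mahout-based converter, this is where the record would be
            // turned into a vector (for example a NamedVector around a
            // DenseVector, wrapped in a VectorWritable) and appended to a
            // SequenceFile.Writer keyed by the ID.
            System.out.println(r.id + " -> " + Arrays.toString(r.features));
        }
    }
}
```

<p>If the features are already numeric, vectors written this way could probably be passed to kmeans directly; seq2sparse is aimed at turning text documents into term vectors, so it may not be needed for this dataset.</p>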