<p>To get your data into SequenceFile format, there are a couple of strategies you can take. Both involve writing your own code, i.e., neither is strictly command-line.</p>
<p><strong>Strategy 1</strong> Use Mahout's CSVVectorIterator class. You pass it a java.io.Reader and it reads your CSV file, turning each row into a DenseVector. I've never used this, but saw it in the API. It looks straightforward enough if you're OK with DenseVectors.</p>
<p><strong>Strategy 2</strong> Write your own parser. This is really easy: split each line on "," and you have an array you can loop through. Parse the values of each line into a double array, then instantiate a vector using something like this:</p>
<pre><code>new DenseVector(&lt;your array here&gt;);
</code></pre>
<p>and add it to a List (for example).</p>
<p>Then, once you have a List of Vectors, you can write them to a SequenceFile using something like this (I'm using NamedVectors in the code below):</p>
<pre><code>import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.math.NamedVector;
import org.apache.mahout.math.VectorWritable;

FileSystem fs = null;
SequenceFile.Writer writer;
Configuration conf = new Configuration();

List&lt;NamedVector&gt; vectors = &lt;your List of vectors from the step above&gt;;

// Write the data to a SequenceFile
try {
    fs = FileSystem.get(conf);

    Path path = new Path(&lt;your path&gt; + &lt;your filename&gt;);
    writer = new SequenceFile.Writer(fs, conf, path, Text.class, VectorWritable.class);

    // Reuse one VectorWritable; the key is the vector's name, the value is the vector
    VectorWritable vec = new VectorWritable();
    for (NamedVector vector : vectors) {
        vec.set(vector);
        writer.append(new Text(vector.getName()), vec);
    }
    writer.close();

} catch (Exception e) {
    System.out.println("ERROR: " + e);
}
</code></pre>
<p>Now you have a directory of "points" in SequenceFile format that you can use for your K-means clustering. You can point the command-line Mahout commands at this directory as input.</p>
<p>Anyway, that's the general idea. There are probably other approaches as well.</p>
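<p>As a rough illustration of the parsing step in Strategy 2, here is a minimal sketch that uses only the Java standard library. The class name CsvRows and the method parse are hypothetical, and the sketch assumes a purely numeric CSV with no header row and no quoted fields. Each resulting double[] is what you would hand to the DenseVector constructor.</p>

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns numeric CSV text into one double[] per row.
public class CsvRows {

    // Assumes every field is a number; a header row or quoted field
    // would make Double.parseDouble throw NumberFormatException.
    static List<double[]> parse(Reader in) throws IOException {
        List<double[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                String[] fields = line.split(",");
                double[] values = new double[fields.length];
                for (int i = 0; i < fields.length; i++) {
                    values[i] = Double.parseDouble(fields[i].trim());
                }
                // With Mahout on the classpath, this is where you would
                // build a vector from the array, e.g. new DenseVector(values).
                rows.add(values);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        List<double[]> rows = parse(new StringReader("1.0,2.0,3.0\n4.5,5.5,6.5\n"));
        System.out.println(rows.size() + " rows, " + rows.get(0).length + " columns");
    }
}
```

<p>Keeping the parsing separate from the Mahout and Hadoop calls makes it easy to check the CSV handling on its own before writing anything to a SequenceFile.</p>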