<p>If you have vectors, you can run KMeansDriver. Here is its help output:</p>
<pre><code>Usage:
 [--input &lt;input&gt; --clusters &lt;clusters&gt; --output &lt;output&gt; --distance &lt;distance&gt;
  --convergence &lt;convergence&gt; --max &lt;max&gt; --numReduce &lt;numReduce&gt; --k &lt;k&gt;
  --vectorClass &lt;vectorClass&gt; --overwrite --help]
Options
  --input (-i) input                  The Path for input Vectors. Must be a
                                      SequenceFile of Writable, Vector
  --clusters (-c) clusters            The input centroids, as Vectors. Must be a
                                      SequenceFile of Writable, Cluster/Canopy.
                                      If k is also specified, then a random set
                                      of vectors will be selected and written
                                      out to this path first
  --output (-o) output                The Path to put the output in
  --distance (-m) distance            The Distance Measure to use. Default is
                                      SquaredEuclidean
  --convergence (-d) convergence      The threshold below which the clusters are
                                      considered to be converged. Default is 0.5
  --max (-x) max                      The maximum number of iterations to
                                      perform. Default is 20
  --numReduce (-r) numReduce          The number of reduce tasks
  --k (-k) k                          The k in k-Means. If specified, then a
                                      random selection of k Vectors will be
                                      chosen as the Centroid and written to the
                                      clusters output path.
  --vectorClass (-v) vectorClass      The Vector implementation class name.
                                      Default is SparseVector.class
  --overwrite (-w)                    If set, overwrite the output directory
  --help (-h)                         Print out help
</code></pre>
<p>Update: Copy the result directory from HDFS to the local file system, then use the ClusterDumper utility to list the clusters and the documents in each cluster.</p>
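<p>As a rough illustration of how the options in the help text fit together, the sketch below assembles an argument list for KMeansDriver. The helper name <code>build_kmeans_args</code> and the example paths are hypothetical; only the flag names come from the help output. How the driver is actually launched (for example through a Hadoop job jar) depends on your installation and is left out here.</p>

```python
from typing import List, Optional


def build_kmeans_args(
    input_path: str,
    clusters_path: str,
    output_path: str,
    k: Optional[int] = None,
    convergence: Optional[float] = None,
    max_iterations: Optional[int] = None,
    overwrite: bool = False,
) -> List[str]:
    """Assemble KMeansDriver arguments (hypothetical helper).

    Only flags that appear in the help text are emitted. Options left as
    None are omitted so that the driver falls back to its own defaults.
    """
    args = [
        "--input", input_path,        # SequenceFile of vectors
        "--clusters", clusters_path,  # initial centroids (written here if k is given)
        "--output", output_path,
    ]
    if k is not None:
        args += ["--k", str(k)]
    if convergence is not None:
        args += ["--convergence", str(convergence)]
    if max_iterations is not None:
        args += ["--max", str(max_iterations)]
    if overwrite:
        args.append("--overwrite")
    return args


if __name__ == "__main__":
    # Example paths are placeholders, not paths from the original answer.
    print(" ".join(build_kmeans_args("vectors", "clusters", "output", k=10, overwrite=True)))
```

<p>Passing <code>k</code> matches the behaviour described in the help text: a random selection of k vectors is chosen as the initial centroids and written to the clusters path before clustering starts.</p>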
 
