StackOverflow2013

Cannot instantiate the type Cluster: k-means clustering example in Mahout
<p>Hi, I was trying to run the k-means clustering example in Mahout and got stuck on an error in the sample code. The error is in the code snippet below:</p>
<p><strong>Cluster cluster = new Cluster(vec, i, new EuclideanDistanceMeasure());</strong></p>
<p>It gives the error</p>
<blockquote> <p>Cannot instantiate the type Cluster</p> </blockquote>
<p>(which, as I understand it, is an interface). I also want to run k-means on my own sample dataset; can anyone guide me on that too?</p>
<p>I have included the following jars in my Eclipse IDE:</p>
<p>mahout-math-0.7-cdh4.3.0.jar</p>
<p>hadoop-common-2.0.0-cdh4.2.1.jar</p>
<p>hadoop-hdfs-2.0.0-cdh4.2.1.jar</p>
<p>hadoop-mapreduce-client-core-2.0.0-cdh4.2.1.jar</p>
<p>mahout-core-0.7-cdh4.3.0.jar</p>
<p>Please check whether I'm missing any essential jar. I will be running this on Hadoop CDH4.2.1.</p>
<p>I am attaching my whole code, taken from <a href="https://github.com/tdunning/MiA/blob/master/src/main/java/mia/clustering/ch07/SimpleKMeansClustering.java" rel="nofollow">GitHub</a>:</p>
<pre><code>package tryout;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;
import org.apache.mahout.clustering.Cluster;
import org.apache.mahout.clustering.classify.WeightedVectorWritable;
import org.apache.mahout.clustering.kmeans.KMeansDriver;
import org.apache.mahout.common.distance.EuclideanDistanceMeasure;

public class SimpleKMeansClustering {

    public static final double[][] points = {
        {1, 1}, {2, 1}, {1, 2}, {2, 2}, {3, 3},
        {8, 8}, {9, 8}, {8, 9}, {9, 9}};

    public static void writePointsToFile(List<Vector> points, String fileName,
            FileSystem fs, Configuration conf) throws IOException {
        Path path = new Path(fileName);
        SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf,
                path, LongWritable.class, VectorWritable.class);
        long recNum = 0;
        VectorWritable vec = new VectorWritable();
        for (Vector point : points) {
            vec.set(point);
            writer.append(new LongWritable(recNum++), vec);
        }
        writer.close();
    }

    public static List<Vector> getPoints(double[][] raw) {
        List<Vector> points = new ArrayList<Vector>();
        for (int i = 0; i < raw.length; i++) {
            double[] fr = raw[i];
            Vector vec = new RandomAccessSparseVector(fr.length);
            vec.assign(fr);
            points.add(vec);
        }
        return points;
    }

    public static void main(String args[]) throws Exception {
        int k = 2;
        List<Vector> vectors = getPoints(points);

        File testData = new File("testdata");
        if (!testData.exists()) {
            testData.mkdir();
        }
        testData = new File("testdata/points");
        if (!testData.exists()) {
            testData.mkdir();
        }

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        writePointsToFile(vectors, "testdata/points/file1", fs, conf);

        Path path = new Path("testdata/clusters/part-00000");
        SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf,
                path, Text.class, Cluster.class);

        for (int i = 0; i < k; i++) {
            Vector vec = vectors.get(i);
            // The line below is where the error is reported:
            // "Cannot instantiate the type Cluster"
            Cluster cluster = new Cluster(vec, i, new EuclideanDistanceMeasure());
            writer.append(new Text(cluster.getIdentifier()), cluster);
        }
        writer.close();

        KMeansDriver.run(conf, new Path("testdata/points"), new Path("testdata/clusters"),
                new Path("output"), new EuclideanDistanceMeasure(), 0.001, 10,
                true, false);

        SequenceFile.Reader reader = new SequenceFile.Reader(fs,
                new Path("output/" + Cluster.CLUSTERED_POINTS_DIR
                        + "/part-m-00000"), conf);

        IntWritable key = new IntWritable();
        WeightedVectorWritable value = new WeightedVectorWritable();
        while (reader.next(key, value)) {
            System.out.println(value.toString() + " belongs to cluster "
                    + key.toString());
        }
        reader.close();
    }
}
</code></pre>
<p>Also, if I have my own dataset, how should I approach that?</p>
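<p>To illustrate what I think the compiler is complaining about, here is a minimal, Mahout-free sketch. The <code>Cluster</code> interface and the <code>SimpleCluster</code> class below are hypothetical stand-ins that I made up for illustration, not Mahout's real types. The point is only that an interface cannot be created with <code>new</code>, so some concrete implementing class has to be instantiated instead. Mahout 0.7 appears to ship a concrete k-means class named <code>Kluster</code> in <code>org.apache.mahout.clustering.kmeans</code>, but that is an assumption to check against the jar, not something I have verified.</p>

```java
import java.util.Arrays;

public class InterfaceInstantiationSketch {

    // Hypothetical stand-in for an interface such as Mahout's Cluster.
    interface Cluster {
        int getId();
        double[] getCenter();
    }

    // Hypothetical concrete implementation; only a class like this can be created with new.
    static final class SimpleCluster implements Cluster {
        private final int id;
        private final double[] center;

        SimpleCluster(double[] center, int id) {
            this.center = center.clone(); // defensive copy of the seed point
            this.id = id;
        }

        @Override
        public int getId() {
            return id;
        }

        @Override
        public double[] getCenter() {
            return center.clone();
        }
    }

    // Builds an initial cluster from a seed point.
    static Cluster seed(double[] point, int id) {
        // Cluster c = new Cluster(point, id); // does not compile: Cluster is an interface
        return new SimpleCluster(point, id);   // a concrete class can be instantiated
    }

    public static void main(String[] args) {
        Cluster c = seed(new double[] {1, 1}, 0);
        System.out.println(c.getId() + " " + Arrays.toString(c.getCenter())); // prints "0 [1.0, 1.0]"
    }
}
```

<p>The variable can still be declared with the interface type; only the right-hand side of the assignment needs a concrete class.</p>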
