<p>You need to count the words in each document and build a feature vector, generally called a bag of words. Before that, remove stop words (very common words that carry little information, such as "the" and "a"). You can then take the top n most common words across your documents, count how often each of them occurs in every document, and store the counts in an n-dimensional vector.</p>
<p>For the distance measure you can use cosine similarity (or 1 minus it, if you need a distance).</p>
<p>Here is a simple 2-means algorithm for 1-dimensional data points. You can easily extend it to k-means and n-dimensional data points. Let me know if you want an n-dimensional implementation.</p>
<pre><code>import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class TwoMeans {
    public static void main(String[] args) {
        double[] x = {1, 2, 2.5, 3, 3.5, 4, 4.5, 5, 7, 8, 8.5, 9, 9.5, 10};

        double[] center = new double[2];
        double[] precenter = new double[2];
        List&lt;List&lt;Double&gt;&gt; cluster = new ArrayList&lt;&gt;();
        cluster.add(new ArrayList&lt;&gt;());
        cluster.add(new ArrayList&lt;&gt;());

        // generate 2 distinct random indices in [0, x.length), i.e. without replacement
        Random random = new Random();
        int[] idx = new int[2];
        idx[0] = random.nextInt(x.length);
        do {
            idx[1] = random.nextInt(x.length);
        } while (idx[0] == idx[1]);
        center[0] = x[idx[0]];
        center[1] = x[idx[1]];
        // there is a better way to generate k random numbers (w/o replacement), just search

        do {
            cluster.get(0).clear();
            cluster.get(1).clear();
            // assign every point to its nearest center
            for (int i = 0; i &lt; x.length; ++i) {
                if (Math.abs(x[i] - center[0]) &lt;= Math.abs(x[i] - center[1])) {
                    cluster.get(0).add(x[i]);
                } else {
                    cluster.get(1).add(x[i]);
                }
            }
            // recompute the centers only after all points have been assigned
            precenter[0] = center[0];
            precenter[1] = center[1];
            center[0] = mean(cluster.get(0));
            center[1] = mean(cluster.get(1));
        } while (precenter[0] != center[0] || precenter[1] != center[1]); // stop once neither center moves
    }

    static double mean(List&lt;Double&gt; list) {
        double sum = 0;
        for (int index = 0; index &lt; list.size(); index++) {
            sum += list.get(index);
        }
        return sum / list.size();
    }
}
</code></pre>
<p>When the loop finishes, cluster.get(0) and cluster.get(1) contain the points in the two clusters, and center[0] and center[1] are the 2 means. You may still need to do some debugging, because I wrote the code in R and just converted it to Java for you :)</p>
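<p>The bag-of-words and cosine steps described above could look roughly like this in Java. This is only a minimal sketch under stated assumptions: the class name BagOfWordsSketch, its method names, the tiny stop-word list, and the simple split-on-non-letters tokenizer are all made up for illustration, so swap in whatever fits your documents.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class; all names here are illustrative assumptions.
final class BagOfWordsSketch {
    // Assumed, deliberately tiny stop-word list; a real one would be much longer.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "is", "of", "and"));

    // Lower-case the text, split on anything that is not a-z, and drop stop words.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Pick the n most frequent words over all documents as the vocabulary.
    static List<String> topWords(List<String> docs, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            for (String t : tokenize(doc)) {
                counts.merge(t, 1, Integer::sum);
            }
        }
        List<String> words = new ArrayList<>(counts.keySet());
        // Descending by count; ties broken alphabetically so the result is deterministic.
        words.sort((a, b) -> {
            int c = counts.get(b) - counts.get(a);
            return c != 0 ? c : a.compareTo(b);
        });
        return new ArrayList<>(words.subList(0, Math.min(n, words.size())));
    }

    // Count how often each vocabulary word occurs in one document.
    static double[] vectorize(String doc, List<String> vocab) {
        double[] v = new double[vocab.size()];
        for (String t : tokenize(doc)) {
            int i = vocab.indexOf(t);
            if (i >= 0) {
                v[i]++;
            }
        }
        return v;
    }

    // Cosine similarity of two equal-length vectors; returns 0 if either is all zeros.
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

<p>To cluster documents you would build the vocabulary once with topWords, turn every document into a vector with vectorize, and then use 1 - cosineSimilarity(a, b) in place of the absolute difference in the 2-means loop.</p>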