    <p>There are two practical approaches in common use for choosing the number of centroids (k) intelligently.</p> <p>The first is to run <strong>PCA</strong> on your data. The output from PCA, which is the principal components (eigenvectors) and their cumulative contribution to the variation observed in the data, can suggest a reasonable number of centroids. (E.g., if 95% of the variability in your data is explained by the first three principal components, then k=3 is a sensible starting choice for k-means.)</p> <p>The second commonly used practical aid is a revised implementation of the k-means algorithm called <strong>k-means++</strong>. In essence, k-means++ differs from the original k-means only by the addition of a pre-processing (seeding) step. During this step, the initial positions of the centroids are estimated. Note that k-means++ still takes k as an input; what its careful seeding buys you is more consistent clustering for each candidate k, which makes candidates easier to compare.</p> <p>The algorithm that k-means++ relies on to do this is straightforward to understand and to implement in code. A good source for both is a 2009 <a href="http://lingpipe-blog.com/2009/03/23/arthur-vassilvitskii-2007-kmeans-the-advantages-of-careful-seeding/" rel="nofollow">post</a> in the <em>LingPipe Blog</em>, which offers an excellent explanation of k-means++ and includes a citation to the original 2007 paper that introduced the technique.</p> <p>Aside from its more reliable seeding, k-means++ is reportedly superior to the original k-means in both performance (roughly half the processing time in one published comparison) and accuracy (three orders of magnitude improvement in error in the same comparison study).</p>
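    <p>Both ideas can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names <code>components_for_variance</code> and <code>kmeans_pp_seeds</code> are hypothetical, the PCA eigenvalues are assumed to have been computed elsewhere, and the seeding function assumes k is supplied by the caller. The seeding follows the distance-squared sampling idea from the k-means++ paper.</p>

```python
import random
from itertools import accumulate


def components_for_variance(eigenvalues, threshold=0.95):
    """Return the smallest number of leading principal components whose
    cumulative share of the total variance reaches `threshold`.

    Assumption: `eigenvalues` are the PCA eigenvalues (variances along
    each component), computed by some other tool.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    for count, running in enumerate(accumulate(ordered), start=1):
        if running / total >= threshold:
            return count
    return len(ordered)


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng=None):
    """Pick k initial centroids from `points` using k-means++ seeding.

    The first centroid is chosen uniformly at random. Each later one is
    sampled with probability proportional to its squared distance from
    the nearest centroid chosen so far.
    """
    rng = rng or random.Random()
    centroids = [rng.choice(points)]
    # nearest[i]: squared distance from points[i] to its closest centroid
    nearest = [squared_distance(p, centroids[0]) for p in points]
    while len(centroids) < k:
        if not any(nearest):
            raise ValueError("fewer distinct points than k")
        chosen = rng.choices(points, weights=nearest, k=1)[0]
        centroids.append(chosen)
        nearest = [min(d, squared_distance(p, chosen))
                   for d, p in zip(nearest, points)]
    return centroids


if __name__ == "__main__":
    # Hypothetical eigenvalues: the first three explain 96% of the variance.
    k = components_for_variance([50, 30, 16, 3, 1])
    data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9),
            (10.0, 0.0), (9.9, 0.3)]
    print(k, kmeans_pp_seeds(data, k, rng=random.Random(0)))
```

    <p>The seeds returned here would then be handed to an ordinary k-means loop as its starting centroids. Because already-chosen points have zero weight, the same point is never selected twice.</p>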
 
