
<p>A simple cluster measure:<br>
1) draw "sunburst" rays from each point to its nearest cluster centre,<br>
2) look at the lengths &mdash; distance( point, centre, metric=... ) &mdash; of all the rays.</p>

<p>For <code>metric="sqeuclidean"</code> and 1 cluster, the average squared length is the total variance, <code>X.var(axis=0).sum()</code>; for 2 clusters it is less, and so on down to N clusters, where all lengths are 0. "Percent of variance explained" is 100 % minus this average expressed as a percentage of the total variance.</p>

<p>Code for this, under <a href="https://stackoverflow.com/questions/5529625/is-it-possible-to-specify-your-own-distance-function-using-scikits-learn-k-means">is-it-possible-to-specify-your-own-distance-function-using-scikits-learn-k-means</a>:</p>

<pre><code>from scipy.spatial.distance import cdist

def distancestocentres( X, centres, metric="euclidean", **kwargs ):
    """ all distances X -&gt; nearest centre, any metric
        euclidean2 (~ withinss) is more sensitive to outliers,
        cityblock (manhattan, L1) less sensitive
    """
    D = cdist( X, centres, metric=metric, **kwargs )  # |X| x |centres|
    return D.min(axis=1)  # all the distances
</code></pre>

<p>Like any long list of numbers, these distances can be looked at in various ways: np.mean(), np.histogram() ... Plotting and visualization are not easy.<br>
See also <a href="https://stats.stackexchange.com/questions/tagged/clustering">stats.stackexchange.com/questions/tagged/clustering</a>, in particular<br>
<a href="https://stats.stackexchange.com/questions/11691/how-to-tell-if-data-is-clustered-enough-for-clustering-algorithms-to-produce-me">How to tell if data is “clustered” enough for clustering algorithms to produce meaningful results?</a></p>
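<p>The measure above can be exercised end to end. A minimal sketch, assuming a two-blob toy dataset and using the true group means as the 2-cluster centres (in practice they would come from e.g. KMeans); <code>distancestocentres</code> is the function from the answer:</p>

```python
import numpy as np
from scipy.spatial.distance import cdist

def distancestocentres(X, centres, metric="euclidean", **kwargs):
    """All distances X -> nearest centre, for any cdist metric."""
    D = cdist(X, centres, metric=metric, **kwargs)  # |X| x |centres|
    return D.min(axis=1)

# Toy data (an assumption, for illustration only): two Gaussian blobs in 2d.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)),
               rng.normal(5, 1, (100, 2))])

# 1 cluster: the single centre is the overall mean, and the average
# squared ray length equals the total (summed per-feature) variance.
one = X.mean(axis=0, keepdims=True)
avg1 = distancestocentres(X, one, metric="sqeuclidean").mean()
# avg1 matches X.var(axis=0).sum() up to floating point

# 2 clusters: here the true group means stand in for fitted centres.
centres2 = np.vstack([X[:100].mean(axis=0), X[100:].mean(axis=0)])
avg2 = distancestocentres(X, centres2, metric="sqeuclidean").mean()

explained = 100.0 * (1.0 - avg2 / avg1)  # percent of variance explained
print(f"{explained:.1f} % of variance explained by 2 clusters")
```

<p>Swapping in <code>metric="cityblock"</code> gives the outlier-robust variant of the same summary that the answer mentions.</p>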
 
