StackOverflow2013

plurals

POUsing a smoother with the L Method to determine the number of K-Means clusters
primarykey
Id
4033821
data
AcceptedAnswerId
0
AnswerCount
1
ClosedDate
CommentCount
15
CommunityOwnedDate
CreationDate
2010-10-27T13:35:54.657
FavoriteCount
10
LastActivityDate
2014-06-02T20:22:50.760
LastEditDate
2012-02-07T18:34:51.527
LastEditorUserId
635608
OwnerUserId
481927
ParentId
0
PostTypeId
1
Score
16
ViewCount
1857
LastEditorDisplayName
text
Body
Has anyone tried applying a smoother to the evaluation metric before applying the L-method to determine the number of k-means clusters in a dataset? If so, did it improve the results? Or did it allow fewer k-means trials and hence a significant speed-up? Which smoothing algorithm/method did you use?
The "L-Method" is detailed in: <a href="http://cs.fit.edu/~pkc/papers/ictai04salvador.pdf" rel="noreferrer">Determining the Number of Clusters/Segments in Hierarchical Clustering/Segmentation Algorithms, Salvador & Chan</a>
The method calculates the evaluation metric for a range of different trial cluster counts. Then, to find the knee (which occurs at the optimum number of clusters), two lines are fitted using linear regression. A simple iterative process is applied to improve the knee fit; this reuses the existing evaluation metric calculations and does not require any re-runs of the k-means.
For the evaluation metric, I am using the reciprocal of a simplified version of the Dunn index. It is simplified for speed (basically, my diameter and inter-cluster distance calculations are simplified). The reciprocal is taken so that the index works in the correct direction (i.e. lower is generally better).
K-means is a stochastic algorithm, so typically it is run multiple times and the best fit is chosen. This works pretty well, but when you are doing this for 1..N clusters the time quickly adds up, so it is in my interest to keep the number of runs in check. Overall processing time may determine whether my implementation is practical or not; I may ditch this functionality if I cannot speed it up.
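For readers unfamiliar with the procedure, here is a minimal Python sketch of the two-line fit described above, with an optional smoothing pass beforehand. It assumes NumPy. The centred moving average is only an illustrative choice of smoother (which smoother works best is the open question here), the names moving_average and l_method_knee are hypothetical, and weighting each segment's error by its share of the points is an assumption about the paper's weighting. The paper's iterative refinement step is omitted.

```python
import numpy as np


def moving_average(y, window=3):
    """Centred moving average with edge padding (illustrative smoother only)."""
    y = np.asarray(y, dtype=float)
    if window < 2:
        return y.copy()
    pad = window // 2
    padded = np.pad(y, pad, mode="edge")  # repeat end values so output keeps its length
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")[: len(y)]


def _fit_rmse(x, y):
    """RMSE of a least-squares straight line fitted to (x, y)."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return float(np.sqrt(np.mean(residuals ** 2)))


def l_method_knee(x, y):
    """Return the x value at the knee of the evaluation graph.

    Every split into a left and a right segment (each with at least two
    points) is tried; the split with the lowest weighted RMSE of the two
    fitted lines wins, and the last x of the left segment is returned.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 points to fit two lines")
    best_c, best_err = None, np.inf
    for c in range(2, n - 1):  # left = first c points, right = remaining n - c
        err = (c / n) * _fit_rmse(x[:c], y[:c]) \
            + ((n - c) / n) * _fit_rmse(x[c:], y[c:])
        if err < best_err:
            best_c, best_err = c, err
    return x[best_c - 1]
```

Calling l_method_knee(k_values, metric) gives the plain variant and l_method_knee(k_values, moving_average(metric)) the smoothed one; comparing the two on the same k-means runs is one way to test whether smoothing lets you get away with fewer trials.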
Tags
<algorithm><cluster-analysis><k-means><linear-regression>
Title
Using a smoother with the L Method to determine the number of K-Means clusters
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USMat
UserOwnerUserId
1. USwinwaed
plurals
PostLinksPostIdRelatedPostId
1. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
2. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
PostLinksRelatedPostIdPostId
1. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POUsing a smoother with the L Method to determine the number of K-Means clusters
 UserUserId
 USPascal Qyy
 VoteTypeVoteTypeId
 VTFavorite
2. VO
 singulars
 PostPostId
 POUsing a smoother with the L Method to determine the number of K-Means clusters
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 POUsing a smoother with the L Method to determine the number of K-Means clusters
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
