Hierarchical clustering heuristics
<p>I want to explore relations between data items in a large array. Each data item is represented by a multidimensional vector. First of all, I decided to use clustering. I'm interested in finding hierarchical relations between clusters (groups of data vectors). I can calculate the distance between my vectors, so as a first step I find the <em>minimum spanning tree</em>. After that I need to group the data vectors according to the links in my spanning tree. But at this step I'm stuck: <strong>how do I combine different vectors into hierarchical clusters?</strong> I'm using these heuristics: <strong><em>if two vectors are linked and the distance between them is very small, they are in the same cluster</em></strong>; <strong><em>if two vectors are linked but the distance between them is larger than a threshold, they are in different clusters with a common root cluster</em></strong>.</p>
<p>But maybe there is a better solution?</p>
<p>Thanks</p>
<p><strong>P.S.</strong> Thanks to all!</p>
<p>In fact, I tried k-means and a variation of CLOPE, but didn't get good results.</p>
<p><strong>So now I know that the clusters in my dataset have a complex structure (much more complex than n-spheres).</strong></p>
<p>That's why I want to use hierarchical clustering. I also <strong>guess that the clusters look like n-dimensional chains</strong> (like a 3D or 2D chain), so I use the <strong>single-link</strong> strategy. But I'm stuck on how to combine different clusters with each other (<strong><em>in which situations should I create a common root cluster, and in which should I merge all sub-clusters into one cluster?</em></strong>). I'm using this simple strategy:</p>
<blockquote> <ul> <li>If clusters (or vectors) are close to each other (closer than a threshold), I merge their contents into one cluster</li> <li>If clusters (or vectors) are far from each other, I create a root cluster and put them into it</li> </ul> </blockquote>
<p>But with this strategy I get <em>very large cluster trees</em>. I'm trying to find a satisfactory threshold. But maybe there is a better strategy for generating the cluster tree?</p>
<p>Here is a simple picture that illustrates my question:</p>
<p><img src="https://i.stack.imgur.com/ytsE5.png" alt="enter image description here"></p>
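<p>To make the strategy concrete, here is a minimal sketch of one level of it in plain Python. It is only an illustration under stated assumptions, not my actual code: the names <code>minimum_spanning_tree</code> and <code>threshold_clusters</code>, the Euclidean distance, and the sample points are all hypothetical. It builds the spanning tree with Kruskal's algorithm, merges vectors joined by short edges into one cluster, and reports the remaining long edges as the links that would hang separate clusters under a common root.</p>

```python
import math
from itertools import combinations


class UnionFind:
    """Disjoint-set structure used both for Kruskal's algorithm and for merging."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def minimum_spanning_tree(points, dist=math.dist):
    """Return MST edges as (distance, i, j) tuples, in ascending distance order."""
    pairs = combinations(range(len(points)), 2)
    edges = sorted((dist(points[i], points[j]), i, j) for i, j in pairs)
    uf = UnionFind(len(points))
    # Kruskal: keep an edge only if it connects two different components.
    return [(d, i, j) for d, i, j in edges if uf.union(i, j)]


def threshold_clusters(points, threshold, dist=math.dist):
    """One level of the heuristic (threshold is an assumed tuning parameter).

    MST edges with distance <= threshold merge their endpoints into the same
    cluster; longer MST edges are returned as "bridges" between clusters that
    would share a common root cluster.
    """
    mst = minimum_spanning_tree(points, dist)
    uf = UnionFind(len(points))
    for d, i, j in mst:
        if d <= threshold:
            uf.union(i, j)
    groups = {}
    for idx in range(len(points)):
        groups.setdefault(uf.find(idx), []).append(idx)
    clusters = sorted(groups.values())
    bridges = [(d, i, j) for d, i, j in mst if d > threshold]
    return clusters, bridges


if __name__ == "__main__":
    # Hypothetical sample data: two tight groups and one isolated vector.
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (30, 30)]
    clusters, bridges = threshold_clusters(sample, threshold=2.0)
    print(clusters)
    print(bridges)
```

<p>Processing the spanning-tree edges in ascending order of distance is equivalent to single-link agglomerative clustering, so repeating the merge step with a sequence of increasing thresholds would give coarser and coarser levels of the same tree. The size of the tree then depends on how many thresholds are used rather than on a single cut-off.</p>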