Note that there are some explanatory texts on larger screens.

plurals
  1. POHow can I cluster a graph in Python?
    primarykey
    data
    text
    <p>Let G be a graph. So G is a set of nodes and set of links. I need to find a fast way to partition the graph. The graph I am now working has only 120*160 nodes, but I might soon be working on an equivalent problem, in another context (not medicine, but website development), with millions of nodes. </p> <p>So, what I did was to store all the links into a graph matrix:</p> <pre><code>M=numpy.mat(numpy.zeros((len(data.keys()),len(data.keys())))) </code></pre> <p>Now M holds a 1 in position s,t, if node s is connected to node t. I make sure M is symmetrical M[s,t]=M[t,s] and each node links to itself M[s,s]=1.</p> <p>If I remember well if I multiply M with M, the results is a matrix that represents the graph that connects vertexes that are reached on through two steps.</p> <p>So I keep on multplying M with itself, until the number of zeros in the matrix do not decrease any longer. Now I have the list of the connected components. And now I need to cluster this matrix.</p> <p>Up to now I am pretty satisfied with the algorithm. I think it is easy, elegant, and reasonably fast. I am having trouble with this part.</p> <p>Essentially I need to split this graph into its connected components.</p> <p>I can go through all the nodes, and see what are they connected to.</p> <p>But what about sorting the matrix reordering the lines. But I don't know if it is possible to do it.</p> <p>What follows is the code so far:</p> <pre><code>def findzeros(M): nZeros=0 for t in M.flat: if not t: nZeros+=1 return nZeros M=numpy.mat(numpy.zeros((len(data.keys()),len(data.keys())))) for s in data.keys(): MatrixCells[s,s]=1 for t in data.keys(): if t&lt;s: if (scipy.corrcoef(data[t],data[s])[0,1])&gt;threashold: M[s,t]=1 M[t,s]=1 nZeros=findzeros(M) M2=M*M nZeros2=findzeros(M2) while (nZeros-nZeros2): nZeros=nZeros2 M=M2 M2=M*M nZeros2=findzeros(M2) </code></pre> <hr> <h3>Edit:</h3> <p>It has been suggested that I use SVD decomposition. Here is a simple example of the problem on a 5x5 graph. We shall use this since with the 19200x19200 square matrix is not that easy to see the clusters.</p> <pre><code>import numpy import scipy M=numpy.mat(numpy.zeros((5,5))) M[1,3]=1 M[3,1]=1 M[1,1]=1 M[2,2]=1 M[3,3]=1 M[4,4]=1 M[0,0]=1 print M u,s,vh = numpy.linalg.linalg.svd(M) print u print s print vh </code></pre> <p>Essentially there are 4 clusters here: (0),(1,3),(2),(4) But I still don't see how the svn can help in this context.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload