StackOverflow2013

plurals

POHow can I cluster a graph in Python?
primarykey
Id
653496
data
AcceptedAnswerId
653724
AnswerCount
8
ClosedDate
CommentCount
4
CommunityOwnedDate
CreationDate
2009-03-17T09:21:07.750
FavoriteCount
7
LastActivityDate
2017-05-02T00:36:40.727
LastEditDate
2017-05-02T00:36:40.727
LastEditorUserId
1571709
OwnerUserId
46634
ParentId
0
PostTypeId
1
Score
14
ViewCount
14265
LastEditorDisplayName
Pietro Speroni
text
Body
Let G be a graph, that is, a set of nodes and a set of links. I need to find a fast way to partition the graph. The graph I am working on now has only 120*160 nodes, but I might soon be working on an equivalent problem in another context (not medicine, but website development) with millions of nodes.
So what I did was store all the links in an adjacency matrix: <pre><code>M=numpy.asmatrix(numpy.zeros((len(data.keys()),len(data.keys())))) </code></pre> Now M holds a 1 in position s,t if node s is connected to node t. I make sure M is symmetric (M[s,t]=M[t,s]) and that each node links to itself (M[s,s]=1).
If I remember correctly, multiplying M by itself gives a matrix representing the graph that connects vertices reachable within two steps. So I keep multiplying M by itself until the number of zeros in the matrix no longer decreases. The resulting matrix now encodes the connected components. Up to this point I am pretty satisfied with the algorithm: I think it is easy, elegant, and reasonably fast.
Now I need to cluster this matrix, and this is the part I am having trouble with. Essentially I need to split this graph into its connected components. I can go through all the nodes and see what they are connected to. But what about sorting the matrix by reordering its rows? I don't know whether that is possible.
What follows is the code so far: <pre><code>import numpy

def findzeros(M):
    # Count the zero entries of M.
    nZeros=0
    for t in M.flat:
        if not t:
            nZeros+=1
    return nZeros

# data (node index -> samples) and threashold are assumed to be defined earlier.
M=numpy.asmatrix(numpy.zeros((len(data.keys()),len(data.keys()))))
for s in data.keys():
    M[s,s]=1
    for t in data.keys():
        if t<s:
            # Link s and t when their samples are strongly correlated.
            if (numpy.corrcoef(data[t],data[s])[0,1])>threashold:
                M[s,t]=1
                M[t,s]=1

# Square M repeatedly until no further zero entries disappear.
nZeros=findzeros(M)
M2=M*M
nZeros2=findzeros(M2)
while (nZeros-nZeros2):
    nZeros=nZeros2
    M=M2
    M2=M*M
    nZeros2=findzeros(M2)
</code></pre> <hr> <h3>Edit:</h3> It has been suggested that I use SVD decomposition. Here is a simple example of the problem on a 5x5 graph. 
We shall use this because with the 19200x19200 square matrix it is not that easy to see the clusters. <pre><code>import numpy

M=numpy.asmatrix(numpy.zeros((5,5)))

# The only link between distinct nodes is 1-3.
M[1,3]=1
M[3,1]=1

# Every node links to itself.
M[1,1]=1
M[2,2]=1
M[3,3]=1
M[4,4]=1
M[0,0]=1

print(M)
u,s,vh = numpy.linalg.svd(M)
print(u)
print(s)
print(vh)
</code></pre> Essentially there are 4 clusters here: (0),(1,3),(2),(4). But I still don't see how the SVD can help in this context.
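For reference, the goal stated above (splitting the graph into its connected components) can be sketched without a dense matrix. This is only a minimal illustration, not the method used in the code above: it assumes the links are available as a list of (s, t) pairs with nodes numbered from 0, and the function name `connected_components` is hypothetical. It uses a breadth-first search, so time and memory grow with the number of nodes plus links rather than with the square of the number of nodes.

```python
from collections import deque

def connected_components(n_nodes, edges):
    """Return the connected components of an undirected graph as sorted lists."""
    # Build an adjacency list; memory is O(nodes + edges), unlike a dense matrix.
    neighbours = [[] for _ in range(n_nodes)]
    for s, t in edges:
        neighbours[s].append(t)
        neighbours[t].append(s)

    seen = [False] * n_nodes
    components = []
    for start in range(n_nodes):
        if seen[start]:
            continue
        # Breadth-first search from each node not yet assigned to a component.
        seen[start] = True
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for other in neighbours[node]:
                if not seen[other]:
                    seen[other] = True
                    queue.append(other)
        components.append(sorted(component))
    return components

# The 5-node example from the question: only nodes 1 and 3 are linked.
print(connected_components(5, [(1, 3)]))  # [[0], [1, 3], [2], [4]]
```

Listing the nodes component by component also gives a row ordering that would make the matrix block-diagonal, which is one way to read the "reordering the rows" idea.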
Tags
<python><sorting><matrix><cluster-analysis><graph-theory>
Title
How can I cluster a graph in Python?
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USDominique Fortin
UserOwnerUserId
1. USPietro Speroni
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
3. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POHow can I cluster a graph in Python?
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POHow can I cluster a graph in Python?
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 POHow can I cluster a graph in Python?
 UserUserId
 USPhil H
 VoteTypeVoteTypeId
 VTFavorite
CommentsPostId
