StackOverflow2013


plurals

PO
primarykey
Id: 10897459
data
AcceptedAnswerId: 0
AnswerCount: 0
ClosedDate:
CommentCount: 3
CommunityOwnedDate:
CreationDate: 2012-06-05T12:38:36.790
FavoriteCount: 0
LastActivityDate: 2012-06-20T14:29:02.483
LastEditDate: 2012-06-20T14:29:02.483
LastEditorUserId: 815507
OwnerUserId: 815507
ParentId: 10896866
PostTypeId: 2
Score: 1
ViewCount: 0
LastEditorDisplayName:
text
Body
This sounds like you need BFS (breadth-first search): <a href="http://en.wikipedia.org/wiki/Breadth-first_search" rel="nofollow">http://en.wikipedia.org/wiki/Breadth-first_search</a>

<strike>Online approach:</strike> <strike>I think it can be expensive, depending on how you want to use it. In the worst case you would iterate over all the data in the database: runtime cost <code>O(n + m)</code> for n users and m follow relationships (assuming you have a lookup function that finds a user in the graph in <code>O(1)</code>).</strike>

Offline approach
You could do a scheduled offline pre-calculation and store the distances for a lookup function, but it requires additional memory of <code>O(n*n)</code>, where n is the number of users. A lookup then costs only <code>O(1)</code> or <code>O(log n)</code>, depending on how you implement it (disregarding the offline runtime, which I would expect to be somewhere between <code>O(n)</code> and <code>O(n*n)</code>).

Strategy
The strategy you follow can depend on the upper limit of users you expect and on how well the users are connected to each other. If you have few users, the online approach might be fine; if you have millions of users, you probably need an offline approach, but it will cost you some memory.

Other considerations
<ul> <li>Mix the online and offline approaches</li> <li>Use caching strategies</li> <li>Whenever a reference is updated for a user, update the distance lookup function</li> </ul>
<hr/>
Updated Answer
There are 17 million users, so we will need the offline approach, and that is the version I would follow. You should avoid <code>O(n*n)</code> runtime, which I think is possible.

DB model
You should think about how you would model the DB, as this will be the most expensive part of this implementation. Maybe something like this: create a table for every user (the table name could be the userId), and give every table an entry for every user (the record key is the userId). This results in 17 million tables with 17 million entries each, about 2.9 × 10^14 entries in total (this is the <code>O(n*n)</code> memory cost).
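The BFS and offline pre-calculation described above can be sketched in Python. This is a minimal sketch, assuming the follow graph fits in memory as a dict mapping each user id to the ids of the users they reference (a hypothetical stand-in for the database). `bfs_distances` and `all_pairs_distances` are illustrative names, and the nested dict plays the role of the per-user tables from the DB model.

```python
from collections import deque
from typing import Dict, Hashable, List

# Hypothetical in-memory follow graph: user id -> ids of users they reference.
Graph = Dict[Hashable, List[Hashable]]


def bfs_distances(graph: Graph, source: Hashable) -> Dict[Hashable, int]:
    """Breadth-first search from `source`; returns the hop distance to every reachable user."""
    dist = {source: 0}  # doubles as the visited set and the BFS level of each user
    queue = deque([source])
    while queue:
        user = queue.popleft()
        for neighbour in graph.get(user, []):
            if neighbour not in dist:  # first visit = shortest distance in an unweighted graph
                dist[neighbour] = dist[user] + 1
                queue.append(neighbour)
    return dist


def all_pairs_distances(graph: Graph) -> Dict[Hashable, Dict[Hashable, int]]:
    """Offline pre-calculation: one BFS per user, O(n * (n + m)) time.

    Outer key = the per-user "table", inner key = the per-user "entry".
    Unreachable users are simply absent; users that appear only as
    follow targets (never as keys) get no row of their own.
    """
    return {user: bfs_distances(graph, user) for user in graph}


if __name__ == "__main__":
    follows = {
        "alice": ["bob"],
        "bob": ["carol"],
        "carol": ["dave"],
        "dave": [],
        "erin": ["alice"],
    }
    table = all_pairs_distances(follows)
    print(table["alice"]["dave"])     # 3
    print(table["erin"].get("dave"))  # 4
    print(table["dave"].get("alice")) # None (no path)
```

Running one BFS per user is the expensive all-pairs job the answer warns about for 17 million users; the sketch only shows the mechanics on a small graph.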
Offline, you run BFS once while keeping track of which users you have visited and at which level of the BFS iteration you are, and save the distances to the DB. I haven't thought this part through, but I think this strategy is feasible. Remember that the BFS has to reach every node, i.e. continue until you have visited all the users.

If this strategy is not feasible, you could run BFS from every node, which is roughly <code>O(n*n)</code> runtime (more precisely <code>O(n*(n + m))</code> for m follow relationships, which stays close to <code>O(n*n)</code> when each user has few connections). This means it could take something like a month to run in the worst case, i.e. your distance data could be stale. How fast this runs depends on how connected your users are.

Alternatively, if possible, you could follow the approach "Whenever a reference is updated for a user, update the distance lookup function". This runs BFS once per update, which is <code>O(n + m)</code>, i.e. a few seconds. Invoke BFS(userId) on the first-time event and afterwards on every reference update.

Online, you fetch the table by table name using one userId, and fetch the entry keyed by the other userId to get the distance.
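The "invoke BFS(userId) on the first-time event and on every reference update, then look up online" flow could look roughly like the following. It is a sketch under assumptions: `DistanceStore`, `on_reference_update`, and `distance` are hypothetical names, and a dict of dicts stands in for the per-user tables.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional, Set


class DistanceStore:
    """Hypothetical distance lookup: one row (dict) per user, keyed by target user id."""

    def __init__(self) -> None:
        self.follows: Dict[Hashable, Set[Hashable]] = {}
        self.rows: Dict[Hashable, Dict[Hashable, int]] = {}

    def _bfs(self, source: Hashable) -> Dict[Hashable, int]:
        # Plain BFS over the current follow graph: O(n + m).
        dist = {source: 0}
        queue = deque([source])
        while queue:
            user = queue.popleft()
            for nxt in self.follows.get(user, ()):
                if nxt not in dist:
                    dist[nxt] = dist[user] + 1
                    queue.append(nxt)
        return dist

    def on_reference_update(self, user: Hashable, followed: List[Hashable]) -> None:
        # First-time event or reference update: store the user's new references,
        # then recompute only this user's row with a single BFS.
        self.follows[user] = set(followed)
        self.rows[user] = self._bfs(user)

    def distance(self, a: Hashable, b: Hashable) -> Optional[int]:
        # Online lookup: fetch a's "table", then the entry for b (O(1) on average).
        return self.rows.get(a, {}).get(b)


if __name__ == "__main__":
    store = DistanceStore()
    store.on_reference_update("carol", ["dave"])
    store.on_reference_update("bob", ["carol"])
    store.on_reference_update("alice", ["bob"])
    print(store.distance("alice", "dave"))  # 3
    print(store.distance("dave", "alice"))  # None
```

Re-running BFS from the updated user refreshes only that user's row; rows of other users whose shortest paths pass through them stay stale until they are refreshed themselves or the next scheduled offline run.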
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POalgorithm to find relationship of two twitter users
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USKunukn
UserOwnerUserId
1. USKunukn
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
