Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>This sounds like you need BFS <a href="http://en.wikipedia.org/wiki/Breadth-first_search" rel="nofollow">http://en.wikipedia.org/wiki/Breadth-first_search</a></p> <p><strike><strong>Online approach:</strong></strike> <strike>I think it can expensive depending on how you want to use it. On worst case you would iterate all the data in the database: cost runtime <code>O(n)</code> (assume you have a lookup function to find the user in the graph in runtime <code>O(1)</code>).</strike></p> <p><strong>Offline approach</strong> You could do offline scheduled pre-calculation and store the distances as a lookup function but it requires some additional memory <code>O(n*n)</code> where n is number of users. The cost for the lookup function is now only <code>O(1)</code> or <code>O(logn)</code> depending on how you implement it <em>(disregarding the offline runtime which I would think will be in the area <code>O(n)</code> to <code>O(n*n)</code>)</em>. </p> <p><strong>Strategy</strong> The strategy you want to follow can be depended on number of users you can expect as an upper limit and how well the users are connected to each other. If you have few users, online approach might be fine, if you have million of users, then you probably need an offline approach but it will cost you some memory.</p> <p><strong>Other considerations</strong></p> <ul> <li>Mix online and offline approach</li> <li>Use caching strategies</li> <li>Whenever a new reference is updated for a user, update the distance lookup function</li> </ul> <p><hr/> <strong><em>Updated Answer</em></strong> There are 17 mio. users, we will need offline approach.</p> <p>I would follow the offline version. You should avoid <code>O(n*n)</code> runtime which I think is possible. </p> <p><strong>DB model</strong></p> <p>You should think how you would model the DB as this will be the most expensive part of this implementation.</p> <p>Maybe something like: Create a table for every user (table-name could be userId). And every table has entries for every user (the record key is userId). This will result in 17 mio. tables with 17 mio. entries each (This is the <code>O(n*n)</code> cost).</p> <p><strong>Offline</strong> you run the BFS once while keeping track of which user you have visited and at which level you are in the BFS iteration and save the distance to the DB. I haven't thought this part through but I think this strategy is feasible. Remember to run BFS on every node, i.e. until you have visited all the users. If this strategy is not feasible then you could run BFS from every node which is <code>O(n*n)</code> runtime. This means it could take something like a month to run on worst case, i.e. your distance data could be old. How fast this runs depends on how connected your users are.</p> <p>Or you could do the approach if possible "Whenever a new reference is updated for a user, update the distance lookup function". This would run BFS once which is <code>O(n)</code>, i.e. a few seconds. Invoke BFS(userId) on first time event and afterwards on reference update.</p> <p><strong>Online</strong> you fetch the table by table-name using userId and fetch the entry by another userId to get the distance.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload