Note that there are some explanatory texts on larger screens.

plurals
  1. POStoring very large graphs on disk/streaming graph partitioning algorithms?
    primarykey
    data
    text
    <p>Suppose that I have a very large undirected, unweighted graph (starting at hundreds of millions of vertices, ~10 edges per vertex), non-distributed and processed by single thread only and that I want to do breadth-first searches on it. I expect them to be I/O-bound, thus I need a good-for-BFS disk page layout, disk space is not an issue. The searches can start on every vertex with equal probability. Intuitively that means minimizing the number of edges between vertices on different disk pages, which is a graph partitioning problem.</p> <p>The graph itself looks like a spaghetti, think of random set of points randomly interconnected, with some bias towards shorter edges.</p> <p>The problem is, how does one partition graph this large? The available graph partitioners I have found work with graphs that fit into memory only. I could not find any descriptions nor implementations of any streaming graph partitioning algorithms.</p> <p>OR, maybe there is an alternative to partitioning graph for getting a disk layout that works well with BFS?</p> <p>Right now as an approximation I use the fact that the vertices have spatial coordinates attached to them and put the vertices on disk in Hilbert sort order. This way spatially close vertices land on the same page, but the presence or absence of edge between them is completely ignored. Can I do better?</p> <p>As an alternative, I can split graph into pieces using the Hilbert sort order for vertices, partition the subgraphs, stitch them back and accept poor partitioning on the seams.</p> <p>Some things I have looked into already:</p> <ol> <li><a href="https://stackoverflow.com/questions/1526479/how-to-store-a-large-directed-unweighted-graph-with-billions-of-nodes-and-vertice">How to store a large directed unweighted graph with billions of nodes and vertices</a></li> <li><a href="http://neo4j.org/" rel="nofollow noreferrer">http://neo4j.org/</a> - I found zero information on how does it do graph layout on disk</li> </ol> <p>Partitioning implementations (unless I'm mistaken, all of them need to fit graph into memory):</p> <ol> <li><a href="http://glaros.dtc.umn.edu/gkhome/views/metis" rel="nofollow noreferrer">http://glaros.dtc.umn.edu/gkhome/views/metis</a></li> <li><a href="http://www.sandia.gov/~bahendr/chaco.html" rel="nofollow noreferrer">http://www.sandia.gov/~bahendr/chaco.html</a></li> <li><a href="http://staffweb.cms.gre.ac.uk/~c.walshaw/jostle/" rel="nofollow noreferrer">http://staffweb.cms.gre.ac.uk/~c.walshaw/jostle/</a></li> <li><a href="http://www.cerfacs.fr/algor/Softs/MESHPART/" rel="nofollow noreferrer">http://www.cerfacs.fr/algor/Softs/MESHPART/</a></li> </ol> <p>EDIT: info on how the graphs looks like and that BFS can start everywhere. EDIT: idea on partitioning subgraphs</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload