Note that there are some explanatory texts on larger screens.

plurals
  1. PONeo4j shortest path with cycles
    primarykey
    data
    text
    <p>I've been evaluating Neo4j 1.9.M03 for a while now, and have come to a head that I did not expect. </p> <p>I have a graph of ~140,000 vertices. I also have three classes of edges, let's call them father, mother, and husband. There are about 80,000 edges per class. There are no properties, and no indexes. The vertex store size is about 1.3 MB, and the edge store is about 8 MB. </p> <p>The data originates in SQL Server and the quality of migration from SQL to Neo4j is known to be correct. A SQL shortest path stored procedure has been run for dozens of vertex pairs, such that the shortest path distance and path is known.</p> <p>The shortest path query is Cypher: <code>START one=node(0), two=node(1234) MATCH p = shortestPath(one-[*..1000]-two) RETURN p;</code></p> <p><strong>PARTIAL TEST CASE ONE:</strong> I use only husband and father relations, the occurrence of cycles (e.g. <code>v[0] -&gt; v[1] -&gt; v[2] -&gt; v[0])</code> is low. If I perform a shortest path calculation on a specific known long path (e.g. known to be ~450 hops), it returns within 50ms (non-cached) with a path of ~550 hops. The increased length is expected, since we are excluding a portion of the edges.</p> <p><strong>PARTIAL TEST CASE TWO</strong>: Likewise, if I use only husband and mother relations, the occurrence of cycles (e.g. <code>v[0] -&gt; v[1] -&gt; v[2] -&gt; v[0])</code> is low. If I perform the same shortest path, I get a result on the same order as before: about 50ms (non-cached), with a similar increase in path length.</p> <p><strong>FULL TEST CASE</strong>: I use all (father, mother, and husband) relations. The occurrence of cycles is now predictably high because of the common case <code>v[0] mother-&gt; v[1] husband-&gt; v[2] &lt;-father v[0]</code>. When I execute the shortest path query, the JVM allocates 4 gigabytes of memory and the calculation does not complete. <strong>This is the problem.</strong></p> <hr> <p>My thesis is that the regular occurrence of cycles is causing this behavior, otherwise I would not expect such a huge difference in performance when I only add another class of parent edge--unless the shortest path algorithm was not accounting for cycles.</p> <p>I applied the Dijkstra Algorithm using the Java API directly, with a cost of 1 across all edges, and achieved similar results to the standard ShortestPath algorithm used. As a result, I received this exception after 6 minutes of IntelliJ debug time.</p> <pre><code>Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded at org.neo4j.kernel.impl.util.RelIdArray$RelIdIteratorImpl.&lt;init&gt;(RelIdArray.java:661) at org.neo4j.kernel.impl.util.RelIdArray$DirectionWrapper$3.iterator(RelIdArray.java:327) at org.neo4j.kernel.impl.util.RelIdArray.iterator(RelIdArray.java:270) at org.neo4j.kernel.impl.core.NodeImpl.getAllRelationships(NodeImpl.java:172) at org.neo4j.kernel.impl.core.NodeImpl.getRelationships(NodeImpl.java:270) at org.neo4j.kernel.impl.core.NodeProxy.getRelationships(NodeProxy.java:82) at org.neo4j.kernel.StandardExpander$AllExpander.doExpand(StandardExpander.java:303) at org.neo4j.kernel.StandardExpander$RelationshipExpansion.iterator(StandardExpander.java:194) at org.neo4j.kernel.impl.traversal.TraversalBranchImpl.expandRelationshipsWithoutChecks(TraversalBranchImpl.java:114) at org.neo4j.kernel.impl.traversal.TraversalBranchImpl.expandRelationships(TraversalBranchImpl.java:104) at org.neo4j.kernel.impl.traversal.TraversalBranchImpl.initialize(TraversalBranchImpl.java:130) at org.neo4j.kernel.impl.traversal.TraversalBranchImpl.next(TraversalBranchImpl.java:150) at org.neo4j.graphalgo.impl.util.BestFirstSelectorFactory$BestFirstSelector.next(BestFirstSelectorFactory.java:73) at org.neo4j.kernel.impl.traversal.TraverserIterator.fetchNextOrNull(TraverserIterator.java:65) at org.neo4j.kernel.impl.traversal.TraverserIterator.fetchNextOrNull(TraverserIterator.java:34) at org.neo4j.helpers.collection.PrefetchingIterator.hasNext(PrefetchingIterator.java:55) at org.neo4j.graphalgo.impl.util.StopAfterWeightIterator.fetchNextOrNull(StopAfterWeightIterator.java:45) at org.neo4j.graphalgo.impl.util.StopAfterWeightIterator.fetchNextOrNull(StopAfterWeightIterator.java:29) at org.neo4j.helpers.collection.PrefetchingIterator.hasNext(PrefetchingIterator.java:55) at org.neo4j.helpers.collection.IteratorUtil.firstOrNull(IteratorUtil.java:51) at org.neo4j.helpers.collection.IteratorUtil.firstOrNull(IteratorUtil.java:201) at org.neo4j.graphalgo.impl.path.Dijkstra.findSinglePath(Dijkstra.java:98) at org.neo4j.graphalgo.impl.path.Dijkstra.findSinglePath(Dijkstra.java:50) at ShortestPathCalc.Dijkstra(Main.java:198) at Main.main(Main.java:53) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120) </code></pre> <p>Do you think I'm right? Is this a known limitation of graph databases or their shortest path algorithms? It seems quite silly to me that previously-visited vertexes would not be stored in a hash table, such that the shortest path algorithm would not attempt to path out of a previously visited vertex more than once.</p> <hr> <p><strong>UPDATE 1/25/2013</strong> </p> <p>A Github repo so you can follow along!</p> <pre><code>https://github.com/squirrelsama/neo4j-shortestpath-issue </code></pre> <hr> <p><strong>UPDATE 2/7/2013</strong></p> <p>See the accepted answer. In short, cycles have nothing to do with it.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload