StackOverflow2013


plurals

POSlow performing cypher query that creates nodes to group existing nodes by property values
primarykey
Id
18376500
data
AcceptedAnswerId
18459021
AnswerCount
2
ClosedDate
CommentCount
0
CommunityOwnedDate
CreationDate
2013-08-22T09:25:57.187
FavoriteCount
0
LastActivityDate
2013-08-27T07:02:42.057
LastEditDate
LastEditorUserId
0
OwnerUserId
2706422
ParentId
0
PostTypeId
1
Score
2
ViewCount
456
LastEditorDisplayName
text
Body
I have a performance issue with a modifying Cypher query. There is an origin node with a huge number of outgoing relationships to child nodes, and each child node has a key property. The goal is to create new nodes between the origin and the child nodes so that all child nodes sharing the same value of the key property are grouped together. A sketch of the idea can be found in the Neo4j console: <a href="http://console.neo4j.org/?id=vinntj" rel="nofollow">http://console.neo4j.org/?id=vinntj</a>

I use the query with spring-data-neo4j 2.2.2.RELEASE and embedded Neo4j 1.9.2. The parameter of the query must be a node id, and the result should be the modified root node. The query currently looks like this (a bit more complex than in the linked Neo4j console):

<pre><code>START root=node({0})
MATCH (root)-[r:LEAF]->(child)
SET root.__type__='my.GroupedRoot'
DELETE r
WITH child.`custom-GROUP` AS groupingKey, root AS origin, child AS leaf
CREATE UNIQUE (origin)-[:GROUP]->(group{__type__:'my.Group',key:'GROUP',value:groupingKey,origin:ID(origin)})-[:LEAF]->(leaf)
RETURN DISTINCT origin
</code></pre>

The property custom-GROUP is the key to group by. In SDN it is represented by a DynamicProperties object. I annotated it to be indexed, as well as the groupingKey and origin properties of the created group node.

With 5000 child nodes it takes ~50 s to group them, with 10000 nodes ~90 s, with 20000 nodes ~380 s, and with 30000 nodes more than 50 min! The runtime grows much faster than the number of nodes, so this looks far worse than linear scaling to me. My goal is O(n) scaling, so that 500000+ child nodes can be processed in under 30 min.

I assume that the CREATE UNIQUE part of the query causes the problem, because for each new group node it always needs to check which group nodes have already been created, and the amount to check grows with the number of child nodes already grouped.

Does anyone have an idea how to make this query faster, or how to do the same thing faster with another query?
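As a rough check on how the runtime scales, the reported timings can be turned into empirical exponents: if the time behaves like n^k, then k is the slope between two measurements on a log-log scale, and k = 1 would be linear. The Python sketch below is only an illustration and not part of the original setup. The helper name `scaling_exponents` is hypothetical, and the last measurement is taken as exactly 50 min although the question gives only a lower bound.

```python
import math

# Timings reported in the question: (child nodes, seconds).
# The last entry is only a lower bound ("> 50min"), so the exponent
# computed from it is a lower bound as well (assumption: exactly 50 min).
timings = [(5000, 50.0), (10000, 90.0), (20000, 380.0), (30000, 50 * 60.0)]


def scaling_exponents(points):
    """Return the log-log slope between each pair of consecutive measurements."""
    slopes = []
    for (n1, t1), (n2, t2) in zip(points, points[1:]):
        slopes.append(math.log(t2 / t1) / math.log(n2 / n1))
    return slopes


exponents = scaling_exponents(timings)
for (n1, _), (n2, _), k in zip(timings, timings[1:], exponents):
    print(f"{n1} -> {n2} nodes: exponent ~ {k:.2f}")
```

An exponent near 1 indicates roughly linear growth. Values well above 1 for the larger inputs mean the per-node cost itself increases with the number of nodes, which is what would be expected if each CREATE UNIQUE has to check an ever larger set of existing group nodes.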
Tags
<neo4j><cypher><spring-data-neo4j>
Title
Slow performing cypher query that creates nodes to group existing nodes by property values
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. This table or related slice is empty.
UserOwnerUserId
1. USh3nrik
plurals
PostLinksPostIdRelatedPostId
1. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POSlow performing cypher query that creates nodes to group existing nodes by property values
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POSlow performing cypher query that creates nodes to group existing nodes by property values
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. This table or related slice is empty.
