Saving large graphs with Spring Neo4j #2587

Open
mrksph opened this issue Sep 6, 2022 · 11 comments
Labels
status: feedback-provided Feedback has been provided

Comments

@mrksph

mrksph commented Sep 6, 2022

Hi all,

I'm running into problems (mainly slowness) when saving a relatively large graph with the Spring Data Neo4j save() method, passing in the aggregate root. The image below shows an example (it is not the complete graph; the real one is a little larger).

[Image: partial example of the graph being saved]

Is there any other way to speed up the save?

I tried saving the nodes at depth 1 or depth 2 first, concurrently, but I don't think that approach will work.

Thanks
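For reference, saving "depth 1 first, then depth 2" starts with splitting the aggregate by level. Below is a minimal pure-Java sketch of only that grouping step, using a hypothetical Client class instead of the real entity. It does not use Spring Data Neo4j at all, and whether persisting each level separately (or concurrently) still creates the relationships between levels depends on the entity mapping, which this sketch does not address.

```java
import java.util.ArrayList;
import java.util.List;

public class LevelGrouping {

    // Hypothetical stand-in for the Client entity; only the tree shape matters here.
    static final class Client {
        final String name;
        final List<Client> children;

        Client(String name, Client... children) {
            this.name = name;
            this.children = List.of(children);
        }
    }

    // Breadth-first walk that returns the clients grouped by depth (index 0 = root).
    static List<List<Client>> groupByDepth(Client root) {
        List<List<Client>> levels = new ArrayList<>();
        List<Client> current = List.of(root);
        while (!current.isEmpty()) {
            levels.add(current);
            List<Client> next = new ArrayList<>();
            for (Client client : current) {
                next.addAll(client.children);
            }
            current = next;
        }
        return levels;
    }

    public static void main(String[] args) {
        Client root = new Client("Client A",
                new Client("Client B", new Client("Client BA"), new Client("Client BB")),
                new Client("Client C", new Client("Client CA")));
        List<List<Client>> levels = groupByDepth(root);
        for (int depth = 0; depth < levels.size(); depth++) {
            List<String> names = new ArrayList<>();
            for (Client c : levels.get(depth)) {
                names.add(c.name);
            }
            System.out.println("depth " + depth + ": " + names);
        }
    }
}
```

Each inner list could then be handed to a separate save call, parents before children; the sketch deliberately stops before that point.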

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Sep 6, 2022
@meistermeier
Collaborator

What do you want to update? If it's something "near" the root entity, projections might help by making SDN ignore the deeper related nodes. They are also a good fit if you want to go deep into one specific branch of the graph without looking left or right. Different projections can be used for different use cases.
If you want to update a node somewhere in the middle, or a leaf node, I would suggest just saving that node (e.g. with Neo4jTemplate if no repository is needed) instead of being 100% DDD-accurate and saving through the aggregate root.

@mrksph
Author

mrksph commented Sep 7, 2022

I want to save a graph that has many levels and many children at each level, just like the one in the image in my original post.
Something like the following example, but with many more children at each level, each of which can in turn have many children.

Client A (aggregate root)

  • Client B
    • Client BA
      • Client BAA
    • Client BB
      • Client BBA
      • Client BBB
        • Client BBBB
  • Client C
    • Client CA
  • Client DB
    • Client DBA
    • Client DBB
      • Client DBBA

It is very slow when I try to save the object like this: clientRepository.save(clientA)
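A different technique from the ones discussed in this thread, sometimes used for bulk writes, is to skip the cascading save and send the whole tree as one batched Cypher statement. Below is a minimal sketch under loudly stated assumptions: a hypothetical Client class, a :Client label with a unique name property, and a :HAS_CHILD relationship type. It flattens the aggregate into one {parent, child} row per relationship, ready to be bound as a single $rows parameter. Writing this way bypasses SDN's entity mapping (id generation, version checks, other properties), so it is only a starting point.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

public class ClientRows {

    // Hypothetical stand-in for the Client entity; only the tree shape matters here.
    static final class Client {
        final String name;
        final List<Client> children;

        Client(String name, Client... children) {
            this.name = name;
            this.children = List.of(children);
        }
    }

    // Assumed label (:Client), key property (name) and relationship type (:HAS_CHILD).
    // MERGE on name is only correct if names are unique, and only fast with an index on it.
    static final String BATCH_CYPHER =
            "UNWIND $rows AS row "
            + "MERGE (p:Client {name: row.parent}) "
            + "MERGE (c:Client {name: row.child}) "
            + "MERGE (p)-[:HAS_CHILD]->(c)";

    // Walks the aggregate once (iterative DFS) and emits one {parent, child} row per edge.
    static List<Map<String, Object>> toRows(Client root) {
        List<Map<String, Object>> rows = new ArrayList<>();
        Deque<Client> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Client parent = pending.pop();
            for (Client child : parent.children) {
                rows.add(Map.<String, Object>of("parent", parent.name, "child", child.name));
                pending.push(child);
            }
        }
        return rows;
    }

    // The example hierarchy from this comment.
    static Client example() {
        return new Client("Client A",
                new Client("Client B",
                        new Client("Client BA", new Client("Client BAA")),
                        new Client("Client BB",
                                new Client("Client BBA"),
                                new Client("Client BBB", new Client("Client BBBB")))),
                new Client("Client C", new Client("Client CA")),
                new Client("Client DB",
                        new Client("Client DBA"),
                        new Client("Client DBB", new Client("Client DBBA"))));
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = toRows(example());
        System.out.println(rows.size() + " rows");
        // Assuming an injected org.springframework.data.neo4j.core.Neo4jClient, the whole
        // batch could then be sent as a single statement:
        // neo4jClient.query(BATCH_CYPHER).bind(rows).to("rows").run();
    }
}
```

For the example hierarchy above, this yields 13 rows (14 clients, one row per parent/child relationship), so the whole aggregate goes to the database in one round trip instead of one write per node and relationship.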

@meistermeier
Collaborator

There is a 6.3.3-SNAPSHOT available that should improve performance when saving relationships. Maybe you could give it a try; I'd be happy to hear your feedback. Related issue: #2593

@meistermeier meistermeier added status: waiting-for-feedback We need additional information before we can continue and removed status: waiting-for-triage An issue we've not yet triaged labels Sep 14, 2022
@mrksph
Author

mrksph commented Sep 15, 2022

Hi Gerrit, thanks for your help! Any chance of getting this into 6.1.x? Because of compatibility issues, we can't upgrade to SDN 6.3.x yet.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Sep 15, 2022
@meistermeier
Collaborator

Unfortunately, this version won't get any more updates: the Pascal release train lists OSS support until May 2022.
What are your compatibility problems? SDN 6.3 should also work as a drop-in replacement.

@meistermeier
Collaborator

There is now a 6.1.13-REL-PERFORMANCE-SNAPSHOT in the making. You could give it a try once it is published (probably in about an hour).

@mrksph
Author

mrksph commented Sep 15, 2022

Hi Gerrit, awesome news, thank you!

Regarding the compatibility issues, I don't remember exactly why we are tied to 6.1.x. I think it may have been because our integration tests were failing due to #2488.

I just tried running our tests with SDN 6.3.2 and I'm getting:

throw new IllegalStateException("The provided database selection provider differs from the Neo4jClient's one.");

@meistermeier
Collaborator

That just means you have already defined a database selection "somewhere" in your config, and maybe your tests use the Neo4jClient...in(database) syntax.
Disclaimer: the SNAPSHOT release above is completely unsupported :D But it would be good to hear from you whether it improves the experience.

@mrksph
Author

mrksph commented Sep 15, 2022

Regarding this issue, I will try to take a look tomorrow.

As for us using the Neo4jClient...in(database) syntax, I don't think that's it. We only have simple tests for our controllers and services, plus a few integration tests, so...

I see that release 6.3.2 includes the fix for the issue I mentioned earlier, but neo4jClient.getDatabaseSelectionProvider() still returns null when I run my (integration) tests.

EDIT:

Maybe I should add that our failing integration tests set up an embedded server, which may cause the selection provider to return null.

I've just tried setting spring.data.neo4j.database = "neo4j", but when the Neo4jTemplate bean is instantiated, it checks neo4jClient.getDatabaseSelectionProvider(), which is null.

@mrksph
Author

mrksph commented Sep 16, 2022

Okay, so because our dependency on Spring Cloud 2020 ties us to Spring Boot 2.4.x, we can't upgrade to SDN 6.2.1+ (https://docs.spring.io/spring-data/neo4j/docs/6.2.0/reference/html/#dependencies.spring-framework) or 6.3.0+ (https://docs.spring.io/spring-data/neo4j/docs/6.3.0/reference/html/#dependencies.spring-framework): both need a newer Spring Framework version, and Spring Boot 2.4.13 currently pulls in Spring Framework 5.3.13.

Also, the neo4j-java-driver dependency was being pulled in transitively by the APOC plugin dependency, which is why I was getting the exception I mentioned earlier. I had to exclude it in my pom.xml to avoid that exception while running our integration tests.

@jrsperry

@mrksph I have been saving large graphs similar to your example with Neo4j-OGM. Depending on the complexity of the graph, I've seen it perform anywhere from 4 to 10 times faster than SDN.

Here's a link to an issue I have open, with linked projects showing the performance difference. It may be worth giving OGM a shot if SDN is still too slow.

#2636


4 participants