Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

@Node with interfaces returned as a parent class #2839

Open
EliotRagueneau opened this issue Nov 28, 2023 · 3 comments
Open

@Node with interfaces returned as a parent class #2839

EliotRagueneau opened this issue Nov 28, 2023 · 3 comments
Assignees
Labels
status: needs-investigation An issue that has been triaged but needs further investigation type: enhancement A general enhancement

Comments

@EliotRagueneau
Copy link

EliotRagueneau commented Nov 28, 2023

I have an issue where I have defined some @node with interfaces so that I can fetch a group of Nodes which are unrelated in terms of inheritance.

Because of this, my nodes are having labels that the NodeDescriptionStore is not aware of, since it doesn't resolve interface as Labels.

Because of that, none of the haystack definitions are a perfect match for the result labels, which is absolutely fine in theory .

if (staticLabels.containsAll(labels)) {
Set<String> surplusLabels = new HashSet<>(labels);
staticLabels.forEach(surplusLabels::remove);
return new NodeDescriptionAndLabels(nd, surplusLabels);
}

Since none of them is a perfect match, it then try to pick the mostMatchingNodeDescription and that is where the problem lies I believe:

int unmatchedLabelsCount = 0;
List<String> matchingLabels = new ArrayList<>();
for (String staticLabel : staticLabels) {
if (labels.contains(staticLabel)) {
matchingLabels.add(staticLabel);
} else {
unmatchedLabelsCount++;
}
}
unmatchedLabelsCache.put(nd, unmatchedLabelsCount);
if (mostMatchingNodeDescription == null || unmatchedLabelsCount < unmatchedLabelsCache.get(mostMatchingNodeDescription)) {
mostMatchingNodeDescription = nd;
mostMatchingStaticLabels = matchingLabels;
}
}

Indeed, with the current implementation, if A extends B and B extends C

A --> B --> C

but the record node labels are [A,B,C,D], then even though A is the most "concrete" class, A, B and C node description will all have an unmatchedLabelCount of 0.
Therefore, the NodeDescription being picked in the end is the one that was first in the haystack, not the one with the most matching labels.

I would suggest something like

if (
    mostMatchingNodeDescription == null || 
    unmatchedLabelsCount < unmatchedLabelsCache.get(mostMatchingNodeDescription) || 
    (unmatchedLabelsCount ==  unmatchedLabelsCache.get(mostMatchingNodeDescription) && matchingLabels.size() > mostMatchingStaticLabels.size())
) {
	mostMatchingNodeDescription = nd;
        mostMatchingStaticLabels = matchingLabels;
}

This way, if there is a tie in terms of "unmatchness", we pick the one with the best "matchness", aka, the one with the biggest amount of matching labels

Thanks for all the support you already provided to Reactome team,
Best regards,
Eliot

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Nov 28, 2023
@meistermeier
Copy link
Collaborator

Could you please rephrase the scenario again you are currently in? I think that, after writing down a few assumptions in my own words, I am on the wrong track.
Do you still have this inheritance chain?
Did you define an interface for D?
How are you querying those nodes (and for which type in the repository or Neo4jTemplate)?
I am unable to connect the chain to the unmatchedLabelCount = 0 statement.

@meistermeier meistermeier self-assigned this Nov 29, 2023
@meistermeier meistermeier added status: waiting-for-feedback We need additional information before we can continue and removed status: waiting-for-triage An issue we've not yet triaged labels Nov 29, 2023
@EliotRagueneau
Copy link
Author

Hi, sorry for my lack of clarity.

Yes we are having the inheritance chain, it goes as follow:

public interface D {}

public class C {}

public class B extends C implements D {}

public class A extends B {}

The problem occurs when using neo4jTemplate with cutsom query.
To be specific, the query is the following:

MATCH path=(p:Pathway{stId:$stId})-[:hasEvent*]->(rle:ReactionLikeEvent) 
WHERE SINGLE(x IN NODES(path) WHERE (x:Pathway) AND x.hasDiagram) 
MATCH (rle)-[:input|output|catalystActivity|physicalEntity|regulatedBy|regulator*]->(pe:EntityWithAccessionedSequence) 
WITH DISTINCT pe 
MATCH (pe)-[hm:hasModifiedResidue]->(tm:TranslationalModification) 
RETURN DISTINCT pe, collect(hm), collect(tm) 

so it basically returns a node, plus some additional attributes of the node from the database.
The problem comes when node pe has labels [DatabaseObject, PhysicalEntity, GenomeEncodedEntity, EntityWithAccessionedSequence, Deletable, Trackable]. The first 4 labels are coming from the inheritence chain, and the last 2, we included them because PhysicalEntity implements Deletable and Trackable.

Either spring-data-neo4j doesn't support node definition implementing interfaces, or we configured them wrong (maybe we need to annotate the interfaces with @Node too, not sure). In either case, what happens is that EntityWithAccessionedSequence NodeDescription doesn't have Deletable and Trackable among its additionalLabels.

That is not a problem by itself, however, that leads to a situation where none of the NodeDescription in the haystack are a "perfect match" for the node received from the database. Therefore, computeConcreteNodeDescription uses its 2nd startegy which is to find its mostMatchingNodeDescription. However, there is a problem with the current implementation to find the mostMatchingNodeDescription: it is only considering unmatchedLabelsCount, but never the matchedLabelsCount. This way, any NodeDescription that has 0 unmatchedLabelsCount is considered as the best match. This however is problematic in the case of inheritance chain, as all the parents of EntityWithAccessionedSequence will have by definition 0 labels which are different .

This makes that the first node among the inheritance chain in the haystack that have a unmatchedLabelsCount of 0 will be picked up, and the others will not be able to be the mostMatchingNodeDescription. If you are lucky, the first one in the haystack is the one you're looking for. If you are not, it is another one, in our case GenomeEncodedEntity instead of the desired EntityWithAccessionedSequence.

This problem was noticed because we are using interfaces, but I think it would arise anytime you have nodes in the database having more labels than those described in the SDN model. The little snippet I provided in the issue description should fix this issue I believe, as indeed, among those having the lowest unmatchedLabelsCount, it accepts the one with the biggest amount of matchingLabels.

In our specific case,

Class unmatchedLabelsCount matchingLabels.size()
DatabaseObject 0 1
PhysicalEntity 0 2
GenomeEncodedEntity 0 3
EntityWithAccessionedSequence 0 4

A perfect match being of course [0, 6] in this situation, the closest to being perfect would be the EntityWithAccessionedSequence.

I think another implementaiton that could lead to similar outcomes would be to replace unmatchedLabelsCount by a labelDifference with something like

int labelDifference = 0; 
List<String> matchingLabels = new ArrayList<>(); 
for (String staticLabel : staticLabels) { 
 	if (labels.contains(staticLabel)) { 
 		matchingLabels.add(staticLabel); 
                 labelDifference--;
 	} else { 
 		labelDifference++; 
 	} 
 } 
Class labelDifference
DatabaseObject -1
PhysicalEntity -2
GenomeEncodedEntity -3
EntityWithAccessionedSequence -4

But these are just proposals, you might find better and cleaner solution.

I hope this clarified a bit our problem, and thanks for your support

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Nov 29, 2023
@EliotRagueneau
Copy link
Author

Just to keep you informed, by adding the @Node annotation to the interface declarations, they are well taken into account in the NodeDescription, which mean that the NodeDescriptionStore is able to find a perfect match!

However, the 2nd strategy is still invalid for inheritance chains if they are labels in the database that are not defined in SDN.
I think that it is something you could easily improve , though no longer really required for our usage.

@meistermeier meistermeier added status: needs-investigation An issue that has been triaged but needs further investigation type: enhancement A general enhancement and removed status: feedback-provided Feedback has been provided labels Dec 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: needs-investigation An issue that has been triaged but needs further investigation type: enhancement A general enhancement
Projects
None yet
Development

No branches or pull requests

3 participants