Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variable checked non-null outside a forEach or Stream lambda triggers an error #941

Closed
sdeleuze opened this issue Mar 25, 2024 · 13 comments · Fixed by #952
Closed

Variable checked non-null outside a forEach or Stream lambda triggers an error #941

sdeleuze opened this issue Mar 25, 2024 · 13 comments · Fixed by #952

Comments

@sdeleuze
Copy link

The most common source of irrelevant errors we see in Spring Framework codebase with NullAway is this use case where the static analysis is not able to consider this.resolvedDataSources non-nullable in code found in lambdas:

public void initialize() {
	if (this.targetDataSources == null) {
		throw new IllegalArgumentException("Property 'targetDataSources' is required");
	}
	this.resolvedDataSources = CollectionUtils.newHashMap(this.targetDataSources.size());
	this.targetDataSources.forEach((key, value) -> {
		Object lookupKey = resolveSpecifiedLookupKey(key);
		DataSource dataSource = resolveSpecifiedDataSource(value);
		this.resolvedDataSources.put(lookupKey, dataSource);
	});
}

Such code generates the following error:

/Users/sdeleuze/workspace/spring-framework/spring-jdbc/src/main/java/org/springframework/jdbc/datasource/lookup/AbstractRoutingDataSource.java:145: error: [NullAway] dereferenced expression this.resolvedDataSources is @Nullable
                        this.resolvedDataSources.put(lookupKey, dataSource);

It is true that some lambda could be executed at a later point when the value has changes, but the fact that by default this is not consider as a valid use case generates a lot of irrelevant errors, and I would like to avoid to add an artificial check inside the lambda.

Could that be refined?

@msridhar
Copy link
Collaborator

@sdeleuze thanks for the report. In general we want to be careful with assumptions regarding when lambdas will execute. We already bake in knowledge regarding certain stream operators, and I think forEach is another example where we could build in knowledge that the lambda runs "soon enough" to not require additional defensive null checks in the common case.

Is forEach the main case in Spring where this kind of issue occurs? Or are there other similar and commonly-used operators where the same issue occurs?

@sdeleuze sdeleuze changed the title Variable checked non-null outside a lambda triggers an error Variable checked non-null outside a forEach lambda triggers an error Mar 25, 2024
@sdeleuze
Copy link
Author

I understand there is nuance here, so maybe let's focus on forEach for the most common types of collection as part of this issue. I have not been able to find obvious other common patterns, but I will create other issues if I find some.

@sdeleuze
Copy link
Author

Would you be open to also make Stream methods like filter, map or flatMap more lenient as well?

@msridhar
Copy link
Collaborator

Hi @sdeleuze yes those other cases you mention make sense as well. As an FYI, I have an upcoming holiday and some other deadlines, so I will likely not get to looking at supporting these scenarios until after April 17 or so.

@sdeleuze sdeleuze changed the title Variable checked non-null outside a forEach lambda triggers an error Variable checked non-null outside a forEach or Stream lambda triggers an error Mar 27, 2024
@msridhar
Copy link
Collaborator

Collection.removeIf is another good candidate for special treatment https://docs.oracle.com/javase/8/docs/api/java/util/Collection.html#removeIf-java.util.function.Predicate-

@msridhar
Copy link
Collaborator

@sdeleuze one concern I have with Streams is whether there are cases where a call like filter occurs, returning a Stream object, and then the Stream gets stored away somewhere like a field, and only later does the terminal operation get invoked on the stream. This happens with RxJava Observables, but they are meant to model processing of asynchronous inputs and don't really have an equivalent notion of terminal operations I think. What I'm not sure about is whether java Streams can be used like Observables, or whether that is an anti-pattern. Do you have any insights here?

@sdeleuze
Copy link
Author

sdeleuze commented Apr 22, 2024

Java Stream are lazy so if returned from a method value or stored in a field, this is an argument for not taking in account the null check done outside the lambda. But a big difference compared to reactive API is that they are not reusable, so in practice, I am not sure this use case of usage outside the current method is popular. I suspect it is not.

@msridhar
Copy link
Collaborator

@sdeleuze I put up #952 as an initial implementation to fix this issue. I'm still a bit unsure about the Stream case, though I think in almost all cases it should be safe to treat the methods similarly to forEach. Let me see if I can do some code searches to try to gather evidence that, almost always, streams do not get passed around but instead have a terminal operation invoked in the same method where they are created.

@msridhar
Copy link
Collaborator

Thinking more here, and this is all fuzzy: the bad case for a Stream would be when it gets stored in a field. Returning a Stream or passing it as a parameter is not a problem in and of itself, as long as it gets consumed without other intervening computations that change nullability facts. Storing a Stream in a field to me would a clear indication that it is intended to be used "later." But, the Javadoc explicitly states that "A stream is not a data structure that stores elements," so storing it in a field seems strange, along with the fact that it can be consumed only once. I did some initial searching and didn't find any Stream-type fields, though the search was hardly exhaustive.

Using extra nullability information inside Stream lambdas is not sound, of course. But I really doubt it would lead to missed issues in practice, at least for reasonably-structured code. So, I am leaning toward propagating the extra nullability info to lambdas passed to all Stream methods.

@yuxincs @lazaroclapp if you have further thoughts here or disagree with the above reasoning please comment.

@lazaroclapp
Copy link
Collaborator

I recall (@yuxincs can correct me) that for NilAway we had some analysis of when lambdas "escaped" the current function. Could we do something similar for streams here? Like, as far as I can tell there are four cases, from most to least conservative:

  1. The chain of stream operations must all happen within a single statement of the containing method/lambda (that's what our current stream propagation is most designed to handle, correct? We'd just need to extend it so fields are propagated as well if they aren't, right?)
  2. Stream operations happen in different statements in the containing method/lambda, but the Stream object is never passed to another method/lambda, returned or assigned to a field.
  3. Passed to a method/lambda, but neither caller nor callee(s) persist the Stream in a field anywhere.
  4. Incomplete Streams are stored in fields.

Even for (2) there are some caveats for lambda's and fields, but they are similar to our caveats about aliasing:

E.g.

this.foo = "abcd";
someStream = someStream.filter(c -> this.foo.contains(c));
this.foo = null;
return someStream.collect(Collectors.toList()); // null deref as filter gets evaluated here, right?

For (3), the space for potential errors grows just based on how far the lambda expression might be from the point it gets executed.

How many of the cases we care about would be handled by just (1)? (The one in the example for the issue certainly would be, it's just that we currently don't uproot access paths for fields of this afaik)

@msridhar
Copy link
Collaborator

msridhar commented Apr 27, 2024

@lazaroclapp I think ideally, I'd like to implement your case 1, and see if there are still annoying false positives in practice. You're right that this is the case our current stream handlers are designed to handle. I guess the easiest implementation of this might be to have a list T of stream terminal operations, and then check if there is an appropriate parent node of the lambda in the AST that is a call to a method in T? We'd have to create the T list but otherwise this may not be too hard to implement, though it will add some complexity and runtime overhead.

I did some searching around, and I did find several cases where Stream objects are returned from methods. However, I could not really need if propagating more nullability facts would be relevant to such cases; I suspect it might not be.

Anyway, for simplicity maybe let's just continue the discussion in #952 🙂

msridhar added a commit that referenced this issue Apr 30, 2024
…onously (#952)

Fixes #941

We propagate full nullability info from the enclosing context to
callbacks passed to `Map.forEach`, `Iterable.forEach`, `List.removeIf`,
and all methods on `java.util.stream.Stream`
@sdeleuze
Copy link
Author

sdeleuze commented May 3, 2024

Thanks for working on this, looking forward leveraging it in Spring with the upcoming NullAway related release.

@msridhar
Copy link
Collaborator

msridhar commented May 3, 2024

@sdeleuze the fix for this issue is part of NullAway 0.10.26 which was just released

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants