Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flow analysis #5

Open
dgrunwald opened this issue Jun 8, 2020 · 7 comments
Open

Flow analysis #5

dgrunwald opened this issue Jun 8, 2020 · 7 comments

Comments

@dgrunwald
Copy link
Member

dgrunwald commented Jun 8, 2020

Currently our inference re-uses Roslyn's flow-analysis.

However, this has some fundamental problems.

9:	Dictionary<T, Node> mapping = new Dictionary<T, Node>();

11:	Node? GetNode(T element)
	{
13:		Node? node;
14:		if (!mapping.TryGetValue(element, out node))
15:		{
16:			node = new Node();
17:			mapping.Add(element, node);
18:		}
19:		return node;
	}

GetNode

There's no edges created for line 16/17 because here Roslyn knows that node is non-null.
Line 14 creates two edges: one from <nullable> because TryGetValue will assign null when returning false; the other from mapping!1#2 (the mapping field's Node type argument) when TryGetValue returns true.

The return statement creates an edge from the variable's type, because Roslyn's flow analysis can't guarantee us that the variable is non-null -- our Roslyn code analysis runs under the pessimistic assumption that all types-to-be-inferred might end up nullable, so it considers mapping to be Dictionary<T, Node?>, which leaves open the possibility that GetNode returns null.

However, after our inference decides that mapping!1#2 is non-null, it would be correct to also indicate that the GetNode return value is non-null. After all, if no node exists yet, the function will create one.

The issue here is that Roslyn's flow analysis isn't aware of our types-to-be-inferred.
It would be better if, instead of using Roslyn's flow analysis, we had our own that keeps track of node's nullability.

The idea would be to create additional "helper" graph nodes for the different flow states of a local variable of reference type.
After TryGetValue initializes node, it's flow-state would be (true: mapping!1#2, false: <nullable>). Within the if body, the flow-state would initially be <nullable>, but after line 16 would change to <nonnull>.
After the if, the flow-state from both alternatives could be re-combined by creating a new node "node-after-line-18" and edges from the nodes from the two if-branches -- in this case <nonnull> from the then-branch and mapping!1#2 from the else branch.
Then the return statement would create an edge from this "node-after-line-18" instead of the node for the variable's declared type.
All flow-state nodes associated with a variable would have an edge to the variable's declared type node.
We'd end up with a graph somewhat like this:
GetNode-cfg

Thus in the end, node would be inferred as nullable, but the GetNode return type would only depend on mapping!1#2 and thus can be inferred depending on whether there's a mapping.Add(x, null) access somewhere else in the program.

@GrahamTheCoder
Copy link
Member

GrahamTheCoder commented Jun 9, 2020

If I've understood correctly, I think a simple solution might be to run multiple passes I.e. Write in current inferred state to the trees, then create a new semantic which now uses the inferred annotations.
The naive approach (rerun from scratch except the initial rewrite) would likely be slow. But you can retain the graph, and restart from the first null edges known potentially?

GetSpeculativeSemanticModel could potentially help where only a single referenced related file had changed since the last pass - I'm assuming it's faster but haven't checked.
While I'm thinking about it, AnalzeDataFlow is another method I keep meaning to check if would provide anything extra that's useful.

@dgrunwald
Copy link
Member Author

That would work in a system that starts with everything as nullable, infers some non-null types, then run multiple passes (until fixed point) finding more non-null types.

It doesn't really work in the current "minimize compiler warnings" system -- Roslyn flow analysis seeing something as non-null causes us to omit edges, which could then cause us to switch a non-null type back to nullable. There's no guarantee the passes would converge to anything.

@dgrunwald
Copy link
Member Author

dgrunwald commented Jun 10, 2020

Of course we could keep notes on which types were marked non-null by an earlier run, and leave those fixed in the following passes. That should make the iterations converge.
But each iteration computing a local minimum would not necessarily result in a global minimum.

Also, with a custom flow analysis I have an idea that I think will let me infer [NotNullWhen(bool)] attributes on out parameters:

  • in addition to the primary node for the parameter's type, have two additional nodes onTrueReturn/onFalseReturn for out parameters.
  • both onTrueReturn/onFalseReturn have an edge to the parameter's primary node
  • on return true|false;, connect the current flow-state of the parameter with the onTrueReturn/onFalseReturn node.
  • on return complexCondition, connect the current flow-state with both special nodes
  • at the call side, if the method is called in an if condition, use the onTrueReturn/onFalseReturn nodes as flow-states for the assigned variable on the two if branches
  • after nullabilities are inferred for nodes, if the primary node is nullable and exactly one of onTrueReturn/onFalseReturn is non-nullable, emit the [NotNullWhen(..)] attribute

@dgrunwald
Copy link
Member Author

There's a downside to adding more complex constraints to the graph: having constraints that don't directly correspond to possible compiler warnings, means that we end up minimizing something other than the number of warnings.

But I'm not sure the idea of repeatedly using Roslyn's flow-analysis works out either.
The following is a valid warning-free program:

class Program
{
    public static string Test(string input)
    {
5:        string? a = null;
6:        a = input;
7:        return a;
8:    }
    public static int Main()
    {
11:        string x = string.Empty;
12:        for (int i = 0; i < 10; i++)
13:        {
14:            x = Test(x);
15:        }
16:        return x.Length;
    }
}

Test is the most simple case for flow analysis possible: the return type Test should only depend on the parameter type, never on the local variable.
But the current approach without our own flow analysis produces this graph:
SimpleFlow
The minimum cut is marked with the red edge -- in this case we cut as close to the dereference as possible. This effectively marks all variables as nullable.
In a second run, Roslyn's flow analysis won't do any better than on the first -- unless we tell Roslyn that input is non-nullable, it won't tell us that the return a; is non-nullable via flow analysis. But due to the function being fed its own output in Main, we won't mark the parameter as non-nullable unless Roslyn tells us the return can be non-nullable.

Iterating doesn't work here. We can't solve this case with Roslyn's flow analysis unless we randomly decide to try marking input as non-nullable, but doing that for every variable would be extremely slow. On the other hand, a custom flow analysis would notice that the a = null; assignment is irrelevant for the function's return type when constructing the graph, thus allowing everything to be inferred correctly in just a single pass.

@GrahamTheCoder
Copy link
Member

That would work in a system that starts with everything as nullable, infers some non-null types, then run multiple passes (until fixed point) finding more non-null types.

Ah yes, I was assuming the former at the time of writing despite the extensive documentation indicating the latter!

dgrunwald added a commit that referenced this issue Jun 11, 2020
This initial version does not support loops, and instead resets the complete flow-state whenever a future version of flow-state would need to be merged.
It also does not yet support value-dependent flow-state, with VisitCondition() just being a stub that returns the same state for both the true and false branches.
@dgrunwald
Copy link
Member Author

I just pushed a very simple flow-analysis that is barely sufficient to handle the case from my previous comment:
SimpleWithFlow
The only difference to the previous graph is that the return edge starts at input rather than a. But that's sufficient to allow a to be nullable without making the whole cycle nullable -> inference now can find a warning-free solution for this case.

I'll still need to add support for the return value of a condition implying a particular flow-state (as with TryGetValue) to handle the original GetNode function that started this issue.

@dgrunwald
Copy link
Member Author

dgrunwald commented Jun 11, 2020

And now with 3e8ca2b, the original example from this issue:

9:	Dictionary<T, Node> mapping = new Dictionary<T, Node>();

11:	Node GetNode(T element)
	{
13:		Node? node;
14:		if (!mapping.TryGetValue(element, out node))
15:		{
16:			node = new Node();
17:			mapping.Add(element, node);
18:		}
19:		return node;
	}

results in this graph:
GetNode-WithFlow
And thus the GetNode return type can be non-nullable despite the local variable node being nullable.
Note that unlike the mockup graph in the issue description, I ended up not creating helper nodes for most flow-states, instead directly using the original node.
At the end of the if, TypeSystem.Join(<mapping!1#2>, <nonnull>) is used to combine the flow states from the two branches. While in general this can result in the creation of a helper node, in this case Join directly returns <mapping!1#2> as the "more nullable" of the two flow-states.

dgrunwald added a commit that referenced this issue Jun 13, 2020
The overall idea here is:
 * in addition to the primary node for the parameter's type, we have two additional nodes whenTrue/whenFalse for out parameters on methods that return a boolean.
 * on return true|false;, connect the current flow-state of the parameter with the whenTrue/whenFalse node.
 * at the call side, if the method is called in a condition, use the whenTrue/whenFalse nodes as flow-states for the assigned variable
 * after nullabilities are inferred for nodes, if the primary node is nullable and exactly one of whenTrue/whenFalse is non-nullable, emit the `[NotNullWhen(..)]` attribute
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants