Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

preserve large arrays with same keys through union #3032

Conversation

schlndh
Copy link
Contributor

@schlndh schlndh commented Apr 27, 2024

Fixes https://phpstan.org/r/0071beb4-becd-47c0-8a17-b52407da0c22
Fixes phpstan/phpstan#9397
Fixes phpstan/phpstan#10080

Basically, I have a large array shape, which I'm not modifying (and it's not a tagged union either). I just use the elements individually. But this causes the array to be generalized into a "useless" non-empty-array<...>. As a workaround, I can pull the elements used in conditions into variables and then use the variables, but it is quite inconvenient.

So this PR attempts to maintain the array shape in the simplest case where 2 constant arrays with the same keys are being unioned together.

@schlndh schlndh force-pushed the fix-dontGeneralizeUnionOfLargeConstantArraysWithSameKeys branch from b2762e3 to 22b25e6 Compare April 27, 2024 12:22
@ondrejmirtes
Copy link
Member

According to issue-bot this fixes 2 issues, please write regression tests for them 😊 Thank you.

/**
* @param non-empty-list<ConstantArrayType[]> $constantArraysMap
*/
private static function unionConstantArrayTypesWithSameKeys(array $constantArraysMap): ?ConstantArrayType
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the looks of the name, there's already a method with a very similar goal: TypeCombinator::reduceArrays. Can't we call that one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried replacing the call to unionConstantArrayTypesWithSameKeys with this (if (count($reducedArrayTypes) === 1) { so that I don't turn off the generalization completely).

$reducedArrayTypes = self::reduceArrays($arrayTypes);
if (count($reducedArrayTypes) === 1) {
	return $reducedArrayTypes;
}

It worked with the test-cases that I had, but it is not quite equivalent. I added a test-case to reject this option. The problem is in ConstantArrayType::isKeysSupersetOf (

$failOnDifferentValueType = true;
). It allows at most one value type to differ.


The goal of unionConstantArrayTypesWithSameKeys is to detect the union of

array{K1: T11, K2: T12, ... KN: T1N}
array{K1: T21, K2: T22, ... KN: T2N}
...
array{K1: TM1, K2: TM2, ... KN: TMN}

and return array{K1: T11|T21|...|TM1, K2: T12|T22|...|TM2, ..., KN: T1N|T2N|...|TMN}.

If the input doesn't match the simple target situation, then the method just returns null and we can proceed with the generalization.


reduceArrays seems to do the following:

  • Separate non-empty constant arrays, non-constant arrays and empty array.
  • Process the constant arrays:
    • For each pair check if one array is a "superset" of the other one, and if so merge them together.
  • Return an array of all the remaining constant array types, as well as the non-constant arrays and the empty array.

It works in some cases, but it is more complicated than it needs to be for the scenario that I had in mind. And there are a lot of subtle details involved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like that having two different methods that do very similar things is gonna bite us in the future. Can we improve reduceArrays and use it in more places so that it works for fixing these cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

processArrayTypes is certainly quite complicated already, and this isn't making it any simpler. But I'm not sure how to modify reduceArrays such that it can be used in both places, and it is simpler than having a separate method.

The problem for me is that reduceArrays maintains tagged unions: E.g. array{a: 1, b: int}|array{a: 2, b: string} should remain unchanged by reduceArrays. However, I assume that this is something that we can't afford when the arrays involved are too large. So there is the generalization, which would (assuming lower generalization limit) turn this into non-empty-array<'a'|'b', int|string>.

With my change we would lose the tagged union, but at least we'd maintain the general shape of the array: array{a: 1|2, b: string|int}.

So reduceArrays would have to know whether or not it should try to maintain the tagged unions. Either there would be a boolean flag, or it could re-check the size of the arrays involved, ...


The issue in my use-case happens because after a branching statement (if, ternary operator, cycles, ...) the scopes from non-terminating branches are unioned together and they replace the previous scope. However, in my use-case the array is not modified in any of the branches and yet the result of the union is a more general type than it was before the branching statement.

So perhaps this could be another way of fixing the issue: detect scenarios where the variable is not modified and the new type is wider and keep the previous type instead. But it seems complicated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either there would be a boolean flag, or it could re-check the size of the arrays involved, ...

sounds good to me to at least try it!

So perhaps this could be another way of fixing the issue

this should also be possible. We have all the information available. There are three scopes at play - let’s say we’re analysing if-else. There the scope before the statement (original) and there’s the if scope and else scope. If the type of the expression is equal in all three, we don’t have to run the union() operation. There might even be performance benefit to that…

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good to me to at least try it!

You have a tendency to be right about these things. 👍

There are three scopes at play - let’s say we’re analysing if-else. There the scope before the statement (original) and there’s the if scope and else scope. If the type of the expression is equal in all three, we don’t have to run the union() operation. There might even be performance benefit to that…

That wouldn't help in my use-case: the types are different in the branches (e.g. the condition is a null-check on one of the dimensions), but their union should be the same (or narrower) than the original scope, which it isn't, because the arrays are too large and they get generalized.

Anyway, after I wrote the previous comment, I realized that it's better to fix it in the union method (if possible), because that is a more general fix.

@schlndh schlndh force-pushed the fix-dontGeneralizeUnionOfLargeConstantArraysWithSameKeys branch from 53438cc to 2c609e4 Compare May 1, 2024 07:44
@@ -708,6 +713,10 @@ private static function processArrayTypes(array $arrayTypes): array
}

if ($generalArrayOccurred && (!$overflowed || $filledArrays > 1)) {
$reducedArrayTypes = self::reduceArrays($arrayTypes, false);
if (count($reducedArrayTypes) === 1) {
return [self::intersect($reducedArrayTypes[0], ...$accessoryTypes)];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, currently the accessory types will always be empty (so I can't test this). But I added them anyway to be safe in case new accessory types are added in the future.

Comment on lines +855 to +859
// This is an optimization for current use-case of $preserveTaggedUnions=false, where we need
// one constant array as a result, or we generalize the $constantArrays.
if (!$preserveTaggedUnions) {
return $constantArrays;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm conflicted about this. On one hand, it would feel bad not to have this condition and run a lot of the method uselessly. But on the other hand, the condition depends on the particular usage of $preserveTaggedUnions. Maybe a better name for $preserveTaggedUnions would help, but I didn't come up with anything. So I at least added a comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine 👍

@ondrejmirtes
Copy link
Member

Awesome, I really like this! 👍

@ondrejmirtes ondrejmirtes merged commit 6705ac1 into phpstan:1.10.x May 1, 2024
442 of 443 checks passed
@ondrejmirtes
Copy link
Member

Thank you!

@staabm
Copy link
Contributor

staabm commented May 2, 2024

Nice one 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants