Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check for duplicate keys in multiple yield operators #1408

Closed
maks-rafalko opened this issue Sep 7, 2018 · 18 comments
Closed

Check for duplicate keys in multiple yield operators #1408

maks-rafalko opened this issue Sep 7, 2018 · 18 comments

Comments

@maks-rafalko
Copy link
Contributor

maks-rafalko commented Sep 7, 2018

Summary of a problem or a feature request

When I use, for example, data providers in PHPUnit, I use them by yielding each data set, like this:

/** @dataProvider provider */
public function test_xxx(int $int, string $string) {}

public function provider(): \Generator
{
    yield 'Description of the data set #1' => [123, 'xy'];

    yield 'Description of the data set #2' => [456, 'ab'];
}

When you have a large provider, it's easy to make copy-paste error and end up with the duplicate key:

public function provider(): \Generator
{
    // ...
    yield 'Duplicate key' => [123, 'xy'];
    // ...
    yield 'Duplicate key' => [456, 'ab'];
}

PHPUnit in this case ignores the first one, and we loose 1 test case.

Code snippet that reproduces the problem

https://phpstan.org/r/6eef732e950c1001e202097358d510cf

Expected output

Expected ouput from PHPStan:

Method sayHello() has 2 duplicate keys in yield with value 'Duplicate Key'

@ondrejmirtes
Copy link
Member

Hi, thanks for the proposal. Feel free to send a PR :)

@JanTvrdik
Copy link
Contributor

JanTvrdik commented Dec 11, 2018

Yielding the same key is not a bug and may be perfectly valid for some use cases. You should instead open feature request for PHPUnit to report such data providers as broken.

@maks-rafalko
Copy link
Contributor Author

What are the valid cases?

@JanTvrdik
Copy link
Contributor

@maks-rafalko
Copy link
Contributor Author

This does not prove anything. If ID is not unique, this is also an issue.

PHP allows returning duplicate keys from Generator, but there is no information about the correctness of such things. Moreover,

PHP also supports associative arrays, and generators are no different.

http://php.net/manual/en/language.generators.syntax.php#control-structures.yield.associative

Since you can't have duplicate keys in associative arrays, and "generators are no different", I would say that this is not a valid case to return duplicate key from Generator, even if it is supported by the language.

What are the real cases where returning a duplicate key is ok?

@Majkl578
Copy link
Contributor

but there is no information about the correctness of such things

Duplicates are always correct, period. There is no incorrect variant with duplicate keys because there is no such constraint implied.

PHP also supports associative arrays, and generators are no different.

This implies nothing about uniqueness. It only compares behavior to arrays. You cant ArrayAccess generator anyway.

Just like duplicate keys are allowed with iterators: https://3v4l.org/Z8Wb9

@iluuu1994
Copy link
Contributor

Duplicate keys in generators work just fine in PHP. But I don't see why you'd ever wanna do something like that. Nonetheless, PHPStan should not report correct code. Just because it's a weird language quirk doesn't mean it's incorrect.

You might have more success with pushing this feature in phpstan-strict-rules which is more opinionated.

@maks-rafalko
Copy link
Contributor Author

You might have more success with pushing this feature in phpstan-strict-rules which is more opinionated.

I'm okay with that, could you please transfer this issue to that repository using GitHub API?

@Majkl578
Copy link
Contributor

Majkl578 commented Dec 21, 2018

Just because it's a weird language quirk doesn't mean it's incorrect.

I wouldn't call it a quirk. There are valid use cases and you can always avoid the behaviour if you don't like it. Besides, if you assume generators/iterators produce unique array keys, you are already working with broken assumptions since even though they produce different values, they may end up the same in an array (think of 5, "5" and 5.0).

I can see use cases in long-running code where you can create infinite generator that yields updates where key would be updated user entity and value would be a diff / value changed. Just because you don't use it or don't find valid use case doesn't mean it should be forbidden.

I'm strongly against including this in strict-rules, it has nothing to do with strict code.

@Majkl578
Copy link
Contributor

Also you would have to report possibly duplicate keys which would produce A LOT of false positives.

@iluuu1994
Copy link
Contributor

@Majkl578

I can see use cases in long-running code where you can create infinite generator that yields updates where key would be updated user entity and value would be a diff / value changed.

Whenever I foreach over something I'd expect the key to be unique. A less confusing solution would be to yield a tuple instead of an associative key-value pair:

yield ['id' => ..., 'entity': ...];

This would make it much more obvious that the same id can be encountered multiple times.

@Majkl578
Copy link
Contributor

Majkl578 commented Dec 21, 2018

@iluuu1994 Then your assumption/expectation is wrong. foreach doesn't pose such guarantee when working with Traversable and keys don't have to be unique (as can be seen on the 3v4l code posted above).

Your proposal is valid, but complicates things and makes them a lot less readable, or even performant (in critical code).

This would make it much more obvious that the same id can be encountered multiple times.

TBH this is only a workaround for a falsey assumption you are building on top of. Yes, one can yield new Change($user, $diff) but then it becomes harder to comprehend from API and also harder to consume.

@maks-rafalko
Copy link
Contributor Author

Also you would have to report possibly duplicate keys which would produce A LOT of false positives.

the description of this ticket suggests checking only real duplicate string keys, not "possible" or dynamic.

I can't imagine a real case where such code would be a valid:

public function anyMethod(): \Generator
{
    // ...
    yield 'Duplicate key' => $anyValue1;
    // ...
    yield 'Duplicate key' => $anyValue2;
}

@Majkl578
Copy link
Contributor

Majkl578 commented Dec 21, 2018

Still, the code is perfectly valid generator and there is nothing wrong. You can't statically determine whether it was intended or not. If it's a bug, consumer should be aware of that.

I can't imagine a real case

I can imagine multiple where it saves you trouble with arrays, both in producer and consumer.

I know this is a known issue in PHPUnit, but I think the proper place is to fix the symptom - the bug in PHPUnit itself (since it silently discards duplicates), not here.

@iluuu1994
Copy link
Contributor

@Majkl578

but complicates things and makes them a lot less readable, or even performant (in critical code).

If you need high performance you probably shouldn't use generators with a state machine behind them.

but then it becomes harder to comprehend from API and also harder to consume.

How are either of those statements true? The code looks more obvious, it's typed, you'll get nice IDE autocompletion, it's just as easy to consume. You'll get some small overhead from the heap allocation for the Change object. That's it.

IMO this is a valid feature for strict-rules.

@borNfreee

could you please transfer this issue to that repository using GitHub API?

I'm just a contributor, @ondrejmirtes decides what makes it into which repository. Let's wait until he states his opinion on where this should go.

Also, PRs are always welcome! Open source only works when people contribute.

@Majkl578
Copy link
Contributor

IMO this is a valid feature for strict-rules.

How exacltly would this make code stricter?

IMO it's not.

@ondrejmirtes
Copy link
Member

phpstan/phpstan-strict-rules is not a catch-all repo for rules that do not fit phpstan/phpstan :)

This repository contains additional rules that revolve around strictly and strongly typed code with no loose casting for those who want additional safety in extremely defensive programming

As has been said above, duplicate keys from iterators and generators are OK. Does PHPUnit handle them correctly (e.g. running a test case for each item, even with a duplicate key?) Maybe the issue should be reported there.

Of course, anyone is welcome to write and use the proposed rule on their own, but it's not something generally applicable...

@maks-rafalko
Copy link
Contributor Author

BTW, fixed in PHPUnit sebastianbergmann/phpunit#3444

@lock lock bot locked as resolved and limited conversation to collaborators Dec 24, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants