Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add __hash__/__eq__ to requirements #499

Merged
merged 16 commits into from Mar 15, 2022

Conversation

abravalheri
Copy link
Contributor

@abravalheri abravalheri commented Jan 18, 2022

Hello, this PR is a feature request that started as a conversation in #498 (comment). I also believe it might close #453.

The idea here is to be able to compare requirement objects as well as be able to compare sets containing those objects, e.g.:

Requirement("packaging") == Requirement("packaging")
{Requirement("packaging"), Requirement("appdirs")} == {Requirement("packaging"), Requirement("appdirs")} 

The approach I used was to rely on the normalization that happens when the requirement object is converted to string to implement both __eq__ and __hash__.

docs/requirements.rst Outdated Show resolved Hide resolved
packaging/requirements.py Outdated Show resolved Hide resolved
@abravalheri
Copy link
Contributor Author

abravalheri commented Jan 21, 2022

Thank you very much @brettcannon for the review, I have submitted some commits addressing the proposed changes.

Please note that in order to circumvent the str conversion during the comparison of Requirement, I had to make Marker also comparable and this comparison does impose some processing cost (please let me know if you have any suggestion to prevent that cost).

Alternatively we could still make use of str in Marker.__eq__.

docs/requirements.rst Outdated Show resolved Hide resolved
if not isinstance(other, Marker):
return NotImplemented

return _flatten_marker(self._markers) == _flatten_marker(other._markers)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative to flattening _markers here would be doing it directly on the __init__ method (that could potentially also simplify the _format_marker function)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in pre-compute the string? Is the string used enough to warrant the forced cost of doing that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily the string. The key point here seems to be flattening parenthesized groups with a single element (or the data structure equivalent to the parsing of those).

The "flattening" seems to be one of the central keys of the string conversion too.

We could do this flattening in the __init__ function without pre-computing the string, and that would facilitate both __eq__ and __str__.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote is to rely on the string representation for simplicity, else we are duplicating algorithms for walking the markers. If it turns out performance is a problem we can cache it, but we can wait on that until that actually becomes a problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that definitely is the simplest approach. I have updated the implementation accordingly.

(To be honest my first impulse was to do a string comparison, but I refrained from adopting that by overthinking about a previous comment, in a different context, about the cost of the string conversion).

@brettcannon brettcannon self-requested a review January 22, 2022 00:03
@pradyunsg pradyunsg self-requested a review January 25, 2022 00:31
Copy link
Member

@brettcannon brettcannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what anyone thinks about accepting this overall, but if this were to get accepted I think the hash tests should be a bit more clear as to what they are testing for.

packaging/markers.py Outdated Show resolved Hide resolved
if not isinstance(other, Marker):
return NotImplemented

return _flatten_marker(self._markers) == _flatten_marker(other._markers)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in pre-compute the string? Is the string used enough to warrant the forced cost of doing that?

tests/test_markers.py Outdated Show resolved Hide resolved
tests/test_requirements.py Outdated Show resolved Hide resolved
@abravalheri
Copy link
Contributor Author

abravalheri commented Jan 28, 2022

Hi @brettcannon, thank you very much for the review. Regarding the hashing test, I was really unfortunate with the naming since it does not reflect what I had in mind.

Something important to say is that, the idea of the test was not to check if the objects are correctly being hashed... To be sincere I think all of that is just an implementation detail. The real objective of the tests were to check if the objects can be used as elements of sets and if comparisons would work fine in that context. Knowing that the output of the hash() function matches does not seem to bring a lot of value to me... (let's hypothesise that Python 16 decides to change how set comparisons work, testing the output of hash() could not necessarily imply that the functionality I am after would keep working).

Please let me know if clarifying the names of the tests and splitting them up acordingly would work for you and I will do my best to come up with better names. Otherwise I also don't have a problem and just simplifying the tests to use hash().

@brettcannon
Copy link
Member

I actually think the opposite of you. 😄 I view the ability to put an object in a container a side-effect of implementing __hash__ and __eq__.

Now if you want to consolidate testing both methods into testing if the objects get put into a container as appropriate, then that's fine, just make sure to name and document the tests appropriately. I don't have strong opinions either way, but I think it should be one or the other approaches.

@abravalheri
Copy link
Contributor Author

abravalheri commented Jan 31, 2022

Thank you very much for the comment @brettcannon.

I think it is important to test directly __eq__ (that should be by far the most common use case motivating this change), so I went with your suggestion of testing with the hash() function in my last commit.

@brettcannon brettcannon self-requested a review March 8, 2022 03:06
Copy link
Member

@brettcannon brettcannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep this simple and treat the string representation as the simple encoding for comparison.

Also some test simplification to save on some time and electricity.

Comment on lines 184 to 192
assert isinstance(marker, (list, tuple))

if isinstance(marker, tuple):
return marker

if len(marker) == 1:
return _flatten_marker(marker[0])

return [_flatten_marker(e) if isinstance(e, list) else e for e in marker]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert isinstance(marker, (list, tuple))
if isinstance(marker, tuple):
return marker
if len(marker) == 1:
return _flatten_marker(marker[0])
return [_flatten_marker(e) if isinstance(e, list) else e for e in marker]
assert isinstance(marker, (list, tuple))
if isinstance(marker, tuple):
return marker
elif len(marker) == 1:
return _flatten_marker(marker[0])
else:
return [_flatten_marker(e) if isinstance(e, list) else e for e in marker]

if not isinstance(other, Marker):
return NotImplemented

return _flatten_marker(self._markers) == _flatten_marker(other._markers)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote is to rely on the string representation for simplicity, else we are duplicating algorithms for walking the markers. If it turns out performance is a problem we can cache it, but we can wait on that until that actually becomes a problem.

tests/test_markers.py Outdated Show resolved Hide resolved
tests/test_markers.py Outdated Show resolved Hide resolved
Comment on lines 244 to 245
# Markers should not be comparable with other kinds of objects.
assert marker1 != example1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be its own test instead of having to do it repeatedly as part of a parameterized test.

Comment on lines 246 to 248
# Requirement objects should not be comparable with other kinds of objects.
assert req1 != dep1
assert req2 != dep2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be a separate test that isn't repeated multiple time needlessly.

Co-authored-by: Brett Cannon <brett@python.org>
@abravalheri
Copy link
Contributor Author

Thank you very much for the updated reviews.

I have adopted the suggested changes and also rebased the PR.

@brettcannon brettcannon self-requested a review March 15, 2022 17:47
packaging/markers.py Outdated Show resolved Hide resolved
@brettcannon brettcannon merged commit aebc072 into pypa:main Mar 15, 2022
@brettcannon
Copy link
Member

Thanks for much for sticking with this, @abravalheri !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add __eq__ to Requirement and Marker
3 participants