Generate a hash function when frozen is True #1881

Merged

Conversation

rhuille
Contributor

@rhuille rhuille commented Aug 30, 2020

Change Summary

Before:

from pydantic import BaseModel

class A(BaseModel):
    x: int
    
    class Config:
        allow_mutation = False

a = A(x=1)
d = {a: 2}

>>> TypeError: unhashable type: 'A'

After:

from pydantic import BaseModel

class A(BaseModel):
    x: int
    
    class Config:
        frozen = True

a = A(x=1)
d = {a: 2}
d[a]
>>> 2

Note that we still have:

from pydantic import BaseModel

class A(BaseModel):
    x: int
    
    class Config:
        allow_mutation = False

a = A(x=1)
d = {a: 2}
d[a]
>>> TypeError: unhashable type: 'A'

Screenshot of the updated documentation:

image

Related issue number

closes #1880

linked issue: #1303

Checklist

  • Unit tests for the changes exist
  • Tests pass on CI and coverage remains at 100%
  • Documentation reflects the changes where applicable
  • changes/<pull request or issue id>-<github username>.md file added describing change
    (see changes/README.md for details)

@codecov

codecov bot commented Aug 30, 2020

Codecov Report

Merging #1881 (ad82bf0) into master (7cc8d25) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #1881   +/-   ##
=======================================
  Coverage   99.90%   99.90%           
=======================================
  Files          25       25           
  Lines        5030     5036    +6     
  Branches     1030     1030           
=======================================
+ Hits         5025     5031    +6     
  Misses          1        1           
  Partials        4        4           
Impacted Files      Coverage Δ
pydantic/main.py    99.06% <100.00%> (+<0.01%) ⬆️
pydantic/mypy.py    100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@ccharlesgb

I think this is a good idea but I don't think this will work if the fields in the model cannot be hashed. For example what happens if you do the following:

from typing import List

from pydantic import BaseModel

class A(BaseModel):
    x: int
    y: List[int]
    
    
    class Config:
        allow_mutation = False

a = A(x=1, y=[1,2,3])
d = {a: 2}
d[a]
>>> ?

Might be worth putting in a validation step to ensure that the fields are hashable?

Alternatively, dataclasses achieves this by assigning a hash attribute to each field, and the overall model's hash is the combination of all of these. See: https://github.com/ericvsmith/dataclasses/blob/master/dataclasses.py#L540

I think this would be a better approach as it should allow you to do nested immutability:

from pydantic import BaseModel

class B(BaseModel):
    z: int
    class Config:
        allow_mutation = False


class A(BaseModel):
    x: int
    y: B
    
    
    class Config:
        allow_mutation = False

a = A(x=1, y=B(z=4))
d = {a: 2}
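
For comparison, the dataclasses behaviour referred to above can be checked with the standard library alone. This is a minimal sketch, not pydantic code: `Inner`, `Outer` and `WithList` are hypothetical names, and the only point is that a frozen dataclass hashes its field values together, so nested frozen objects work while a list field fails.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Inner:
    z: int


@dataclass(frozen=True)
class Outer:
    x: int
    y: Inner


@dataclass(frozen=True)
class WithList:
    x: int
    y: List[int]


# Nested frozen dataclasses are hashable, so they can be dict keys.
nested = Outer(x=1, y=Inner(z=4))
lookup = {nested: 2}
# An equal but distinct instance finds the same entry.
found = lookup[Outer(x=1, y=Inner(z=4))]

# A list field makes the generated hash fail at call time.
try:
    hash(WithList(x=1, y=[1, 2, 3]))
    error_message = ""
except TypeError as exc:
    error_message = str(exc)
```

Note that the failure only appears when `hash()` is called, not when the class or the instance is created.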

@layday
Contributor

layday commented Sep 6, 2020

I've had to roll my own __hash__ function for Pydantic models in the past, so I'm very excited at the prospect of natively 'frozen' models. A few quick notes: you cannot hash self.__dict__.values() directly, because a dict_values view is hashed by identity rather than by content, so two equal models will generally produce different hashes. This is why dataclasses places the values in a tuple before hashing. You also cannot (should not) implement a __hash__ which behaves differently from __eq__:

In [1]: from pydantic import BaseModel

In [2]: class Foo(BaseModel):
   ...:     def __hash__(self):
   ...:         return hash(self.__dict__.values())
   ...:

In [3]: Foo() == Foo()
Out[3]: True

In [4]: hash(Foo())
Out[4]: 276660083

In [5]: hash(Foo())
Out[5]: 277242134

In [6]: d = {Foo(): True}

In [7]: d[Foo()] = False

In [8]: d.keys()
Out[8]: dict_keys([Foo(), Foo()])

The dataclasses implementation mentioned above is more sophisticated (e.g. it compares values which cannot be hashed) and might be worth investigating.
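
A minimal standard-library sketch of the two points above, assuming a plain class as a stand-in for a model (`TupleHashed` is a hypothetical name, not pydantic code). It shows that each call to `.values()` returns a new view object, whereas tuples built from those views hash by content, which keeps `__hash__` consistent with `__eq__`.

```python
from typing import Any


class TupleHashed:
    """Hypothetical stand-in for a frozen model."""

    def __init__(self, **fields: Any) -> None:
        self.__dict__.update(fields)

    def __eq__(self, other: object) -> bool:
        return type(other) is type(self) and self.__dict__ == other.__dict__

    def __hash__(self) -> int:
        # Copy the values into a tuple so the hash depends on content.
        return hash(tuple(self.__dict__.values()))


values = {"x": 1, "y": 2}
first_view = values.values()
second_view = values.values()

# Two views over the same dict are distinct objects...
views_are_distinct = first_view is not second_view
# ...but tuples built from them hash identically.
tuples_hash_equal = hash(tuple(first_view)) == hash(tuple(second_view))

# Equal instances now collapse to a single dictionary key.
d = {TupleHashed(x=1, y=2): True}
d[TupleHashed(x=1, y=2)] = False
key_count = len(d)
```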

@rhuille
Contributor Author

rhuille commented Sep 6, 2020

Hi @ccharlesgb and @layday , thanks for your answer.

I agree with your comments and I corrected my implementation of __hash__ in: b1a84dc

What do you think ?

If I am correct, this now has the same behavior as dataclasses:

from dataclasses import dataclass
from pydantic import BaseModel
from typing import List


class A(BaseModel):
    x: int
    y: int
    
    class Config:
        allow_mutation = False


@dataclass(frozen=True)
class B:
    x: int
    y: int
 
hash(A(x=1, y=2)) == hash(B(x=1, y=2))
>>> True

hash(A(x=1, y=2)) == hash((1, 2))
>>> True

class C(BaseModel):
    x: int
    y: List[int]
    class Config:
        allow_mutation = False

@dataclass(frozen=True)
class D:
    x: int
    y: List[int]

hash(C(x=1, y=[1, 2, 3]))
>>>TypeError: unhashable type: 'list'

hash(D(x=1, y=[1, 2, 3]))
>>>TypeError: unhashable type: 'list'

pydantic/main.py Outdated
@@ -313,6 +313,7 @@ def __new__(mcs, name, bases, namespace, **kwargs): # noqa C901
'__schema_cache__': {},
'__json_encoder__': staticmethod(json_encoder),
'__custom_root_type__': _custom_root_type,
'__hash__': (lambda self: hash(tuple(self.__dict__.values()))) if not config.allow_mutation else None,
Contributor

Instead of .values we should probably have .items here. The keys ought to be considered a part of the identity of the object.

Example:

x = {"a": 1, "b": True}
y = {"d": 1, "c": True}
hash(tuple(x.values())) == hash(tuple(y.values()))  # True
hash(tuple(x.items())) == hash(tuple(y.items()))  # False

Contributor Author

In that case, we diverge from the behaviour of the built-in dataclass, but I do not know if that is a problem.
If it is not a problem, then I agree with you! :)

Member

I don't think this needs to be a lambda; it can be a proper method on BaseModel, I imagine. It could raise an error if allow_mutation is True.

Contributor

> I don't think this needs to be a lambda, it can be proper method on BaseModel I imagine. It could raise an error if allow_mutation is True.

The 'correct' way to make an object unhashable is by setting its __hash__ attribute to None:

In [1]: class Foo:
   ...:     pass
   ...:

In [2]: class Bar:
   ...:     __hash__ = None
   ...:

In [3]: hash(Foo())
Out[3]: 284941123

In [4]: hash(Bar())
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-d413494abdfb> in <module>
----> 1 hash(Bar())

TypeError: unhashable type: 'Bar'

This would have to happen on the metaclass and is done automatically by the Python interpreter for classes which implement __eq__ but not __hash__:

In [5]: from pydantic import BaseModel

In [6]: class Baz(BaseModel):
   ...:     pass
   ...:

In [7]: hash(Baz())
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-67f7cb492b68> in <module>
----> 1 hash(Baz())

TypeError: unhashable type: 'Baz'

In [8]: Baz.__hash__

In [9]: class Bee:
   ...:     def __eq__(self, other):
   ...:         ...
   ...:

In [10]: Bee.__hash__

In [11]: Foo.__hash__
Out[11]: <slot wrapper '__hash__' of 'object' objects>

If __hash__ were implemented on BaseModel, we'd have to emulate the default behaviour. In addition, if __hash__ is not None, isinstance(obj, typing.Hashable) will not have the desired effect:

In [12]: from typing import Hashable

In [13]: isinstance(Bee(), Hashable)
Out[13]: False

In [14]: class FauxUnhashable:
    ...:     def __hash__(self):
    ...:         raise TypeError
    ...:

In [51]: isinstance(FauxUnhashable(), Hashable)
Out[51]: True

Reference: https://docs.python.org/3/reference/datamodel.html#object.__hash__

Member

okay, so it can't just be a method on BaseModel, but let's make it a proper function, not a lambda.

Contributor

@layday layday Oct 9, 2020

> Instead of .values we should probably have .items here. The keys ought to be considered a part of the identity of the object.

I think this would be unnecessary if we include a reference to the model class.

m = TestModel()
with pytest.raises(TypeError) as exc_info:
    hash(m)
assert "unhashable type: 'list'" in exc_info.value.args[0]
Contributor

IMO it would be nice for the exception message to be more like "unhashable type: 'TestModel' contains unhashable type: 'list'", but I wouldn't say it's a strong opinion. Food for thought.
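
One way such a message could be produced is to catch the inner `TypeError` and re-raise it with the owning class named. This is only a sketch of the suggestion, not what the PR implements; `hash_fields` and `ExampleModel` are hypothetical names.

```python
from typing import Any


def hash_fields(obj: Any) -> int:
    """Hash an object's field values, naming the owner if a field is unhashable."""
    try:
        return hash(tuple(obj.__dict__.values()))
    except TypeError as exc:
        # Prefix the original message with the class that owns the field.
        raise TypeError(
            f"unhashable type: {type(obj).__name__!r} contains {exc}"
        ) from exc


class ExampleModel:
    """Hypothetical stand-in for a model with a list field."""

    def __init__(self, **fields: Any) -> None:
        self.__dict__.update(fields)


try:
    hash_fields(ExampleModel(x=1, y=[1, 2, 3]))
    message = ""
except TypeError as exc:
    message = str(exc)
```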

Member

@samuelcolvin samuelcolvin left a comment

I think this is a good start, but my problem here is that I imagine in the following code f and b would have the same hash and therefore be "equal"

class Foo(BaseModel):
    a: str
    b: int

    class Config:
        allow_mutation = False


class Bar(BaseModel):
    a: str
    b: int

    def dict(self, *args, **kwargs):
        d = super().dict(*args, **kwargs)
        d.pop('a')
        return d

    class Config:
        allow_mutation = False


f = Foo(a='xx', b=2)
b = Bar(a='xx', b=3)

Which is definitely not what people would expect.

I think we need to add some kind of reference to the qualname to the hash.

I think we also need to hide this behind another config option.

@layday
Contributor

layday commented Oct 9, 2020

> I think we need to add some kind of reference to the qualname to the hash.

Would hashing the model class not work, e.g. hash((self.__class__,) + tuple(self.__dict__.items()))? I know attrs combines the __module__ name with the __qualname__ but that might be because of something specific to attrs.
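
A small sketch of what hashing the class itself would look like, using a plain base class rather than pydantic (`FrozenSketch`, `Foo` and `Bar` here are hypothetical stand-ins). With only the field items in the hash, two different classes holding the same data hash identically; putting the class in the tuple makes a clash between them very unlikely while keeping equal instances of one class consistent.

```python
from typing import Any


class FrozenSketch:
    """Hypothetical base class that mixes the class into the hash."""

    def __init__(self, **fields: Any) -> None:
        self.__dict__.update(fields)

    def __eq__(self, other: object) -> bool:
        return type(other) is type(self) and self.__dict__ == other.__dict__

    def __hash__(self) -> int:
        return hash((self.__class__,) + tuple(self.__dict__.items()))


class Foo(FrozenSketch):
    pass


class Bar(FrozenSketch):
    pass


foo = Foo(a="xx", b=2)
bar = Bar(a="xx", b=2)

# Hashing the field items alone cannot tell the two classes apart.
fields_only_equal = hash(tuple(foo.__dict__.items())) == hash(tuple(bar.__dict__.items()))
# Equal instances of the same class still share a hash.
same_class_equal = hash(foo) == hash(Foo(a="xx", b=2))
# The two objects are distinct dictionary keys.
distinct_keys = len({foo: "foo", bar: "bar"})
```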

@samuelcolvin
Member

I guess it might work, attrs might do it that way to support older versions of python.

@samuelcolvin
Member

As long as this feature is hidden behind a `hashable` config parameter, I'm not too bothered. We can say it's new and in "beta" or something in the docs.

@layday
Contributor

layday commented Oct 9, 2020

hmm, would hashable = True imply allow_mutation = False or would they both need to be declared? I don't think hashable is useful by itself or something that should be exposed - hashable without allow_mutation should be disallowed, in fact. How about adding a new frozen setting which would (or could) supersede allow_mutation in 2.0? That is assuming that the reason we're not rolling this into allow_mutation is not to break compatibility in a minor release should someone out there be relying on immutable models being unhashable.

@rhuille
Contributor Author

rhuille commented Oct 12, 2020

So 2 things to modify:

  • include the name of the class in the hash --> so we agree to diverge from the built-in dataclass behavior?

  • add a boolean parameter hashable in BaseModel config. This parameter would have the following behavior:

    • if hashable=True: implement a hash function.
      This hash function would raise an error if allow_mutation=True. This is a way to "disallow hashable without allow_mutation". @layday what do you think?
      To overcome this error, pydantic users would have to implement a hash function themselves in the model class.
    • if hashable=False: the hash function is None.

@layday
Contributor

layday commented Oct 12, 2020

  • The class name is not unique. We'd have to either combine the module name with the __qualname__ or hash the class itself as suggested.
  • The hash function should not raise, a configuration error should be raised on model creation by the metaclass. My preference is for hashable not being exposed.
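
To illustrate the second point, here is a rough sketch of a metaclass that rejects the inconsistent configuration when the class is defined, rather than when `hash()` is later called. It is not pydantic's metaclass: `CheckingMeta`, `SketchModel` and the `ConfigError` defined here are hypothetical, and `hashable` is the option under discussion, not an existing setting.

```python
class ConfigError(RuntimeError):
    """Hypothetical error type for this sketch."""


class CheckingMeta(type):
    """Validates the nested Config at class-creation time."""

    def __new__(mcs, name, bases, namespace):
        config = namespace.get("Config")
        hashable = getattr(config, "hashable", False)
        allow_mutation = getattr(config, "allow_mutation", True)
        if hashable and allow_mutation:
            raise ConfigError(f"{name}: hashable=True requires allow_mutation=False")
        return super().__new__(mcs, name, bases, namespace)


class SketchModel(metaclass=CheckingMeta):
    pass


class Valid(SketchModel):
    class Config:
        hashable = True
        allow_mutation = False


# The invalid combination fails as soon as the class body is evaluated.
try:
    class Invalid(SketchModel):
        class Config:
            hashable = True
            allow_mutation = True

    definition_error = ""
except ConfigError as exc:
    definition_error = str(exc)
```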

@samuelcolvin
Member

> How about adding a new frozen setting which would (or could) supersede allow_mutation in 2.0?

Sounds good to me.

> or hash the class itself as suggested

also sounds good to me.

@samuelcolvin
Member

@rhuille are you going to finish this? Otherwise let's close it.

@rhuille
Contributor Author

rhuille commented Jan 4, 2021

Hi @samuelcolvin yes I am doing this now. Thanks for the specifications.

For now, `frozen` is a strict duplication of `allow_mutation` parameter
i.e. setting `frozen=True` does everything that `allow_mutation=False` does.

NB: this does not change the behavior of `allow_mutation`.

In next commit, setting `frozen=True` will also make the BaseModel hashable
while the existing behavior of `allow_mutation` will not be updated.
@rhuille rhuille force-pushed the generate-hash-for-immutable-model branch from b1a84dc to f7368ec Compare January 4, 2021 17:20
@rhuille rhuille changed the title Generate a hash function when allow_mutation is False Generate a hash function when frozen is True Jan 4, 2021
@rhuille rhuille force-pushed the generate-hash-for-immutable-model branch from f7368ec to c9098eb Compare January 4, 2021 17:29
@rhuille rhuille force-pushed the generate-hash-for-immutable-model branch 2 times, most recently from 45da2e7 to 42cd7ba Compare January 4, 2021 17:41
Now, setting `frozen=True` also generates a hash function for the model,
i.e. `__hash__` is not `None`. This makes instances of the model potentially
hashable if all the attributes are hashable. (default: `False`)
@rhuille rhuille force-pushed the generate-hash-for-immutable-model branch from 42cd7ba to b4befc5 Compare January 4, 2021 17:53
@rhuille
Contributor Author

rhuille commented Jan 4, 2021

Hi @samuelcolvin, I have reworked the PR (and updated the description) to match the specifications given by @layday and you. Thanks a lot for your feedback!

Member

@PrettyWood PrettyWood left a comment

Looks quite good.
Do we want to tackle the case frozen=True and allow_mutation=True and raise an error in this case ?

@rhuille
Contributor Author

rhuille commented Jan 29, 2021

Good question ! For now, if frozen=True and allow_mutation=True the behavior is: "frozen=True is applied and allow_mutation=True is implicitly ignored". I guess I could add a warning message explaining that.
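
The precedence described here can be sketched with a plain class whose `__setattr__` consults both flags. This is an illustration under stated assumptions, not pydantic's implementation: `SketchModel` and `Conflicting` are hypothetical, and the flags are read straight from a nested `Config` with defaults.

```python
from typing import Any


class SketchModel:
    """Hypothetical model whose __setattr__ honours both flags."""

    class Config:
        allow_mutation = True
        frozen = False

    def __init__(self, **fields: Any) -> None:
        # Writing to __dict__ directly bypasses __setattr__ during construction.
        self.__dict__.update(fields)

    def __setattr__(self, name: str, value: Any) -> None:
        config = self.Config
        allow_mutation = getattr(config, "allow_mutation", True)
        frozen = getattr(config, "frozen", False)
        # frozen wins: allow_mutation=True is ignored when frozen=True.
        if not allow_mutation or frozen:
            raise TypeError(f'"{type(self).__name__}" is immutable')
        self.__dict__[name] = value


class Conflicting(SketchModel):
    class Config:
        allow_mutation = True
        frozen = True


mutable = SketchModel(x=1)
mutable.x = 2

conflicting = Conflicting(x=1)
try:
    conflicting.x = 2
    blocked = False
except TypeError:
    blocked = True
```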

Member

@samuelcolvin samuelcolvin left a comment

otherwise I think this looks great.

frozen = True
extra = Extra.forbid

def config_method(self) -> None:
Member

do you need this?

x: int
y: str

def method(self) -> None:
Member

can this be removed too?

...


frozenmodel = FrozenModel(x=1, y='y', z='z')
Member

this isn't testing frozen behaviour I think, again can be removed.

Comment on lines 227 to 228
FrozenModel.from_orm({})
FrozenModel.from_orm({}) # type: ignore[pydantic-orm] # noqa F821
Member

again, are these lines testing frozen behaviour?



class NotFrozenModel(FrozenModel):
a = 1
Member

add a type here to avoid extra errors in plugin-success-strict.txt


immutable = allow_mutation_ is False or frozen_ is True

if immutable:
Member

rather than this if clause, better to setup two separate tests.

Before: there were errors unrelated to the frozen behavior
After: the modification catches only errors related to the frozen behavior
One function tests the behavior: 'the model is mutable'
The other tests the behavior: 'the model is immutable'
@rhuille
Contributor Author

rhuille commented Feb 23, 2021

Hi @samuelcolvin , thanks for your review !

Since your review:

  • I resolved the conflict with master: 4815e4c
  • Following your comments:

Also, if you want me to, I can clean the commits history.

Thanks a lot, I am very proud of this work :D

@samuelcolvin samuelcolvin merged commit d8e8e6a into pydantic:master Feb 23, 2021
@samuelcolvin
Member

This is great, thank you so much.

@davidhyman

Hi, I've been playing with master (at 90df33c) and am concerned that this eliminates the ability for a user-defined __hash__;

I'm all in favour of auto-generating it upon request, but I wonder if it would be less intrusive to user code to avoid setting __hash__ when frozen=False? And even when frozen=True, perhaps a __hash__ doesn't need generating if it's user-defined (less sure on that)?

propose changing

    '__hash__': generate_hash_function(config.frozen),

to

if config.frozen:
    new_namespace['__hash__'] = generate_hash_function()

(or some other mechanism e.g. popping again from the new namespace)

I might be missing something though - is there a suggested approach that would let me define the hash for my model? Without it, I can't put my model instances in a set, dictionary etc.

Motivating example: having a model which the generate_hash_function presently can't handle, such as nested complex generic types; or would be entirely unhashable in the usual sense due to mutable containers, but using a unique database id for reference - so as far as the application is concerned, the db ID is all that's needed for hashing?
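
The difference between the two variants can be reproduced without pydantic. In this sketch the metaclasses are hypothetical, `frozen` is passed as a class keyword instead of through `Config`, and `generate_hash_function` only mirrors the name used in the snippet above; its body is an assumption. Writing `__hash__ = None` into every non-frozen class hides a hash inherited from a parent, whereas touching `__hash__` only when frozen leaves it alone.

```python
from typing import Any, Callable, Optional


def generate_hash_function(frozen: bool) -> Optional[Callable[[Any], int]]:
    """Return a field-based hash when frozen, otherwise None (illustrative)."""

    def hash_function(self_: Any) -> int:
        return hash((self_.__class__,) + tuple(self_.__dict__.values()))

    return hash_function if frozen else None


class AlwaysSetMeta(type):
    """Always writes __hash__ into the class namespace, even when it is None."""

    def __new__(mcs, name, bases, namespace, frozen=False):
        new_namespace = {"__hash__": generate_hash_function(frozen), **namespace}
        return super().__new__(mcs, name, bases, new_namespace)


class OnlyWhenFrozenMeta(type):
    """Touches __hash__ only when the class is declared frozen."""

    def __new__(mcs, name, bases, namespace, frozen=False):
        new_namespace = dict(namespace)
        if frozen:
            new_namespace.setdefault("__hash__", generate_hash_function(True))
        return super().__new__(mcs, name, bases, new_namespace)


class ParentA(metaclass=AlwaysSetMeta):
    def __hash__(self) -> int:
        return 5


class ChildA(ParentA):
    pass  # namespace gets __hash__ = None, which hides the parent's hash


class ParentB(metaclass=OnlyWhenFrozenMeta):
    def __hash__(self) -> int:
        return 5


class ChildB(ParentB):
    pass  # nothing is written, so the parent's hash is inherited


child_a_hash_attr = ChildA.__hash__
child_b_hash = hash(ChildB())
```

Here `ChildA` ends up unhashable even though its parent defines `__hash__`, while `ChildB` keeps the inherited value.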

@PrettyWood
Member

I don't understand @davidhyman
For me the behaviour is right. When you write a custom __hash__ it actually overwrites the default __hash__ whether or not frozen is set.
Could you please share a code snippet of the problem?

@davidhyman

@PrettyWood I think you have fixed it in #2423
and it was reported as
#2422

Thank you! (and thanks to all the other contributors)

Sorry I was late to the party. I spent a bit of time on a repro today, originally thought I couldn't reproduce it and then realised that it's fixed (at least, in 1.8.1). I don't think I'd realised it was an inheritance thing; recalled commenting out the hash assignment, so was sure it was something that had originally changed in this MR 🤔 .

import pydantic

class P(pydantic.BaseModel):
    a: int

    def __hash__(self):
        """custom hash function"""
        return self.a

class M(P):
    """our hash function is still overwritten."""

    b: dict = dict()

    class Config:
        """pydantic config."""

        allow_mutation = False
        frozen = True

m = M(a=5)
assert m.__hash__() == 5
