
Only hash model_fields, not whole __dict__ #7786

Merged
6 commits merged into pydantic:main on Nov 13, 2023

Conversation

@alexmojaki (Contributor) commented Oct 9, 2023

Change Summary

This deals with just the __hash__ part of #7444, because initially I was worried that it would be hard to fix without a significant hit to performance. With this timing code:

import timeit

setup = """
import pydantic

A = pydantic.create_model("A", __config__=dict(frozen=True), **{f'x{i}': (int, i) for i in range(100)})

a = A()
"""

stmt = """
hash(a)
"""

print(timeit.timeit(setup=setup, stmt=stmt, number=1000000))

the previous approach is slightly slower for 5 fields, but about 60-70% faster for 100 fields. So this PR may slow down some programs a little, but in the larger scheme of things it's hard to imagine it making a real difference since it can still hash a million times in about 2 seconds.

Closes #7785

Checklist

  • The pull request title is a good summary of the changes - it will be used in the changelog
  • Unit tests for the changes exist
  • Tests pass on CI
  • Documentation reflects the changes where applicable
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

@QuentinSoubeyranAqemia (Contributor)

Have you tried other implementations to compare performance, say a dict comprehension? I'd be curious to see how the various implementations compare.

@alexmojaki (Contributor, Author)

This:

        def hash_func(self: Any) -> int:
            d = self.__dict__
            return hash(self.__class__) + hash(tuple(d[k] for k in cls.model_fields))

is about 2 to 3 times slower than the implementation in this PR. I'm actually surprised it's not worse. We always hear about how slow Python is and how deferring to C makes a huge difference, but I often get underwhelming results when doing so.
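For context on why the itemgetter-based version wins, `operator.itemgetter` built with several keys does its lookups in one C-level call and returns a tuple directly. A small standalone demo (the field names here are made up for illustration):

```python
from operator import itemgetter

# Built once per class; with several keys it returns a tuple of values.
getter = itemgetter("x0", "x1", "x2")

# Stand-in for an instance __dict__; "extra" is simply ignored by the getter.
d = {"x0": 0, "x1": 1, "x2": 2, "extra": "ignored"}
assert getter(d) == (0, 1, 2)

# The pure-Python equivalent does the same lookups in a Python-level loop:
assert tuple(d[k] for k in ("x0", "x1", "x2")) == getter(d)
```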

@QuentinSoubeyranAqemia (Contributor)

Interesting comparison!
While you're at it, what about:

def hash_func(self: Any) -> int:
    d = {self.__dict__[k] for k in cls.model_fields}
    return hash(self.__class__) + hash(tuple(d.values()))

Sometimes dict comprehensions are very fast.

@alexmojaki (Contributor, Author)

I'll leave you to try these yourself. Also, d is a set there.

@QuentinSoubeyranAqemia (Contributor)

I ran your benchmark and plotted the time taken as a function of the number of fields. Your implementation is definitely the fastest, and it is faster than the current one even for very small objects.

I tweaked the implementation to call hash() only once; IIRC it is better to pass a single tuple of things to hash() rather than manually summing the hashes, because hash() will truncate the return value of __hash__, which can cause additional collisions.
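The order-sensitivity half of this point can be seen in a tiny example: summing hashes is commutative, so swapping two field values is a guaranteed collision, while hashing a single tuple keeps the order information (the tuple inequality holds in CPython but is not formally guaranteed):

```python
a, b = 1, 2

# Summing is commutative: swapping the two values always collides.
assert hash(a) + hash(b) == hash(b) + hash(a)

# Hashing one tuple preserves the order of the values (true in CPython).
assert hash((a, b)) != hash((b, a))
```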

[plot: results]

This makes a good starting point for fixing __eq__ with good performance, too.

Code for the benchmark
import gc
import operator
import timeit
from typing import Type

import matplotlib.pyplot as plt
import pydantic
import tqdm.auto as tqdm


def set_hash_func_default(cls):
    pass


def set_hash_func_operator(cls: Type[pydantic.BaseModel]):
    getter = operator.itemgetter(*cls.model_fields.keys()) if cls.model_fields else lambda _: None

    def hash_func(self) -> int:
        return hash((self.__class__, getter(self.__dict__)))

    cls.__hash__ = hash_func


def set_hash_func_tuple(cls: Type[pydantic.BaseModel]):
    def hash_func(self) -> int:
        d = self.__dict__
        return hash((self.__class__, tuple(d[k] for k in cls.model_fields)))

    cls.__hash__ = hash_func


def set_hash_func_comprehension(cls: Type[pydantic.BaseModel]):
    def hash_func(self) -> int:
        d = self.__dict__
        fields = {k: d[k] for k in cls.model_fields}
        return hash((self.__class__, tuple(fields.values())))

    cls.__hash__ = hash_func


setup = """
import pydantic

Model = pydantic.create_model(
    "Model",
    __config__=dict(frozen=True),
    **{f'x{i}': (int, i) for i in range(%(size)d)}
)
set_hash_func(Model)
instance = Model()
"""
statement = "hash(instance)"

sizes = list(range(100))
N_EXEC = 10_000
ys = {}
for set_hash_func in [
    set_hash_func_default,
    set_hash_func_operator,
    set_hash_func_tuple,
    set_hash_func_comprehension,
]:
    namespace = {"set_hash_func": set_hash_func}
    ys[set_hash_func.__name__] = [
        # Take the min of the repeats as recommended by timeit doc
        min(
            timeit.Timer(setup=setup % {"size": size}, stmt=statement, globals=namespace).repeat(
                repeat=5, number=N_EXEC
            )
        )
        / N_EXEC
        for size in tqdm.tqdm(sizes, desc=set_hash_func.__name__)
    ]
    gc.collect()

fig, ax = plt.subplots()

for name, times in ys.items():
    ax.plot(sizes, times, label=name)
ax.set_title("Performance benchmark for __hash__ implementations")
ax.set_xlabel("Number of fields in the pydantic model")
ax.set_ylabel("Average hash() time (s)")

plt.legend()
plt.tight_layout()
fig.savefig("results.png", dpi=300, facecolor="white", transparent=False)

@alexmojaki (Contributor, Author)

I think the self.__class__ part should just be removed. That would fix the main bug in #7785 about generic classes. The alternative fix would be to instead hash the same type object used in __eq__ but this would reduce hash collisions very rarely (the field values are much more likely to differ than the type) and would slow things down in most cases.

@QuentinSoubeyranAqemia (Contributor)

That makes sense. I didn't see #7785 before. I would also be in favor of keeping __hash__ simple, so that making it a strict subset of __eq__ stays simple.
As you say, adding the class name to the hash is unlikely to avoid collisions; IMHO it is rare for two classes to have the exact same set of attributes and values.

Comment on lines 400 to 403:

    getter = operator.itemgetter(*cls.model_fields.keys()) if cls.model_fields else lambda _: None

    def hash_func(self: Any) -> int:
-       return hash(self.__class__) + hash(tuple(self.__dict__.values()))
+       return hash(self.__class__) + hash(getter(self.__dict__))

@dmontagu (Contributor) commented Oct 12, 2023

Is this faster than just doing

        def hash_func(self: Any) -> int:
            return hash(self.__class__) + hash(tuple(self.__dict__.get(k) for k in cls.model_fields))

I'd prefer to avoid adding the operator import if possible (as we have generally been trying to reduce the number of imports pydantic triggers, which increase startup time), but if it's noticeably faster then maybe it makes sense. I think if you can't see a significant and repeatable difference, better to not use it.

Unless I've misunderstood and the getter thing is doing something different from my suggestion.

Contributor

Doh I see there was a whole conversation about this and confirmation that it is significantly faster. 👍

Contributor

I still think we should confirm with @samuelcolvin though given the point about wanting to reduce import times.

Contributor

I see you propose using .get() instead of []. Do you know whether it is safe to use itemgetter(), which doesn't have a default and thus will raise an error if the key is not found?

Said differently, are there cases where some keys from cls.model_fields are not present on the instance?

Contributor (Author)

Said differently, are there cases where some keys from cls.model_fields are not present on the instance?

I tried to find ways that this might happen, which led me to #7784. I couldn't think of anything else but I don't actually know pydantic that deeply so I could easily be missing something. But my guess is that if it's possible then it's enough of a red flag that raising an error here would actually be a good thing.
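The behavioural difference under discussion, shown in isolation (hypothetical field names): itemgetter raises KeyError for a missing key, while dict.get silently substitutes None:

```python
from operator import itemgetter

getter = itemgetter("x", "y")
partial = {"x": 1}  # stand-in for an instance __dict__ missing field "y"

# itemgetter has no default: a missing key raises KeyError.
try:
    getter(partial)
    raised = False
except KeyError:
    raised = True
assert raised

# dict.get substitutes None for the missing field without complaint.
assert tuple(partial.get(k) for k in ("x", "y")) == (1, None)
```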

@dmontagu added the relnotes-fix label (used for bugfixes) on Oct 12, 2023
@QuentinSoubeyranAqemia (Contributor) commented Oct 12, 2023

Regarding pydantic load time, it seems pydantic already imports operator at load time (possibly transitively):
python -c 'import sys, pydantic; print("operator" in sys.modules.keys())'
prints True.

It could be that sys itself depends on operator, since python -c 'import sys; print("operator" in sys.modules.keys())' also prints True. But the interpreter also loads some modules by default when it starts, so it's a bit hard to tell.

@QuentinSoubeyranAqemia (Contributor) commented Oct 12, 2023

I performed a timing benchmark; I can't measure any difference between importing operator and pydantic vs. importing only pydantic.

Time pydantic alone:

for i in {1..100}; do { time -p python -c "import pydantic"; } 2>> import_time.txt; done
python -c 'import pathlib; print(min(float(line.split(" ")[1]) for line in pathlib.Path("import_time.txt").read_text().split("\n") if line.startswith("real")))'

Time pydantic + operator:

for i in {1..100}; do { time -p python -c "import pydantic, operator"; } 2>> import_time_operator.txt; done
python -c 'import pathlib; print(min(float(line.split(" ")[1]) for line in pathlib.Path("import_time_operator.txt").read_text().split("\n") if line.startswith("real")))'

@QuentinSoubeyranAqemia (Contributor)

I did further benchmarking in case we need to handle some keys in cls.model_fields not being present in the instance __dict__.
I updated the benchmark above to report performance normalized to the current implementation, and commented out the comprehensions, which are very slow.

Interestingly, stdlib performance varies with the Python version: on Python 3.8, wrapping __dict__ in a collections.defaultdict only when some keys are missing doesn't improve performance. On Python 3.12, it degrades performance quite a bit for large models, but improves it for small models.

I guess we could implement a switch based on the size of the __dict__, and wrap unconditionally above a certain size.

Python 3.8 runs

No injection of additional value in `__dict__`
[plot: hash-benchmark_nocache_python-3-8-17-final-0]

Injection of additional value in `__dict__` (e.g. by `functools.cached_property`)
[plot: hash_benchmark_cache_python-3-8-17-final-0]

Python 3.12 runs

No injection of additional value in `__dict__`
[plot: hash-benchmark_nocache_python-3-12-0-final-0]

Injection of additional value in `__dict__` (e.g. by `functools.cached_property`)
[plot: hash-benchmark_cache_python-3-12-0-final-0]

Benchmark code
import collections
import dataclasses
import gc
import operator
import sys
import timeit
from typing import Any, Callable, Dict, Set, Tuple, Type

import matplotlib.pyplot as plt
import numpy as np
import pydantic
import tqdm.auto as tqdm

PYTHON_VERSION = ".".join(map(str, sys.version_info))


def safe_itemgetter(*items) -> Callable[[Dict[str, Any]], Tuple[Any, ...]]:
    if not items:
        return lambda _: tuple()

    key_set = frozenset(items)
    getter = operator.itemgetter(*items)

    def safe_wrapper(d: Dict[str, Any]) -> Tuple[Any, ...]:
        return (
            getter(d) if d.keys() >= key_set else getter(collections.defaultdict(lambda: None, d))
        )

    return safe_wrapper


def noop(*args, **kwargs):
    pass


def set_hash_func_itemgetter(cls: Type[pydantic.BaseModel]):
    getter = operator.itemgetter(*cls.model_fields.keys()) if cls.model_fields else lambda _: None

    def hash_func(self) -> int:
        return hash(getter(self.__dict__))

    cls.__hash__ = hash_func


def set_hash_func_itemgetter_defaultdict(cls: Type[pydantic.BaseModel]):
    getter = operator.itemgetter(*cls.model_fields.keys()) if cls.model_fields else lambda _: None

    def hash_func(self) -> int:
        return hash(getter(collections.defaultdict(lambda: None, self.__dict__)))

    cls.__hash__ = hash_func


def set_hash_func_itemgetter_safe(cls: Type[pydantic.BaseModel]):
    getter = safe_itemgetter(*cls.model_fields.keys())

    def hash_func(self) -> int:
        return hash(getter(self.__dict__))

    cls.__hash__ = hash_func


def set_hash_func_tuple_comprehension(cls: Type[pydantic.BaseModel]):
    def hash_func(self) -> int:
        d = self.__dict__
        return hash(tuple(d[k] for k in cls.model_fields))

    cls.__hash__ = hash_func


def set_hash_func_get_comprehension(cls: Type[pydantic.BaseModel]):
    def hash_func(self) -> int:
        d = self.__dict__
        return hash(tuple(d.get(k) for k in cls.model_fields))

    cls.__hash__ = hash_func


def set_hash_func_dict_comprehension(cls: Type[pydantic.BaseModel]):
    def hash_func(self) -> int:
        d = self.__dict__
        fields = {k: d[k] for k in cls.model_fields}
        return hash(tuple(fields.values()))

    cls.__hash__ = hash_func


SETUP = """
import pydantic

Model = pydantic.create_model(
    "Model",
    __config__=dict(frozen=True),
    **{f'x{i}': (int, i) for i in range(%(size)d)}
)
set_hash_func(Model)
instance = Model()
# Mimic @functools.cached_property editing __dict__
# instance.__dict__["not_a_field"] = None
"""
STATEMENT = "hash(instance)"


def time_hash_func(
    set_hash_func,
    *,
    sizes,
    n_execution: int,
    n_repeat: int = 5,
    progress_bar: bool = True,
) -> np.ndarray:
    namespace = {"set_hash_func": set_hash_func}
    times = [
        # Take the min of the repeats as recommended by timeit doc
        min(
            timeit.Timer(
                setup=SETUP % {"size": size},
                stmt=STATEMENT,
                globals=namespace,
            ).repeat(repeat=n_repeat, number=n_execution)
        )
        / n_execution
        for size in tqdm.tqdm(sizes, desc=set_hash_func.__name__, disable=not progress_bar)
    ]
    gc.collect()
    return np.array(times, dtype=float)


N_EXEC = 10_000
fig, ax = plt.subplots()

sizes = list(range(100))
baseline = time_hash_func(noop, sizes=sizes, n_execution=N_EXEC)

# Plot the baseline
ax.plot(sizes, baseline / baseline, label="baseline")

for label, set_hash_func in {
    "itemgetter": set_hash_func_itemgetter,
    "itemgetter+defaultdict": set_hash_func_itemgetter_defaultdict,
    "itemgetter+safety": set_hash_func_itemgetter_safe,
    # Those have bad performances and are deactivated to speed-up experimentation
    # "tuple-comprehension": set_hash_func_tuple_comprehension,
    # "get-comprehension": set_hash_func_get_comprehension,
    # "dict-comprehension": set_hash_func_dict_comprehension,
}.items():
    times = time_hash_func(set_hash_func, sizes=sizes, n_execution=N_EXEC)
    ax.plot(sizes, times / baseline, label=label)

ax.set_title(f"__hash__ timing benchmark (python {PYTHON_VERSION})")
ax.set_xlabel("Number of fields in the pydantic model")
ax.set_ylabel("Average time relative to baseline")

plt.legend()
plt.tight_layout()
fig.savefig(
    f"hash-benchmark_nocache_python-{PYTHON_VERSION}.png",
    dpi=300,
    facecolor="white",
    transparent=False,
)

@alexmojaki (Contributor, Author)

Thanks @QuentinSoubeyranAqemia for all your analysis.

Regarding importing operator, in addition to the points above, there's already a from operator import attrgetter in _generate_schema.py.

@QuentinSoubeyranAqemia (Contributor)

I tried the following code, inspired by dataclasses, to try to get an instance that is missing a field. It doesn't seem to work, so I hope that pydantic enforces all fields to be present on the instance?

class Model(pydantic.BaseModel, frozen=True):
    attr: int = pydantic.Field(init_var=False)

m = Model()  # ValidationError

@QuentinSoubeyranAqemia (Contributor)

@alexmojaki I think we still need to remove __class__ from the hash in this PR? Might as well fix #7785 which you opened while we are modifying __hash__.

@alexmojaki (Contributor, Author)

@alexmojaki I think we still need to remove __class__ from the hash in this PR? Might as well fix #7785 which you opened while we are modifying __hash__.

I agree, but that issue is still in pending state. @dmontagu WDYT, can I remove hash(self.__class__) + in this PR? See #7786 (comment) which is a bit buried in the conversation now.

@QuentinSoubeyranAqemia (Contributor) commented Oct 14, 2023

Maybe we could add a test in tests/test_edge_cases.py? It looks like a good place to test for that.

@QuentinSoubeyranAqemia (Contributor)

@alexmojaki Just FYI, #7825 revealed an edge case where you can have an instance that is missing some fields. This impacts your use of operator.itemgetter (I used it as well, which is why the problem showed up).

@alexmojaki (Contributor, Author)

Thanks @QuentinSoubeyranAqemia. That means this PR should also fix another bug, since the current implementation assumes that the order of __dict__ is reliable:

from pydantic import BaseModel


class A(BaseModel, frozen=True):
    x: int
    y: int


a = A(x=1, y=2)
a2 = a.copy(exclude={"x"}).copy(update={"x": 1})
print(a.__dict__)   # {'x': 1, 'y': 2}
print(a2.__dict__)  # {'y': 2, 'x': 1}
assert a == a2
assert hash(a) != hash(a2)

I've pushed a fix with the fastest overall approach I could find, and I expect it can be reused for __eq__.
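Condensed to its essentials (the FallbackDict helper appears in full in the benchmark code below; the field names here are made up), the idea is to take the fast itemgetter path and fall back only when a field is missing:

```python
from operator import itemgetter

class FallbackDict:
    """Wrap a dict so that missing keys yield None instead of KeyError."""
    def __init__(self, inner):
        self.inner = inner

    def __getitem__(self, key):
        return self.inner.get(key)

field_names = ("x", "y")  # stand-in for cls.model_fields
getter = itemgetter(*field_names)

def hash_func(d):
    try:
        # Fast path: plain itemgetter over the instance __dict__.
        return hash(getter(d))
    except KeyError:
        # Rare path: a field is missing (e.g. after the deprecated copy()).
        return hash(getter(FallbackDict(d)))

assert hash_func({"x": 1, "y": 2}) == hash((1, 2))
assert hash_func({"y": 2}) == hash((None, 2))
```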

While I'm at it I removed the hash(self.__class__). However this causes the test test_hash_function_give_different_result_for_different_object to fail, which expects identical-looking models with different classes to have different hashes. Going back to when this was all added, I found this comment providing some motivation:

my problem here is that I imagine in the following code f and b would have the same hash and therefore be "equal" ... Which is definitely not what people would expected.

I think this is a mistake, and that it's OK for the hashes to match if only the classes differ.

@QuentinSoubeyranAqemia (Contributor)

I'll leverage the FallbackDict in the __eq__ implementation, thanks!

Getting models with missing fields is very weird; it violates the Liskov substitution principle. Interestingly, the documented alternative to the deprecated BaseModel.copy() method raises a ValidationError when excluding required fields.

For completeness' sake, I ran the benchmarks again with your safety mechanism: it has negligible performance impact on Python 3.8, and since Python 3.11 got zero-cost exception handling, the impact is non-existent on Python 3.11+.
It is also faster than using collections.defaultdict unconditionally, which is nice.

[plot: hash-benchmark_python-3.8.17]
[plot: hash-benchmark_python-3.12.0]

Benchmark code (click to reveal)
import collections
import gc
import itertools
import operator
import sys
import textwrap
import timeit
from typing import Any, Dict, Iterable, List, Optional, Sized, Type

import matplotlib.pyplot as plt
import numpy as np
import pydantic
import tqdm.auto as tqdm
from matplotlib import axes, figure

PYTHON_VERSION = ".".join(map(str, sys.version_info))


def noop(*args, **kwargs):
    pass


def set_hash_func_itemgetter(cls: Type[pydantic.BaseModel]):
    getter = operator.itemgetter(*cls.model_fields.keys()) if cls.model_fields else lambda _: None

    def hash_func(self) -> int:
        return hash(getter(self.__dict__))

    cls.__hash__ = hash_func


def set_hash_func_itemgetter_defaultdict(cls: Type[pydantic.BaseModel]):
    getter = operator.itemgetter(*cls.model_fields.keys()) if cls.model_fields else lambda _: None

    def hash_func(self) -> int:
        return hash(getter(collections.defaultdict(lambda: None, self.__dict__)))

    cls.__hash__ = hash_func


class FallbackDict:
    def __init__(self, inner):
        self.inner = inner

    def __getitem__(self, key):
        return self.inner.get(key)


def set_hash_func_itemgetter_safe(cls: Type[pydantic.BaseModel]):
    getter = operator.itemgetter(*cls.model_fields.keys()) if cls.model_fields else lambda _: 0

    def hash_func(self: Any) -> int:
        try:
            return hash(getter(self.__dict__))
        except KeyError:
            # In rare cases (such as when using the deprecated copy method), the __dict__ may not contain
            # all model fields, which is how we can get here.
            # getter(self.__dict__) is much faster than any 'safe' method that accounts for missing keys,
            # and wrapping it in a `try` doesn't slow things down much in the common case.
            return hash(getter(FallbackDict(self.__dict__)))  # type: ignore

    cls.__hash__ = hash_func


def set_hash_func_tuple_comprehension(cls: Type[pydantic.BaseModel]):
    def hash_func(self) -> int:
        d = self.__dict__
        return hash(tuple(d[k] for k in cls.model_fields))

    cls.__hash__ = hash_func


def set_hash_func_get_comprehension(cls: Type[pydantic.BaseModel]):
    def hash_func(self) -> int:
        d = self.__dict__
        return hash(tuple(d.get(k) for k in cls.model_fields))

    cls.__hash__ = hash_func


def set_hash_func_dict_comprehension(cls: Type[pydantic.BaseModel]):
    def hash_func(self) -> int:
        d = self.__dict__
        fields = {k: d[k] for k in cls.model_fields}
        return hash(tuple(fields.values()))

    cls.__hash__ = hash_func


class SizedIterable(Sized, Iterable):
    pass


def run_benchmark(
    title: str,
    setup: str,
    statement: str,
    n_execution: int = 10_000,
    n_repeat: int = 5,
    globals: Optional[Dict[str, Any]] = None,
    progress_bar: bool = True,
    params: Optional[Dict[str, SizedIterable]] = None,
) -> np.ndarray:
    namespace = globals or {}
    # fast-path
    if not params:
        length = 1
        packed_params = [()]
    else:
        length = len(next(iter(params.values())))
        # This iterator yields a tuple of (key, value) pairs
        # First, make a list of N iterator over (key, value), where the provided values are iterated
        param_pairs = [zip(itertools.repeat(name), value) for name, value in params.items()]
        # Then pack our individual parameter iterator into one
        packed_params = zip(*param_pairs)

    times = [
        # Take the min of the repeats as recommended by timeit doc
        min(
            timeit.Timer(
                setup=setup % dict(local_params),
                stmt=statement,
                globals=namespace,
            ).repeat(repeat=n_repeat, number=n_execution)
        )
        / n_execution
        for local_params in tqdm.tqdm(
            packed_params, desc=title, total=length, disable=not progress_bar
        )
    ]
    gc.collect()

    return np.asarray(times, dtype=float)


def run_hash_benchmark(
    title: str,
    set_hash_func,
    sizes: SizedIterable,
    mimic_cached_property: bool,
    n_execution: int = 10_000,
    n_repeat: int = 5,
) -> np.ndarray:
    setup = textwrap.dedent(
        f"""
        import pydantic

        Model = pydantic.create_model(
            "Model",
            __config__=dict(frozen=True),
            **{{f'x{{i}}': (int, i) for i in range(%(size)d)}}
        )
        set_hash_func(Model)
        instance = Model()
        # Mimic @functools.cached_property editing __dict__
        {"instance.__dict__['not_a_field'] = None" if mimic_cached_property else ''}
        """
    )
    statement = "hash(instance)"
    namespace = {"set_hash_func": set_hash_func}
    return run_benchmark(
        title=title,
        setup=setup,
        statement=statement,
        n_execution=n_execution,
        n_repeat=n_repeat,
        globals=namespace,
        params={"size": sizes},
    )


def plot_benchmark(
    title: str,
    sizes: List[int],
    mimic_cached_property: bool,
    ax: Optional[axes.Axes] = None,
):
    ax = ax or plt.gca()

    baseline = run_hash_benchmark(
        title=f'{title}, baseline',
        set_hash_func=noop,
        sizes=sizes,
        mimic_cached_property=mimic_cached_property,
    )
    ax.plot(sizes, baseline / baseline, label='baseline')
    for label, set_hash_func in {
        "itemgetter": set_hash_func_itemgetter,
        "itemgetter+defaultdict": set_hash_func_itemgetter_defaultdict,
        "itemgetter+safety": set_hash_func_itemgetter_safe,
        # Those have bad performances and are deactivated to speed-up experimentation
        # "tuple-comprehension": set_hash_func_tuple_comprehension,
        # "get-comprehension": set_hash_func_get_comprehension,
        # "dict-comprehension": set_hash_func_dict_comprehension,
    }.items():
        times = run_hash_benchmark(
            title=f'{title}, {label}',
            set_hash_func=set_hash_func,
            sizes=sizes,
            mimic_cached_property=mimic_cached_property,
        )
        ax.plot(sizes, times / baseline, label=label)

    ax.set_title(title)
    ax.set_xlabel('Number of pydantic fields')
    ax.set_ylabel('Average time relative to baseline')
    return ax


def plot_all_benchmarks(sizes: List[int]) -> figure.Figure:
    n_rows, n_cols = 1, 2
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(n_cols * 6, n_rows * 4))

    for col, mimic_cached_property in enumerate([False, True]):
        plot_benchmark(
            title=f'{mimic_cached_property=}',
            sizes=sizes,
            mimic_cached_property=mimic_cached_property,
            ax=axes[col],
        )

    for ax in axes.ravel():
        ax.legend()
    fig.suptitle(f"__hash__ timing benchmark (python {PYTHON_VERSION})")
    return fig


if __name__ == "__main__":
    fig = plot_all_benchmarks(list(range(100)))
    plt.tight_layout()
    target = f"hash-benchmark_python-{PYTHON_VERSION}.png"
    fig.savefig(
        target,
        dpi=300,
        facecolor="white",
        transparent=False,
    )
    print(f"Wrote {target}", file=sys.stderr)

@dmontagu (Contributor) commented Nov 10, 2023

@alexmojaki regarding the removal of the class in the hash — I think it might make more sense for it to grab the generic origin and include that in the hash (same as is done by the equality method). That would at least keep different classes with the same data dicts having different hashes, presumably reducing collisions. But obviously that's already going to be rare so maybe it is better to not worry about it and let the equality check differentiate. (And I guess it will make the common case of having different data a bit faster anyway due to eliminating the hashing of the type?) I'm happy to defer to you.

@dmontagu (Contributor)

(@alexmojaki I'll wait to actually merge until you give a reply to the last comment about how you want to proceed, but to be clear I'm okay merging this as is if you think that's best.)

@alexmojaki (Contributor, Author)

I think it'd be so weird to have a dict/set containing a large number of models of different types (already an oddity in normal code) with the same data that it's worth optimising the common case.

@sydney-runkle (Member)

LGTM overall, but I'd prefer to wait and merge once we release 2.5.0 - I don't think it makes sense to squeeze in changes between 2.5.0b1 and now. Happy to merge this mid-next week when we get the next release out.

Thanks for your work on this @alexmojaki!

@sydney-runkle sydney-runkle merged commit f23578e into pydantic:main Nov 13, 2023
59 checks passed
@samuelcolvin changed the title to "Only hash model_fields, not whole __dict__" on Nov 15, 2023
QuentinSoubeyranAqemia added a commit to QuentinSoubeyranAqemia/pydantic that referenced this pull request Nov 15, 2023
jsuchenia pushed a commit to jsuchenia/adventofcode that referenced this pull request Feb 13, 2024
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [pydantic](https://github.com/pydantic/pydantic) ([changelog](https://docs.pydantic.dev/latest/changelog/)) | dependencies | minor | `2.5.3` -> `2.6.1` |

---

### Release Notes

<details>
<summary>pydantic/pydantic (pydantic)</summary>

### [`v2.6.1`](https://github.com/pydantic/pydantic/blob/HEAD/HISTORY.md#v261-2024-02-05)

[Compare Source](pydantic/pydantic@v2.6.0...v2.6.1)

[GitHub release](https://github.com/pydantic/pydantic/releases/tag/v2.6.1)

##### What's Changed

##### Packaging

-   Upgrade to `pydantic-core` 2.16.2 by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8717](pydantic/pydantic#8717)

##### Fixes

-   Fix bug with `mypy` plugin and `no_strict_optional = True` by [@&#8203;dmontagu](https://github.com/dmontagu) in [#&#8203;8666](pydantic/pydantic#8666)
-   Fix `ByteSize` error `type` change by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8681](pydantic/pydantic#8681)
-   Fix inheriting `Field` annotations in dataclasses by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8679](pydantic/pydantic#8679)
-   Fix regression in core schema generation for indirect definition references by [@&#8203;dmontagu](https://github.com/dmontagu) in [#&#8203;8702](pydantic/pydantic#8702)
-   Fix unsupported types bug with `PlainValidator` by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8710](pydantic/pydantic#8710)
-   Reverting problematic fix from 2.6 release, fixing schema building bug by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8718](pydantic/pydantic#8718)
-   Fix warning for tuple of wrong size in `Union` by [@&#8203;davidhewitt](https://github.com/davidhewitt) in [pydantic/pydantic-core#1174](pydantic/pydantic-core#1174)
-   Fix `computed_field` JSON serializer `exclude_none` behavior by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [pydantic/pydantic-core#1187](pydantic/pydantic-core#1187)

### [`v2.6.0`](https://github.com/pydantic/pydantic/blob/HEAD/HISTORY.md#v260-2024-01-23)

[Compare Source](pydantic/pydantic@v2.5.3...v2.6.0)

[GitHub release](https://github.com/pydantic/pydantic/releases/tag/v2.6.0)

The code released in v2.6.0 is practically identical to that of v2.6.0b1.

##### What's Changed

##### Packaging

-   Check for `email-validator` version >= 2.0 by [@&#8203;commonism](https://github.com/commonism) in [#&#8203;6033](pydantic/pydantic#6033)
-   Upgrade \`ruff\`\` target version to Python 3.8 by [@&#8203;Elkiwa](https://github.com/Elkiwa) in [#&#8203;8341](pydantic/pydantic#8341)
-   Update to `pydantic-extra-types==2.4.1` by [@&#8203;yezz123](https://github.com/yezz123) in [#&#8203;8478](pydantic/pydantic#8478)
-   Update to `pyright==1.1.345` by [@&#8203;Viicos](https://github.com/Viicos) in [#&#8203;8453](pydantic/pydantic#8453)
-   Update pydantic-core from 2.14.6 to 2.16.1, significant changes from these updates are described below, full changelog [here](pydantic/pydantic-core@v2.14.6...v2.16.1)

##### New Features

-   Add `NatsDsn` by [@&#8203;ekeew](https://github.com/ekeew) in [#&#8203;6874](pydantic/pydantic#6874)
-   Add `ConfigDict.ser_json_inf_nan` by [@&#8203;davidhewitt](https://github.com/davidhewitt) in [#&#8203;8159](pydantic/pydantic#8159)
-   Add `types.OnErrorOmit` by [@&#8203;adriangb](https://github.com/adriangb) in [#&#8203;8222](pydantic/pydantic#8222)
-   Support `AliasGenerator` usage by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8282](pydantic/pydantic#8282)
-   Add Pydantic People Page to docs by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8345](pydantic/pydantic#8345)
-   Support `yyyy-MM-DD` datetime parsing by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8404](pydantic/pydantic#8404)
-   Added bits conversions to the `ByteSize` class [#&#8203;8415](pydantic/pydantic#8415) by [@&#8203;luca-matei](https://github.com/luca-matei) in [#&#8203;8507](pydantic/pydantic#8507)
-   Enable json schema creation with type `ByteSize` by [@&#8203;geospackle](https://github.com/geospackle) in [#&#8203;8537](pydantic/pydantic#8537)
-   Add `eval_type_backport` to handle union operator and builtin generic subscripting in older Pythons by [@&#8203;alexmojaki](https://github.com/alexmojaki) in [#&#8203;8209](pydantic/pydantic#8209)
-   Add support for `dataclass` fields `init` by [@&#8203;dmontagu](https://github.com/dmontagu) in [#&#8203;8552](pydantic/pydantic#8552)
-   Implement pickling for `ValidationError` by [@&#8203;davidhewitt](https://github.com/davidhewitt) in [pydantic/pydantic-core#1119](pydantic/pydantic-core#1119)
-   Add unified tuple validator that can handle "variadic" tuples via PEP-646 by [@&#8203;dmontagu](https://github.com/dmontagu) in [pydantic/pydantic-core#865](pydantic/pydantic-core#865)
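Of the features above, the `yyyy-MM-DD` datetime parsing change means a bare date string is now accepted for `datetime` fields. A minimal plain-Python sketch of the accepted format (not pydantic's actual parser, which lives in `pydantic-core`):

```python
from datetime import datetime

# pydantic 2.6 accepts a bare "yyyy-MM-DD" string for datetime fields;
# the equivalent stdlib parse yields a datetime at midnight.
dt = datetime.strptime("2024-01-23", "%Y-%m-%d")
assert dt == datetime(2024, 1, 23, 0, 0)
```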

##### Changes

-   Drop Python 3.7 support by [@&#8203;hramezani](https://github.com/hramezani) in [#&#8203;7188](pydantic/pydantic#7188)
-   Drop Python 3.7, and PyPy 3.7 and 3.8 by [@&#8203;davidhewitt](https://github.com/davidhewitt) in [pydantic/pydantic-core#1129](pydantic/pydantic-core#1129)
-   Use positional-only `self` in `BaseModel` constructor, so no field name can ever conflict with it by [@&#8203;ariebovenberg](https://github.com/ariebovenberg) in [#&#8203;8072](pydantic/pydantic#8072)
-   Make `@validate_call` return a function instead of a custom descriptor - fixes binding issue with inheritance and adds `self/cls` argument to validation errors by [@&#8203;alexmojaki](https://github.com/alexmojaki) in [#&#8203;8268](pydantic/pydantic#8268)
-   Exclude `BaseModel` docstring from JSON schema description by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8352](pydantic/pydantic#8352)
-   Introducing `classproperty` decorator for `model_computed_fields` by [@&#8203;Jocelyn-Gas](https://github.com/Jocelyn-Gas) in [#&#8203;8437](pydantic/pydantic#8437)
-   Explicitly raise an error if field names clashes with types by [@&#8203;Viicos](https://github.com/Viicos) in [#&#8203;8243](pydantic/pydantic#8243)
-   Use stricter serializer for unions of simple types by [@&#8203;alexdrydew](https://github.com/alexdrydew) [pydantic/pydantic-core#1132](pydantic/pydantic-core#1132)
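The positional-only `self` change above exists so that no field name, not even one literally called `self`, can collide with the bound instance parameter. A hedged plain-Python sketch of the mechanism (pydantic's real constructor differs):

```python
class Model:
    # The `/` marks `self` as positional-only, so the name "self"
    # stays free for use as a keyword argument (i.e. a field name).
    def __init__(self, /, **data):
        self.__dict__.update(data)

m = Model(self=123, x=1)  # no TypeError: `self` does not clash
assert m.self == 123 and m.x == 1
```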

##### Performance

-   Add Codspeed profiling Actions workflow  by [@&#8203;lambertsbennett](https://github.com/lambertsbennett) in [#&#8203;8054](pydantic/pydantic#8054)
-   Improve `int` extraction by [@&#8203;samuelcolvin](https://github.com/samuelcolvin) in [pydantic/pydantic-core#1155](pydantic/pydantic-core#1155)
-   Improve performance of recursion guard by [@&#8203;samuelcolvin](https://github.com/samuelcolvin) in [pydantic/pydantic-core#1156](pydantic/pydantic-core#1156)
-   `dataclass` serialization speedups by [@&#8203;samuelcolvin](https://github.com/samuelcolvin) in [pydantic/pydantic-core#1162](pydantic/pydantic-core#1162)
-   Avoid `HashMap` creation when looking up small JSON objects in `LazyIndexMaps` by [@&#8203;samuelcolvin](https://github.com/samuelcolvin) in [pydantic/jiter#55](pydantic/jiter#55)
-   Use hashbrown to speed up Python string caching by [@&#8203;davidhewitt](https://github.com/davidhewitt) in [pydantic/jiter#51](pydantic/jiter#51)
-   Replace `Peak` with more efficient `Peek` by [@&#8203;davidhewitt](https://github.com/davidhewitt) in [pydantic/jiter#48](pydantic/jiter#48)

##### Fixes

-   Move `getattr` warning in deprecated `BaseConfig` by [@&#8203;tlambert03](https://github.com/tlambert03) in [#&#8203;7183](pydantic/pydantic#7183)
-   Only hash `model_fields`, not whole `__dict__` by [@&#8203;alexmojaki](https://github.com/alexmojaki) in [#&#8203;7786](pydantic/pydantic#7786)
-   Fix mishandling of unions while freezing types in the `mypy` plugin by [@&#8203;dmontagu](https://github.com/dmontagu) in [#&#8203;7411](pydantic/pydantic#7411)
-   Fix `mypy` error on untyped `ClassVar` by [@&#8203;vincent-hachin-wmx](https://github.com/vincent-hachin-wmx) in [#&#8203;8138](pydantic/pydantic#8138)
-   Only compare pydantic fields in `BaseModel.__eq__` instead of whole `__dict__` by [@&#8203;QuentinSoubeyranAqemia](https://github.com/QuentinSoubeyranAqemia) in [#&#8203;7825](pydantic/pydantic#7825)
-   Update `strict` docstring in `model_validate` method. by [@&#8203;LukeTonin](https://github.com/LukeTonin) in [#&#8203;8223](pydantic/pydantic#8223)
-   Fix overload position of `computed_field` by [@&#8203;Viicos](https://github.com/Viicos) in [#&#8203;8227](pydantic/pydantic#8227)
-   Fix custom type type casting used in multiple attributes by [@&#8203;ianhfc](https://github.com/ianhfc) in [#&#8203;8066](pydantic/pydantic#8066)
-   Fix issue not allowing `validate_call` decorator to be dynamically assigned to a class method by [@&#8203;jusexton](https://github.com/jusexton) in [#&#8203;8249](pydantic/pydantic#8249)
-   Fix issue `unittest.mock` deprecation warnings  by [@&#8203;ibleedicare](https://github.com/ibleedicare) in [#&#8203;8262](pydantic/pydantic#8262)
-   Added tests for the case `JsonValue` contains subclassed primitive values by [@&#8203;jusexton](https://github.com/jusexton) in [#&#8203;8286](pydantic/pydantic#8286)
-   Fix `mypy` error on free before validator (classmethod) by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8285](pydantic/pydantic#8285)
-   Fix `to_snake` conversion by [@&#8203;jevins09](https://github.com/jevins09) in [#&#8203;8316](pydantic/pydantic#8316)
-   Fix type annotation of `ModelMetaclass.__prepare__` by [@&#8203;slanzmich](https://github.com/slanzmich) in [#&#8203;8305](pydantic/pydantic#8305)
-   Disallow `config` specification when initializing a `TypeAdapter` when the annotated type has config already by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8365](pydantic/pydantic#8365)
-   Fix a naming issue with JSON schema for generics parametrized by recursive type aliases by [@&#8203;dmontagu](https://github.com/dmontagu) in [#&#8203;8389](pydantic/pydantic#8389)
-   Fix type annotation in pydantic people script by [@&#8203;shenxiangzhuang](https://github.com/shenxiangzhuang) in [#&#8203;8402](pydantic/pydantic#8402)
-   Add support for field `alias` in `dataclass` signature by [@&#8203;NeevCohen](https://github.com/NeevCohen) in [#&#8203;8387](pydantic/pydantic#8387)
-   Fix bug with schema generation with `Field(...)` in a forward ref by [@&#8203;dmontagu](https://github.com/dmontagu) in [#&#8203;8494](pydantic/pydantic#8494)
-   Fix ordering of keys in `__dict__` with `model_construct` call by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8500](pydantic/pydantic#8500)
-   Fix module `path_type` creation when globals does not contain `__name__` by [@&#8203;hramezani](https://github.com/hramezani) in [#&#8203;8470](pydantic/pydantic#8470)
-   Fix for namespace issue with dataclasses with `from __future__ import annotations` by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8513](pydantic/pydantic#8513)
-   Fix: make function validator types positional-only by [@&#8203;pmmmwh](https://github.com/pmmmwh) in [#&#8203;8479](pydantic/pydantic#8479)
-   Fix usage of `@deprecated` by [@&#8203;Viicos](https://github.com/Viicos) in [#&#8203;8294](pydantic/pydantic#8294)
-   Add more support for private attributes in `model_construct` call by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8525](pydantic/pydantic#8525)
-   Use a stack for the types namespace by [@&#8203;dmontagu](https://github.com/dmontagu) in [#&#8203;8378](pydantic/pydantic#8378)
-   Fix schema-building bug with `TypeAliasType` for types with refs by [@&#8203;dmontagu](https://github.com/dmontagu) in [#&#8203;8526](pydantic/pydantic#8526)
-   Support `pydantic.Field(repr=False)` in dataclasses by [@&#8203;tigeryy2](https://github.com/tigeryy2) in [#&#8203;8511](pydantic/pydantic#8511)
-   Override `dataclass_transform` behavior for `RootModel` by [@&#8203;Viicos](https://github.com/Viicos) in [#&#8203;8163](pydantic/pydantic#8163)
-   Refactor signature generation for simplicity by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8572](pydantic/pydantic#8572)
-   Fix ordering bug of `PlainValidator` annotation by [@&#8203;Anvil](https://github.com/Anvil) in [#&#8203;8567](pydantic/pydantic#8567)
-   Fix `exclude_none` for json serialization of `computed_field`s by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [pydantic/pydantic-core#1098](pydantic/pydantic-core#1098)
-   Support yyyy-MM-DD string for datetimes by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [pydantic/pydantic-core#1124](pydantic/pydantic-core#1124)
-   Tweak ordering of definitions in generated schemas by [@&#8203;StrawHatDrag0n](https://github.com/StrawHatDrag0n) in [#&#8203;8583](pydantic/pydantic#8583)
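The `__hash__` and `__eq__` fixes in this list (this PR and [#&#8203;7825](pydantic/pydantic#7825)) share one idea: derive instance identity from the declared model fields only, ignoring any extra entries in `__dict__` such as private or cached attributes. A simplified plain-Python sketch of that approach (pydantic's actual implementation differs):

```python
class FrozenModel:
    # Stand-in for pydantic's `model_fields`; here just the field names.
    model_fields = ('x', 'y')

    def __init__(self, **data):
        self.__dict__.update(data)

    def __hash__(self):
        # Hash only declared fields, not the whole __dict__.
        d = self.__dict__
        return hash((self.__class__,) + tuple(d[k] for k in self.model_fields))

    def __eq__(self, other):
        return (self.__class__ is other.__class__
                and all(self.__dict__[k] == other.__dict__[k]
                        for k in self.model_fields))

a = FrozenModel(x=1, y=2)
b = FrozenModel(x=1, y=2, _cached='extra')  # non-field attr is ignored
assert hash(a) == hash(b) and a == b
```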

##### New Contributors

##### `pydantic`

-   [@&#8203;ekeew](https://github.com/ekeew) made their first contribution in [#&#8203;6874](pydantic/pydantic#6874)
-   [@&#8203;lambertsbennett](https://github.com/lambertsbennett) made their first contribution in [#&#8203;8054](pydantic/pydantic#8054)
-   [@&#8203;vincent-hachin-wmx](https://github.com/vincent-hachin-wmx) made their first contribution in [#&#8203;8138](pydantic/pydantic#8138)
-   [@&#8203;QuentinSoubeyranAqemia](https://github.com/QuentinSoubeyranAqemia) made their first contribution in [#&#8203;7825](pydantic/pydantic#7825)
-   [@&#8203;ariebovenberg](https://github.com/ariebovenberg) made their first contribution in [#&#8203;8072](pydantic/pydantic#8072)
-   [@&#8203;LukeTonin](https://github.com/LukeTonin) made their first contribution in [#&#8203;8223](pydantic/pydantic#8223)
-   [@&#8203;denisart](https://github.com/denisart) made their first contribution in [#&#8203;8231](pydantic/pydantic#8231)
-   [@&#8203;ianhfc](https://github.com/ianhfc) made their first contribution in [#&#8203;8066](pydantic/pydantic#8066)
-   [@&#8203;eonu](https://github.com/eonu) made their first contribution in [#&#8203;8255](pydantic/pydantic#8255)
-   [@&#8203;amandahla](https://github.com/amandahla) made their first contribution in [#&#8203;8263](pydantic/pydantic#8263)
-   [@&#8203;ibleedicare](https://github.com/ibleedicare) made their first contribution in [#&#8203;8262](pydantic/pydantic#8262)
-   [@&#8203;jevins09](https://github.com/jevins09) made their first contribution in [#&#8203;8316](pydantic/pydantic#8316)
-   [@&#8203;cuu508](https://github.com/cuu508) made their first contribution in [#&#8203;8322](pydantic/pydantic#8322)
-   [@&#8203;slanzmich](https://github.com/slanzmich) made their first contribution in [#&#8203;8305](pydantic/pydantic#8305)
-   [@&#8203;jensenbox](https://github.com/jensenbox) made their first contribution in [#&#8203;8331](pydantic/pydantic#8331)
-   [@&#8203;szepeviktor](https://github.com/szepeviktor) made their first contribution in [#&#8203;8356](pydantic/pydantic#8356)
-   [@&#8203;Elkiwa](https://github.com/Elkiwa) made their first contribution in [#&#8203;8341](pydantic/pydantic#8341)
-   [@&#8203;parhamfh](https://github.com/parhamfh) made their first contribution in [#&#8203;8395](pydantic/pydantic#8395)
-   [@&#8203;shenxiangzhuang](https://github.com/shenxiangzhuang) made their first contribution in [#&#8203;8402](pydantic/pydantic#8402)
-   [@&#8203;NeevCohen](https://github.com/NeevCohen) made their first contribution in [#&#8203;8387](pydantic/pydantic#8387)
-   [@&#8203;zby](https://github.com/zby) made their first contribution in [#&#8203;8497](pydantic/pydantic#8497)
-   [@&#8203;patelnets](https://github.com/patelnets) made their first contribution in [#&#8203;8491](pydantic/pydantic#8491)
-   [@&#8203;edwardwli](https://github.com/edwardwli) made their first contribution in [#&#8203;8503](pydantic/pydantic#8503)
-   [@&#8203;luca-matei](https://github.com/luca-matei) made their first contribution in [#&#8203;8507](pydantic/pydantic#8507)
-   [@&#8203;Jocelyn-Gas](https://github.com/Jocelyn-Gas) made their first contribution in [#&#8203;8437](pydantic/pydantic#8437)
-   [@&#8203;bL34cHig0](https://github.com/bL34cHig0) made their first contribution in [#&#8203;8501](pydantic/pydantic#8501)
-   [@&#8203;tigeryy2](https://github.com/tigeryy2) made their first contribution in [#&#8203;8511](pydantic/pydantic#8511)
-   [@&#8203;geospackle](https://github.com/geospackle) made their first contribution in [#&#8203;8537](pydantic/pydantic#8537)
-   [@&#8203;Anvil](https://github.com/Anvil) made their first contribution in [#&#8203;8567](pydantic/pydantic#8567)
-   [@&#8203;hungtsetse](https://github.com/hungtsetse) made their first contribution in [#&#8203;8546](pydantic/pydantic#8546)
-   [@&#8203;StrawHatDrag0n](https://github.com/StrawHatDrag0n) made their first contribution in [#&#8203;8583](pydantic/pydantic#8583)

##### `pydantic-core`

-   [@&#8203;mariuswinger](https://github.com/mariuswinger) made their first contribution in [pydantic/pydantic-core#1087](pydantic/pydantic-core#1087)
-   [@&#8203;adamchainz](https://github.com/adamchainz) made their first contribution in [pydantic/pydantic-core#1090](pydantic/pydantic-core#1090)
-   [@&#8203;akx](https://github.com/akx) made their first contribution in [pydantic/pydantic-core#1123](pydantic/pydantic-core#1123)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://git.apud.pl/jacek/adventofcode/pulls/57
Co-authored-by: Renovate <renovate@apud.pl>
Co-committed-by: Renovate <renovate@apud.pl>
Labels
relnotes-fix Used for bugfixes.
Development

Successfully merging this pull request may close these issues.

BaseModel.__hash__ doesn't match __eq__
4 participants