
Only hash model_fields, not whole __dict__ #7786

Merged
6 commits merged into pydantic:main on Nov 13, 2023

Conversation

@alexmojaki (Contributor) commented Oct 9, 2023

Change Summary

This deals with just the __hash__ part of #7444, because initially I was worried that it would be hard to fix without a significant hit to performance. With this timing code:

import timeit

setup = """
import pydantic

A = pydantic.create_model("A", __config__=dict(frozen=True), **{f'x{i}': (int, i) for i in range(100)})

a = A()
"""

stmt = """
hash(a)
"""

print(timeit.timeit(setup=setup, stmt=stmt, number=1000000))

the previous approach is slightly slower for 5 fields, but about 60-70% faster for 100 fields. So this PR may slow down some programs a little, but in the larger scheme of things it's hard to imagine it making a real difference since it can still hash a million times in about 2 seconds.

Closes #7785

Checklist

  • The pull request title is a good summary of the changes - it will be used in the changelog
  • Unit tests for the changes exist
  • Tests pass on CI
  • Documentation reflects the changes where applicable
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

@QuentinSoubeyranAqemia (Contributor)

Have you tried other implementations to compare performance, say a dict comprehension? I'd be curious to see how the various implementations compare.

@alexmojaki (Contributor, Author)

This:

        def hash_func(self: Any) -> int:
            d = self.__dict__
            return hash(self.__class__) + hash(tuple(d[k] for k in cls.model_fields))

is about 2 to 3 times slower than the implementation in this PR. I'm actually surprised it's not worse. We always hear about how slow Python is and how deferring to C makes a huge difference, but I often get underwhelming results when doing so.
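For context on why the itemgetter-based version wins, `operator.itemgetter` built with several keys does its lookups in one C-level call and returns a tuple directly. A small standalone demo (the field names here are made up for illustration):

```python
from operator import itemgetter

# Built once per class; with several keys it returns a tuple of values.
getter = itemgetter("x0", "x1", "x2")

# Stand-in for an instance __dict__; "extra" is simply ignored by the getter.
d = {"x0": 0, "x1": 1, "x2": 2, "extra": "ignored"}
assert getter(d) == (0, 1, 2)

# The pure-Python equivalent does the same lookups in a Python-level loop:
assert tuple(d[k] for k in ("x0", "x1", "x2")) == getter(d)
```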

@QuentinSoubeyranAqemia (Contributor)

Interesting comparison!
While you're at it, what about:

def hash_func(self: Any) -> int:
    d = {self.__dict__[k] for k in cls.model_fields}
    return hash(self.__class__) + hash(tuple(d.values()))

Sometimes dict comprehensions are very fast.

@alexmojaki (Contributor, Author)

I'll leave you to try these yourself. Also, d is a set there.

@QuentinSoubeyranAqemia (Contributor)

I ran your benchmark and plotted the time taken as a function of the number of fields. Your implementation is definitely the fastest, and it is faster than the current one even for very small objects.

I tweaked the implementation to call hash() only once; IIRC it is better to pass a single tuple of things to hash() rather than manually summing the hashes, because hash() will truncate the return value of __hash__, which can cause additional collisions.
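The order-sensitivity half of this point can be seen in a tiny example: summing hashes is commutative, so swapping two field values is a guaranteed collision, while hashing a single tuple keeps the order information (the tuple inequality holds in CPython but is not formally guaranteed):

```python
a, b = 1, 2

# Summing is commutative: swapping the two values always collides.
assert hash(a) + hash(b) == hash(b) + hash(a)

# Hashing one tuple preserves the order of the values (true in CPython).
assert hash((a, b)) != hash((b, a))
```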

[plot: results]

This makes a good starting point for fixing __eq__ with good performance, too.

Code for the benchmark
import gc
import operator
import timeit
from typing import Type

import matplotlib.pyplot as plt
import pydantic
import tqdm.auto as tqdm


def set_hash_func_default(cls):
    pass


def set_hash_func_operator(cls: Type[pydantic.BaseModel]):
    getter = operator.itemgetter(*cls.model_fields.keys()) if cls.model_fields else lambda _: None

    def hash_func(self) -> int:
        return hash((self.__class__, getter(self.__dict__)))

    cls.__hash__ = hash_func


def set_hash_func_tuple(cls: Type[pydantic.BaseModel]):
    def hash_func(self) -> int:
        d = self.__dict__
        return hash((self.__class__, tuple(d[k] for k in cls.model_fields)))

    cls.__hash__ = hash_func


def set_hash_func_comprehension(cls: Type[pydantic.BaseModel]):
    def hash_func(self) -> int:
        d = self.__dict__
        fields = {k: d[k] for k in cls.model_fields}
        return hash((self.__class__, tuple(fields.values())))

    cls.__hash__ = hash_func


setup = """
import pydantic

Model = pydantic.create_model(
    "Model",
    __config__=dict(frozen=True),
    **{f'x{i}': (int, i) for i in range(%(size)d)}
)
set_hash_func(Model)
instance = Model()
"""
statement = "hash(instance)"

sizes = list(range(100))
N_EXEC = 10_000
ys = {}
for set_hash_func in [
    set_hash_func_default,
    set_hash_func_operator,
    set_hash_func_tuple,
    set_hash_func_comprehension,
]:
    namespace = {"set_hash_func": set_hash_func}
    ys[set_hash_func.__name__] = [
        # Take the min of the repeats as recommended by timeit doc
        min(
            timeit.Timer(setup=setup % {"size": size}, stmt=statement, globals=namespace).repeat(
                repeat=5, number=N_EXEC
            )
        )
        / N_EXEC
        for size in tqdm.tqdm(sizes, desc=set_hash_func.__name__)
    ]
    gc.collect()

fig, ax = plt.subplots()

for name, times in ys.items():
    ax.plot(sizes, times, label=name)
ax.set_title("Performance benchmark for __hash__ implementations")
ax.set_xlabel("Number of fields in the pydantic model")
ax.set_ylabel("Average hash() time (s)")

plt.legend()
plt.tight_layout()
fig.savefig("results.png", dpi=300, facecolor="white", transparent=False)

@alexmojaki (Contributor, Author)

I think the self.__class__ part should just be removed. That would fix the main bug in #7785 about generic classes. The alternative fix would be to instead hash the same type object used in __eq__ but this would reduce hash collisions very rarely (the field values are much more likely to differ than the type) and would slow things down in most cases.

@QuentinSoubeyranAqemia (Contributor)

That makes sense. I didn't see #7785 before. I would also be in favor of keeping __hash__ simple, so that making it a strict subset of __eq__ stays simple.
As you say, adding the class name to the hash is unlikely to avoid collisions; IMHO it is rare for two classes to have the exact same set of attributes and values.

Comment on lines 400 to 403:

    getter = operator.itemgetter(*cls.model_fields.keys()) if cls.model_fields else lambda _: None

    def hash_func(self: Any) -> int:
-       return hash(self.__class__) + hash(tuple(self.__dict__.values()))
+       return hash(self.__class__) + hash(getter(self.__dict__))

@dmontagu (Contributor) commented Oct 12, 2023

Is this faster than just doing

        def hash_func(self: Any) -> int:
            return hash(self.__class__) + hash(tuple(self.__dict__.get(k) for k in cls.model_fields))

I'd prefer to avoid adding the operator import if possible (as we have generally been trying to reduce the number of imports pydantic triggers, which increase startup time), but if it's noticeably faster then maybe it makes sense. I think if you can't see a significant and repeatable difference, better to not use it.

Unless I've misunderstood and the getter thing is doing something different from my suggestion.

Contributor

Doh I see there was a whole conversation about this and confirmation that it is significantly faster. 👍

Contributor

I still think we should confirm with @samuelcolvin though given the point about wanting to reduce import times.

Contributor

I see you propose using .get() instead of []. Do you know whether it is safe to use itemgetter(), which doesn't have a default and thus will raise an error if the key is not found?

Said differently, are there cases where some keys from cls.model_fields are not present on the instance?

Contributor (Author)

Said differently, are there cases where some keys from cls.model_fields are not present on the instance?

I tried to find ways that this might happen, which led me to #7784. I couldn't think of anything else but I don't actually know pydantic that deeply so I could easily be missing something. But my guess is that if it's possible then it's enough of a red flag that raising an error here would actually be a good thing.
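The behavioural difference under discussion, shown in isolation (hypothetical field names): itemgetter raises KeyError for a missing key, while dict.get silently substitutes None:

```python
from operator import itemgetter

getter = itemgetter("x", "y")
partial = {"x": 1}  # stand-in for an instance __dict__ missing field "y"

# itemgetter has no default: a missing key raises KeyError.
try:
    getter(partial)
    raised = False
except KeyError:
    raised = True
assert raised

# dict.get substitutes None for the missing field without complaint.
assert tuple(partial.get(k) for k in ("x", "y")) == (1, None)
```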

@dmontagu added the relnotes-fix label (used for bugfixes) on Oct 12, 2023
@QuentinSoubeyranAqemia (Contributor) commented Oct 12, 2023

Regarding pydantic load time, it seems pydantic already imports operator at load time (possibly transitively):
python -c 'import sys, pydantic; print("operator" in sys.modules.keys())'
prints True.

It could be that sys itself depends on operator, since python -c 'import sys; print("operator" in sys.modules.keys())' also prints True. But the interpreter also loads some modules by default when it starts, so it's a bit hard to tell.

@QuentinSoubeyranAqemia (Contributor) commented Oct 12, 2023

I performed a timing benchmark; I can't measure any difference between importing operator and pydantic vs. importing only pydantic.

Time pydantic alone:

for i in {1..100}; do { time -p python -c "import pydantic"; } 2>> import_time.txt; done
python -c 'import pathlib; print(min(float(line.split(" ")[1]) for line in pathlib.Path("import_time.txt").read_text().split("\n") if line.startswith("real")))'

Time pydantic + operator:

for i in {1..100}; do { time -p python -c "import pydantic, operator"; } 2>> import_time_operator.txt; done
python -c 'import pathlib; print(min(float(line.split(" ")[1]) for line in pathlib.Path("import_time_operator.txt").read_text().split("\n") if line.startswith("real")))'

@QuentinSoubeyranAqemia (Contributor)

I did further benchmarking in case we need to handle some keys in cls.model_fields not being present in the instance __dict__.
I updated the benchmark above to report performance normalized to the current implementation, and commented out the comprehensions, which are very slow.

Interestingly, stdlib performance varies with the Python version: on Python 3.8, wrapping __dict__ in a collections.defaultdict only when some keys are missing doesn't improve performance. On Python 3.12, it degrades performance quite a bit for large models, but improves it for small models.

I guess we could implement a switch based on the size of the __dict__, and wrap unconditionally above a certain size.

Python 3.8 runs

No injection of additional value in `__dict__`
[plot: hash-benchmark_nocache_python-3-8-17-final-0]

Injection of additional value in `__dict__` (e.g. by `functools.cached_property`)
[plot: hash_benchmark_cache_python-3-8-17-final-0]

Python 3.12 runs

No injection of additional value in `__dict__`
[plot: hash-benchmark_nocache_python-3-12-0-final-0]

Injection of additional value in `__dict__` (e.g. by `functools.cached_property`)
[plot: hash-benchmark_cache_python-3-12-0-final-0]

Benchmark code
import collections
import dataclasses
import gc
import operator
import sys
import timeit
from typing import Any, Callable, Dict, Set, Tuple, Type

import matplotlib.pyplot as plt
import numpy as np
import pydantic
import tqdm.auto as tqdm

PYTHON_VERSION = ".".join(map(str, sys.version_info))


def safe_itemgetter(*items) -> Callable[[Dict[str, Any]], Tuple[Any, ...]]:
    if not items:
        return lambda _: tuple()

    key_set = frozenset(items)
    getter = operator.itemgetter(*items)

    def safe_wrapper(d: Dict[str, Any]) -> Tuple[Any, ...]:
        return (
            getter(d) if d.keys() >= key_set else getter(collections.defaultdict(lambda: None, d))
        )

    return safe_wrapper


def noop(*args, **kwargs):
    pass


def set_hash_func_itemgetter(cls: Type[pydantic.BaseModel]):
    getter = operator.itemgetter(*cls.model_fields.keys()) if cls.model_fields else lambda _: None

    def hash_func(self) -> int:
        return hash(getter(self.__dict__))

    cls.__hash__ = hash_func


def set_hash_func_itemgetter_defaultdict(cls: Type[pydantic.BaseModel]):
    getter = operator.itemgetter(*cls.model_fields.keys()) if cls.model_fields else lambda _: None

    def hash_func(self) -> int:
        return hash(getter(collections.defaultdict(lambda: None, self.__dict__)))

    cls.__hash__ = hash_func


def set_hash_func_itemgetter_safe(cls: Type[pydantic.BaseModel]):
    getter = safe_itemgetter(*cls.model_fields.keys())

    def hash_func(self) -> int:
        return hash(getter(self.__dict__))

    cls.__hash__ = hash_func


def set_hash_func_tuple_comprehension(cls: Type[pydantic.BaseModel]):
    def hash_func(self) -> int:
        d = self.__dict__
        return hash(tuple(d[k] for k in cls.model_fields))

    cls.__hash__ = hash_func


def set_hash_func_get_comprehension(cls: Type[pydantic.BaseModel]):
    def hash_func(self) -> int:
        d = self.__dict__
        return hash(tuple(d.get(k) for k in cls.model_fields))

    cls.__hash__ = hash_func


def set_hash_func_dict_comprehension(cls: Type[pydantic.BaseModel]):
    def hash_func(self) -> int:
        d = self.__dict__
        fields = {k: d[k] for k in cls.model_fields}
        return hash(tuple(fields.values()))

    cls.__hash__ = hash_func


SETUP = """
import pydantic

Model = pydantic.create_model(
    "Model",
    __config__=dict(frozen=True),
    **{f'x{i}': (int, i) for i in range(%(size)d)}
)
set_hash_func(Model)
instance = Model()
# Mimic @functools.cached_property editing __dict__
# instance.__dict__["not_a_field"] = None
"""
STATEMENT = "hash(instance)"


def time_hash_func(
    set_hash_func,
    *,
    sizes,
    n_execution: int,
    n_repeat: int = 5,
    progress_bar: bool = True,
) -> np.ndarray:
    namespace = {"set_hash_func": set_hash_func}
    times = [
        # Take the min of the repeats as recommended by timeit doc
        min(
            timeit.Timer(
                setup=SETUP % {"size": size},
                stmt=STATEMENT,
                globals=namespace,
            ).repeat(repeat=n_repeat, number=n_execution)
        )
        / n_execution
        for size in tqdm.tqdm(sizes, desc=set_hash_func.__name__, disable=not progress_bar)
    ]
    gc.collect()
    return np.array(times, dtype=float)


N_EXEC = 10_000
fig, ax = plt.subplots()

sizes = list(range(100))
baseline = time_hash_func(noop, sizes=sizes, n_execution=N_EXEC)

# Plot the baseline
ax.plot(sizes, baseline / baseline, label="baseline")

for label, set_hash_func in {
    "itemgetter": set_hash_func_itemgetter,
    "itemgetter+defaultdict": set_hash_func_itemgetter_defaultdict,
    "itemgetter+safety": set_hash_func_itemgetter_safe,
    # Those have bad performances and are deactivated to speed-up experimentation
    # "tuple-comprehension": set_hash_func_tuple_comprehension,
    # "get-comprehension": set_hash_func_get_comprehension,
    # "dict-comprehension": set_hash_func_dict_comprehension,
}.items():
    times = time_hash_func(set_hash_func, sizes=sizes, n_execution=N_EXEC)
    ax.plot(sizes, times / baseline, label=label)

ax.set_title(f"__hash__ timing benchmark (python {PYTHON_VERSION})")
ax.set_xlabel("Number of fields in the pydantic model")
ax.set_ylabel("Average time relative to baseline")

plt.legend()
plt.tight_layout()
fig.savefig(
    f"hash-benchmark_nocache_python-{PYTHON_VERSION}.png",
    dpi=300,
    facecolor="white",
    transparent=False,
)

@alexmojaki (Contributor, Author)

Thanks @QuentinSoubeyranAqemia for all your analysis.

Regarding importing operator, in addition to the points above, there's already a from operator import attrgetter in _generate_schema.py.

@QuentinSoubeyranAqemia (Contributor)

I tried the following code, inspired by dataclasses, to try to get an instance that is missing a field. It doesn't seem to work, so I hope that pydantic enforces all fields to be present on the instance?

class Model(pydantic.BaseModel, frozen=True):
    attr: int = pydantic.Field(init_var=False)

m = Model()  # ValidationError

@QuentinSoubeyranAqemia (Contributor)

@alexmojaki I think we still need to remove __class__ from the hash in this PR? Might as well fix #7785 which you opened while we are modifying __hash__.

@alexmojaki (Contributor, Author)

@alexmojaki I think we still need to remove __class__ from the hash in this PR? Might as well fix #7785 which you opened while we are modifying __hash__.

I agree, but that issue is still in pending state. @dmontagu WDYT, can I remove hash(self.__class__) + in this PR? See #7786 (comment) which is a bit buried in the conversation now.

@QuentinSoubeyranAqemia (Contributor) commented Oct 14, 2023

Maybe we could add a test in tests/test_edge_cases.py? It looks like a good place to test for that.

@QuentinSoubeyranAqemia (Contributor)

@alexmojaki Just FYI, #7825 revealed an edge case where you can have an instance that is missing some fields. This impacts your use of operator.itemgetter (I used it as well, which is why the problem showed up).

@alexmojaki (Contributor, Author)

Thanks @QuentinSoubeyranAqemia. That means this PR should also fix another bug, since the current implementation assumes that the order of __dict__ is reliable:

from pydantic import BaseModel


class A(BaseModel, frozen=True):
    x: int
    y: int


a = A(x=1, y=2)
a2 = a.copy(exclude={"x"}).copy(update={"x": 1})
print(a.__dict__)   # {'x': 1, 'y': 2}
print(a2.__dict__)  # {'y': 2, 'x': 1}
assert a == a2
assert hash(a) != hash(a2)

I've pushed a fix with the fastest overall approach I could find, and I expect it can be reused for __eq__.
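Condensed to its essentials (the FallbackDict helper appears in full in the benchmark code below; the field names here are made up), the idea is to take the fast itemgetter path and fall back only when a field is missing:

```python
from operator import itemgetter

class FallbackDict:
    """Wrap a dict so that missing keys yield None instead of KeyError."""
    def __init__(self, inner):
        self.inner = inner

    def __getitem__(self, key):
        return self.inner.get(key)

field_names = ("x", "y")  # stand-in for cls.model_fields
getter = itemgetter(*field_names)

def hash_func(d):
    try:
        # Fast path: plain itemgetter over the instance __dict__.
        return hash(getter(d))
    except KeyError:
        # Rare path: a field is missing (e.g. after the deprecated copy()).
        return hash(getter(FallbackDict(d)))

assert hash_func({"x": 1, "y": 2}) == hash((1, 2))
assert hash_func({"y": 2}) == hash((None, 2))
```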

While I'm at it I removed the hash(self.__class__). However this causes the test test_hash_function_give_different_result_for_different_object to fail, which expects identical-looking models with different classes to have different hashes. Going back to when this was all added, I found this comment providing some motivation:

my problem here is that I imagine in the following code f and b would have the same hash and therefore be "equal" ... Which is definitely not what people would expected.

I think this is a mistake, and that it's OK for the hashes to match if only the classes differ.

@QuentinSoubeyranAqemia (Contributor)

I'll leverage the FallbackDict in the __eq__ implementation, thanks!

Getting models with missing fields is very weird; it violates the Liskov substitution principle. Interestingly, the documented alternative to the deprecated BaseModel.copy() method raises a ValidationError when excluding required fields.

For completeness' sake, I ran the benchmarks again with your safety mechanism: it has negligible performance impact on Python 3.8, and since Python 3.11 got zero-cost exception handling, the impact is non-existent on Python 3.11+.
It is also faster than using collections.defaultdict unconditionally, which is nice.

[plot: hash-benchmark_python-3.8.17]
[plot: hash-benchmark_python-3.12.0]

Benchmark code (click to reveal)
import collections
import gc
import itertools
import operator
import sys
import textwrap
import timeit
from typing import Any, Dict, Iterable, List, Optional, Sized, Type

import matplotlib.pyplot as plt
import numpy as np
import pydantic
import tqdm.auto as tqdm
from matplotlib import axes, figure

PYTHON_VERSION = ".".join(map(str, sys.version_info))


def noop(*args, **kwargs):
    pass


def set_hash_func_itemgetter(cls: Type[pydantic.BaseModel]):
    getter = operator.itemgetter(*cls.model_fields.keys()) if cls.model_fields else lambda _: None

    def hash_func(self) -> int:
        return hash(getter(self.__dict__))

    cls.__hash__ = hash_func


def set_hash_func_itemgetter_defaultdict(cls: Type[pydantic.BaseModel]):
    getter = operator.itemgetter(*cls.model_fields.keys()) if cls.model_fields else lambda _: None

    def hash_func(self) -> int:
        return hash(getter(collections.defaultdict(lambda: None, self.__dict__)))

    cls.__hash__ = hash_func


class FallbackDict:
    def __init__(self, inner):
        self.inner = inner

    def __getitem__(self, key):
        return self.inner.get(key)


def set_hash_func_itemgetter_safe(cls: Type[pydantic.BaseModel]):
    getter = operator.itemgetter(*cls.model_fields.keys()) if cls.model_fields else lambda _: 0

    def hash_func(self: Any) -> int:
        try:
            return hash(getter(self.__dict__))
        except KeyError:
            # In rare cases (such as when using the deprecated copy method), the __dict__ may not contain
            # all model fields, which is how we can get here.
            # getter(self.__dict__) is much faster than any 'safe' method that accounts for missing keys,
            # and wrapping it in a `try` doesn't slow things down much in the common case.
            return hash(getter(FallbackDict(self.__dict__)))  # type: ignore

    cls.__hash__ = hash_func


def set_hash_func_tuple_comprehension(cls: Type[pydantic.BaseModel]):
    def hash_func(self) -> int:
        d = self.__dict__
        return hash(tuple(d[k] for k in cls.model_fields))

    cls.__hash__ = hash_func


def set_hash_func_get_comprehension(cls: Type[pydantic.BaseModel]):
    def hash_func(self) -> int:
        d = self.__dict__
        return hash(tuple(d.get(k) for k in cls.model_fields))

    cls.__hash__ = hash_func


def set_hash_func_dict_comprehension(cls: Type[pydantic.BaseModel]):
    def hash_func(self) -> int:
        d = self.__dict__
        fields = {k: d[k] for k in cls.model_fields}
        return hash(tuple(fields.values()))

    cls.__hash__ = hash_func


class SizedIterable(Sized, Iterable):
    pass


def run_benchmark(
    title: str,
    setup: str,
    statement: str,
    n_execution: int = 10_000,
    n_repeat: int = 5,
    globals: Optional[Dict[str, Any]] = None,
    progress_bar: bool = True,
    params: Optional[Dict[str, SizedIterable]] = None,
) -> np.ndarray:
    namespace = globals or {}
    # fast-path
    if not params:
        length = 1
        packed_params = [()]
    else:
        length = len(next(iter(params.values())))
        # This iterator yields a tuple of (key, value) pairs
        # First, make a list of N iterator over (key, value), where the provided values are iterated
        param_pairs = [zip(itertools.repeat(name), value) for name, value in params.items()]
        # Then pack our individual parameter iterator into one
        packed_params = zip(*param_pairs)

    times = [
        # Take the min of the repeats as recommended by timeit doc
        min(
            timeit.Timer(
                setup=setup % dict(local_params),
                stmt=statement,
                globals=namespace,
            ).repeat(repeat=n_repeat, number=n_execution)
        )
        / n_execution
        for local_params in tqdm.tqdm(
            packed_params, desc=title, total=length, disable=not progress_bar
        )
    ]
    gc.collect()

    return np.asarray(times, dtype=float)


def run_hash_benchmark(
    title: str,
    set_hash_func,
    sizes: SizedIterable,
    mimic_cached_property: bool,
    n_execution: int = 10_000,
    n_repeat: int = 5,
) -> np.ndarray:
    setup = textwrap.dedent(
        f"""
        import pydantic

        Model = pydantic.create_model(
            "Model",
            __config__=dict(frozen=True),
            **{{f'x{{i}}': (int, i) for i in range(%(size)d)}}
        )
        set_hash_func(Model)
        instance = Model()
        # Mimic @functools.cached_property editing __dict__
        {"instance.__dict__['not_a_field'] = None" if mimic_cached_property else ''}
        """
    )
    statement = "hash(instance)"
    namespace = {"set_hash_func": set_hash_func}
    return run_benchmark(
        title=title,
        setup=setup,
        statement=statement,
        n_execution=n_execution,
        n_repeat=n_repeat,
        globals=namespace,
        params={"size": sizes},
    )


def plot_benchmark(
    title: str,
    sizes: List[int],
    mimic_cached_property: bool,
    ax: Optional[axes.Axes] = None,
):
    ax = ax or plt.gca()

    baseline = run_hash_benchmark(
        title=f'{title}, baseline',
        set_hash_func=noop,
        sizes=sizes,
        mimic_cached_property=mimic_cached_property,
    )
    ax.plot(sizes, baseline / baseline, label='baseline')
    for label, set_hash_func in {
        "itemgetter": set_hash_func_itemgetter,
        "itemgetter+defaultdict": set_hash_func_itemgetter_defaultdict,
        "itemgetter+safety": set_hash_func_itemgetter_safe,
        # Those have bad performances and are deactivated to speed-up experimentation
        # "tuple-comprehension": set_hash_func_tuple_comprehension,
        # "get-comprehension": set_hash_func_get_comprehension,
        # "dict-comprehension": set_hash_func_dict_comprehension,
    }.items():
        times = run_hash_benchmark(
            title=f'{title}, {label}',
            set_hash_func=set_hash_func,
            sizes=sizes,
            mimic_cached_property=mimic_cached_property,
        )
        ax.plot(sizes, times / baseline, label=label)

    ax.set_title(title)
    ax.set_xlabel('Number of pydantic fields')
    ax.set_ylabel('Average time relative to baseline')
    return ax


def plot_all_benchmarks(sizes: List[int]) -> figure.Figure:
    n_rows, n_cols = 1, 2
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(n_cols * 6, n_rows * 4))

    for col, mimic_cached_property in enumerate([False, True]):
        plot_benchmark(
            title=f'{mimic_cached_property=}',
            sizes=sizes,
            mimic_cached_property=mimic_cached_property,
            ax=axes[col],
        )

    for ax in axes.ravel():
        ax.legend()
    fig.suptitle(f"__hash__ timing benchmark (python {PYTHON_VERSION})")
    return fig


if __name__ == "__main__":
    fig = plot_all_benchmarks(list(range(100)))
    plt.tight_layout()
    target = f"hash-benchmark_python-{PYTHON_VERSION}.png"
    fig.savefig(
        target,
        dpi=300,
        facecolor="white",
        transparent=False,
    )
    print(f"Wrote {target}", file=sys.stderr)

@dmontagu (Contributor) commented Nov 10, 2023

@alexmojaki regarding the removal of the class in the hash — I think it might make more sense for it to grab the generic origin and include that in the hash (same as is done by the equality method). That would at least keep different classes with the same data dicts having different hashes, presumably reducing collisions. But obviously that's already going to be rare so maybe it is better to not worry about it and let the equality check differentiate. (And I guess it will make the common case of having different data a bit faster anyway due to eliminating the hashing of the type?) I'm happy to defer to you.

@dmontagu (Contributor)

(@alexmojaki I'll wait to actually merge until you give a reply to the last comment about how you want to proceed, but to be clear I'm okay merging this as is if you think that's best.)

@alexmojaki (Contributor, Author)

I think it'd be so weird to have a dict/set containing a large number of models of different types (already an oddity in normal code) with the same data that it's worth optimising the common case.

@sydney-runkle (Member)

LGTM overall, but I'd prefer to wait and merge once we release 2.5.0 - I don't think it makes sense to squeeze in changes between 2.5.0b1 and now. Happy to merge this mid-next week when we get the next release out.

Thanks for your work on this @alexmojaki!

@sydney-runkle sydney-runkle merged commit f23578e into pydantic:main Nov 13, 2023
59 checks passed
@samuelcolvin changed the title to "Only hash model_fields, not whole __dict__" on Nov 15, 2023
QuentinSoubeyranAqemia added a commit to QuentinSoubeyranAqemia/pydantic that referenced this pull request Nov 15, 2023
jsuchenia pushed a commit to jsuchenia/adventofcode that referenced this pull request Feb 13, 2024
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [pydantic](https://github.com/pydantic/pydantic) ([changelog](https://docs.pydantic.dev/latest/changelog/)) | dependencies | minor | `2.5.3` -> `2.6.1` |

---

### Release Notes

<details>
<summary>pydantic/pydantic (pydantic)</summary>

### [`v2.6.1`](https://github.com/pydantic/pydantic/blob/HEAD/HISTORY.md#v261-2024-02-05)

[Compare Source](pydantic/pydantic@v2.6.0...v2.6.1)

[GitHub release](https://github.com/pydantic/pydantic/releases/tag/v2.6.1)

##### What's Changed

##### Packaging

-   Upgrade to `pydantic-core` 2.16.2 by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8717](pydantic/pydantic#8717)

##### Fixes

-   Fix bug with `mypy` plugin and `no_strict_optional = True` by [@&#8203;dmontagu](https://github.com/dmontagu) in [#&#8203;8666](pydantic/pydantic#8666)
-   Fix `ByteSize` error `type` change by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8681](pydantic/pydantic#8681)
-   Fix inheriting `Field` annotations in dataclasses by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8679](pydantic/pydantic#8679)
-   Fix regression in core schema generation for indirect definition references by [@&#8203;dmontagu](https://github.com/dmontagu) in [#&#8203;8702](pydantic/pydantic#8702)
-   Fix unsupported types bug with `PlainValidator` by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8710](pydantic/pydantic#8710)
-   Reverting problematic fix from 2.6 release, fixing schema building bug by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8718](pydantic/pydantic#8718)
-   Fix warning for tuple of wrong size in `Union` by [@&#8203;davidhewitt](https://github.com/davidhewitt) in [pydantic/pydantic-core#1174](pydantic/pydantic-core#1174)
-   Fix `computed_field` JSON serializer `exclude_none` behavior by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [pydantic/pydantic-core#1187](pydantic/pydantic-core#1187)

### [`v2.6.0`](https://github.com/pydantic/pydantic/blob/HEAD/HISTORY.md#v260-2024-01-23)

[Compare Source](pydantic/pydantic@v2.5.3...v2.6.0)

[GitHub release](https://github.com/pydantic/pydantic/releases/tag/v2.6.0)

The code released in v2.6.0 is practically identical to that of v2.6.0b1.

##### What's Changed

##### Packaging

-   Check for `email-validator` version >= 2.0 by [@&#8203;commonism](https://github.com/commonism) in [#&#8203;6033](pydantic/pydantic#6033)
-   Upgrade \`ruff\`\` target version to Python 3.8 by [@&#8203;Elkiwa](https://github.com/Elkiwa) in [#&#8203;8341](pydantic/pydantic#8341)
-   Update to `pydantic-extra-types==2.4.1` by [@&#8203;yezz123](https://github.com/yezz123) in [#&#8203;8478](pydantic/pydantic#8478)
-   Update to `pyright==1.1.345` by [@&#8203;Viicos](https://github.com/Viicos) in [#&#8203;8453](pydantic/pydantic#8453)
-   Update pydantic-core from 2.14.6 to 2.16.1, significant changes from these updates are described below, full changelog [here](pydantic/pydantic-core@v2.14.6...v2.16.1)

##### New Features

-   Add `NatsDsn` by [@&#8203;ekeew](https://github.com/ekeew) in [#&#8203;6874](pydantic/pydantic#6874)
-   Add `ConfigDict.ser_json_inf_nan` by [@&#8203;davidhewitt](https://github.com/davidhewitt) in [#&#8203;8159](pydantic/pydantic#8159)
-   Add `types.OnErrorOmit` by [@&#8203;adriangb](https://github.com/adriangb) in [#&#8203;8222](pydantic/pydantic#8222)
-   Support `AliasGenerator` usage by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8282](pydantic/pydantic#8282)
-   Add Pydantic People Page to docs by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8345](pydantic/pydantic#8345)
-   Support `yyyy-MM-DD` datetime parsing by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8404](pydantic/pydantic#8404)
-   Added bits conversions to the `ByteSize` class [#&#8203;8415](pydantic/pydantic#8415) by [@&#8203;luca-matei](https://github.com/luca-matei) in [#&#8203;8507](pydantic/pydantic#8507)
-   Enable json schema creation with type `ByteSize` by [@&#8203;geospackle](https://github.com/geospackle) in [#&#8203;8537](pydantic/pydantic#8537)
-   Add `eval_type_backport` to handle union operator and builtin generic subscripting in older Pythons by [@&#8203;alexmojaki](https://github.com/alexmojaki) in [#&#8203;8209](pydantic/pydantic#8209)
-   Add support for `dataclass` fields `init` by [@&#8203;dmontagu](https://github.com/dmontagu) in [#&#8203;8552](pydantic/pydantic#8552)
-   Implement pickling for `ValidationError` by [@&#8203;davidhewitt](https://github.com/davidhewitt) in [pydantic/pydantic-core#1119](pydantic/pydantic-core#1119)
-   Add unified tuple validator that can handle "variadic" tuples via PEP-646 by [@&#8203;dmontagu](https://github.com/dmontagu) in [pydantic/pydantic-core#865](pydantic/pydantic-core#865)
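Of the features above, the `yyyy-MM-DD` datetime parsing change means a bare date string is now accepted for `datetime` fields. A minimal plain-Python sketch of the accepted format (not pydantic's actual parser, which lives in `pydantic-core`):

```python
from datetime import datetime

# pydantic 2.6 accepts a bare "yyyy-MM-DD" string for datetime fields;
# the equivalent stdlib parse yields a datetime at midnight.
dt = datetime.strptime("2024-01-23", "%Y-%m-%d")
assert dt == datetime(2024, 1, 23, 0, 0)
```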

##### Changes

-   Drop Python 3.7 support by [@&#8203;hramezani](https://github.com/hramezani) in [#&#8203;7188](pydantic/pydantic#7188)
-   Drop Python 3.7, and PyPy 3.7 and 3.8 by [@&#8203;davidhewitt](https://github.com/davidhewitt) in [pydantic/pydantic-core#1129](pydantic/pydantic-core#1129)
-   Use positional-only `self` in `BaseModel` constructor, so no field name can ever conflict with it by [@&#8203;ariebovenberg](https://github.com/ariebovenberg) in [#&#8203;8072](pydantic/pydantic#8072)
-   Make `@validate_call` return a function instead of a custom descriptor - fixes binding issue with inheritance and adds `self/cls` argument to validation errors by [@&#8203;alexmojaki](https://github.com/alexmojaki) in [#&#8203;8268](pydantic/pydantic#8268)
-   Exclude `BaseModel` docstring from JSON schema description by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8352](pydantic/pydantic#8352)
-   Introducing `classproperty` decorator for `model_computed_fields` by [@&#8203;Jocelyn-Gas](https://github.com/Jocelyn-Gas) in [#&#8203;8437](pydantic/pydantic#8437)
-   Explicitly raise an error if field names clashes with types by [@&#8203;Viicos](https://github.com/Viicos) in [#&#8203;8243](pydantic/pydantic#8243)
-   Use stricter serializer for unions of simple types by [@&#8203;alexdrydew](https://github.com/alexdrydew) [pydantic/pydantic-core#1132](pydantic/pydantic-core#1132)
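The positional-only `self` change above exists so that no field name, not even one literally called `self`, can collide with the bound instance parameter. A hedged plain-Python sketch of the mechanism (pydantic's real constructor differs):

```python
class Model:
    # The `/` marks `self` as positional-only, so the name "self"
    # stays free for use as a keyword argument (i.e. a field name).
    def __init__(self, /, **data):
        self.__dict__.update(data)

m = Model(self=123, x=1)  # no TypeError: `self` does not clash
assert m.self == 123 and m.x == 1
```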

##### Performance

-   Add Codspeed profiling Actions workflow  by [@&#8203;lambertsbennett](https://github.com/lambertsbennett) in [#&#8203;8054](pydantic/pydantic#8054)
-   Improve `int` extraction by [@&#8203;samuelcolvin](https://github.com/samuelcolvin) in [pydantic/pydantic-core#1155](pydantic/pydantic-core#1155)
-   Improve performance of recursion guard by [@&#8203;samuelcolvin](https://github.com/samuelcolvin) in [pydantic/pydantic-core#1156](pydantic/pydantic-core#1156)
-   `dataclass` serialization speedups by [@&#8203;samuelcolvin](https://github.com/samuelcolvin) in [pydantic/pydantic-core#1162](pydantic/pydantic-core#1162)
-   Avoid `HashMap` creation when looking up small JSON objects in `LazyIndexMaps` by [@&#8203;samuelcolvin](https://github.com/samuelcolvin) in [pydantic/jiter#55](pydantic/jiter#55)
-   Use hashbrown to speed up Python string caching by [@&#8203;davidhewitt](https://github.com/davidhewitt) in [pydantic/jiter#51](pydantic/jiter#51)
-   Replace `Peak` with more efficient `Peek` by [@&#8203;davidhewitt](https://github.com/davidhewitt) in [pydantic/jiter#48](pydantic/jiter#48)

##### Fixes

-   Move `getattr` warning in deprecated `BaseConfig` by [@&#8203;tlambert03](https://github.com/tlambert03) in [#&#8203;7183](pydantic/pydantic#7183)
-   Only hash `model_fields`, not whole `__dict__` by [@&#8203;alexmojaki](https://github.com/alexmojaki) in [#&#8203;7786](pydantic/pydantic#7786)
-   Fix mishandling of unions while freezing types in the `mypy` plugin by [@&#8203;dmontagu](https://github.com/dmontagu) in [#&#8203;7411](pydantic/pydantic#7411)
-   Fix `mypy` error on untyped `ClassVar` by [@&#8203;vincent-hachin-wmx](https://github.com/vincent-hachin-wmx) in [#&#8203;8138](pydantic/pydantic#8138)
-   Only compare pydantic fields in `BaseModel.__eq__` instead of whole `__dict__` by [@&#8203;QuentinSoubeyranAqemia](https://github.com/QuentinSoubeyranAqemia) in [#&#8203;7825](pydantic/pydantic#7825)
-   Update `strict` docstring in `model_validate` method. by [@&#8203;LukeTonin](https://github.com/LukeTonin) in [#&#8203;8223](pydantic/pydantic#8223)
-   Fix overload position of `computed_field` by [@&#8203;Viicos](https://github.com/Viicos) in [#&#8203;8227](pydantic/pydantic#8227)
-   Fix custom type type casting used in multiple attributes by [@&#8203;ianhfc](https://github.com/ianhfc) in [#&#8203;8066](pydantic/pydantic#8066)
-   Fix issue not allowing `validate_call` decorator to be dynamically assigned to a class method by [@&#8203;jusexton](https://github.com/jusexton) in [#&#8203;8249](pydantic/pydantic#8249)
-   Fix issue `unittest.mock` deprecation warnings  by [@&#8203;ibleedicare](https://github.com/ibleedicare) in [#&#8203;8262](pydantic/pydantic#8262)
-   Added tests for the case `JsonValue` contains subclassed primitive values by [@&#8203;jusexton](https://github.com/jusexton) in [#&#8203;8286](pydantic/pydantic#8286)
-   Fix `mypy` error on free before validator (classmethod) by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8285](pydantic/pydantic#8285)
-   Fix `to_snake` conversion by [@&#8203;jevins09](https://github.com/jevins09) in [#&#8203;8316](pydantic/pydantic#8316)
-   Fix type annotation of `ModelMetaclass.__prepare__` by [@&#8203;slanzmich](https://github.com/slanzmich) in [#&#8203;8305](pydantic/pydantic#8305)
-   Disallow `config` specification when initializing a `TypeAdapter` when the annotated type has config already by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8365](pydantic/pydantic#8365)
-   Fix a naming issue with JSON schema for generics parametrized by recursive type aliases by [@&#8203;dmontagu](https://github.com/dmontagu) in [#&#8203;8389](pydantic/pydantic#8389)
-   Fix type annotation in pydantic people script by [@&#8203;shenxiangzhuang](https://github.com/shenxiangzhuang) in [#&#8203;8402](pydantic/pydantic#8402)
-   Add support for field `alias` in `dataclass` signature by [@&#8203;NeevCohen](https://github.com/NeevCohen) in [#&#8203;8387](pydantic/pydantic#8387)
-   Fix bug with schema generation with `Field(...)` in a forward ref by [@&#8203;dmontagu](https://github.com/dmontagu) in [#&#8203;8494](pydantic/pydantic#8494)
-   Fix ordering of keys in `__dict__` with `model_construct` call by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8500](pydantic/pydantic#8500)
-   Fix module `path_type` creation when globals does not contain `__name__` by [@&#8203;hramezani](https://github.com/hramezani) in [#&#8203;8470](pydantic/pydantic#8470)
-   Fix for namespace issue with dataclasses with `from __future__ import annotations` by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8513](pydantic/pydantic#8513)
-   Fix: make function validator types positional-only by [@&#8203;pmmmwh](https://github.com/pmmmwh) in [#&#8203;8479](pydantic/pydantic#8479)
-   Fix usage of `@deprecated` by [@&#8203;Viicos](https://github.com/Viicos) in [#&#8203;8294](pydantic/pydantic#8294)
-   Add more support for private attributes in `model_construct` call by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8525](pydantic/pydantic#8525)
-   Use a stack for the types namespace by [@&#8203;dmontagu](https://github.com/dmontagu) in [#&#8203;8378](pydantic/pydantic#8378)
-   Fix schema-building bug with `TypeAliasType` for types with refs by [@&#8203;dmontagu](https://github.com/dmontagu) in [#&#8203;8526](pydantic/pydantic#8526)
-   Support `pydantic.Field(repr=False)` in dataclasses by [@&#8203;tigeryy2](https://github.com/tigeryy2) in [#&#8203;8511](pydantic/pydantic#8511)
-   Override `dataclass_transform` behavior for `RootModel` by [@&#8203;Viicos](https://github.com/Viicos) in [#&#8203;8163](pydantic/pydantic#8163)
-   Refactor signature generation for simplicity by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [#&#8203;8572](pydantic/pydantic#8572)
-   Fix ordering bug of `PlainValidator` annotation by [@&#8203;Anvil](https://github.com/Anvil) in [#&#8203;8567](pydantic/pydantic#8567)
-   Fix `exclude_none` for json serialization of `computed_field`s by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [pydantic/pydantic-core#1098](pydantic/pydantic-core#1098)
-   Support yyyy-MM-DD string for datetimes by [@&#8203;sydney-runkle](https://github.com/sydney-runkle) in [pydantic/pydantic-core#1124](pydantic/pydantic-core#1124)
-   Tweak ordering of definitions in generated schemas by [@&#8203;StrawHatDrag0n](https://github.com/StrawHatDrag0n) in [#&#8203;8583](pydantic/pydantic#8583)
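The `__hash__` and `__eq__` fixes in this list (this PR and [#&#8203;7825](pydantic/pydantic#7825)) share one idea: derive instance identity from the declared model fields only, ignoring any extra entries in `__dict__` such as private or cached attributes. A simplified plain-Python sketch of that approach (pydantic's actual implementation differs):

```python
class FrozenModel:
    # Stand-in for pydantic's `model_fields`; here just the field names.
    model_fields = ('x', 'y')

    def __init__(self, **data):
        self.__dict__.update(data)

    def __hash__(self):
        # Hash only declared fields, not the whole __dict__.
        d = self.__dict__
        return hash((self.__class__,) + tuple(d[k] for k in self.model_fields))

    def __eq__(self, other):
        return (self.__class__ is other.__class__
                and all(self.__dict__[k] == other.__dict__[k]
                        for k in self.model_fields))

a = FrozenModel(x=1, y=2)
b = FrozenModel(x=1, y=2, _cached='extra')  # non-field attr is ignored
assert hash(a) == hash(b) and a == b
```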

##### New Contributors

##### `pydantic`

-   [@&#8203;ekeew](https://github.com/ekeew) made their first contribution in [#&#8203;6874](pydantic/pydantic#6874)
-   [@&#8203;lambertsbennett](https://github.com/lambertsbennett) made their first contribution in [#&#8203;8054](pydantic/pydantic#8054)
-   [@&#8203;vincent-hachin-wmx](https://github.com/vincent-hachin-wmx) made their first contribution in [#&#8203;8138](pydantic/pydantic#8138)
-   [@&#8203;QuentinSoubeyranAqemia](https://github.com/QuentinSoubeyranAqemia) made their first contribution in [#&#8203;7825](pydantic/pydantic#7825)
-   [@&#8203;ariebovenberg](https://github.com/ariebovenberg) made their first contribution in [#&#8203;8072](pydantic/pydantic#8072)
-   [@&#8203;LukeTonin](https://github.com/LukeTonin) made their first contribution in [#&#8203;8223](pydantic/pydantic#8223)
-   [@&#8203;denisart](https://github.com/denisart) made their first contribution in [#&#8203;8231](pydantic/pydantic#8231)
-   [@&#8203;ianhfc](https://github.com/ianhfc) made their first contribution in [#&#8203;8066](pydantic/pydantic#8066)
-   [@&#8203;eonu](https://github.com/eonu) made their first contribution in [#&#8203;8255](pydantic/pydantic#8255)
-   [@&#8203;amandahla](https://github.com/amandahla) made their first contribution in [#&#8203;8263](pydantic/pydantic#8263)
-   [@&#8203;ibleedicare](https://github.com/ibleedicare) made their first contribution in [#&#8203;8262](pydantic/pydantic#8262)
-   [@&#8203;jevins09](https://github.com/jevins09) made their first contribution in [#&#8203;8316](pydantic/pydantic#8316)
-   [@&#8203;cuu508](https://github.com/cuu508) made their first contribution in [#&#8203;8322](pydantic/pydantic#8322)
-   [@&#8203;slanzmich](https://github.com/slanzmich) made their first contribution in [#&#8203;8305](pydantic/pydantic#8305)
-   [@&#8203;jensenbox](https://github.com/jensenbox) made their first contribution in [#&#8203;8331](pydantic/pydantic#8331)
-   [@&#8203;szepeviktor](https://github.com/szepeviktor) made their first contribution in [#&#8203;8356](pydantic/pydantic#8356)
-   [@&#8203;Elkiwa](https://github.com/Elkiwa) made their first contribution in [#&#8203;8341](pydantic/pydantic#8341)
-   [@&#8203;parhamfh](https://github.com/parhamfh) made their first contribution in [#&#8203;8395](pydantic/pydantic#8395)
-   [@&#8203;shenxiangzhuang](https://github.com/shenxiangzhuang) made their first contribution in [#&#8203;8402](pydantic/pydantic#8402)
-   [@&#8203;NeevCohen](https://github.com/NeevCohen) made their first contribution in [#&#8203;8387](pydantic/pydantic#8387)
-   [@&#8203;zby](https://github.com/zby) made their first contribution in [#&#8203;8497](pydantic/pydantic#8497)
-   [@&#8203;patelnets](https://github.com/patelnets) made their first contribution in [#&#8203;8491](pydantic/pydantic#8491)
-   [@&#8203;edwardwli](https://github.com/edwardwli) made their first contribution in [#&#8203;8503](pydantic/pydantic#8503)
-   [@&#8203;luca-matei](https://github.com/luca-matei) made their first contribution in [#&#8203;8507](pydantic/pydantic#8507)
-   [@&#8203;Jocelyn-Gas](https://github.com/Jocelyn-Gas) made their first contribution in [#&#8203;8437](pydantic/pydantic#8437)
-   [@&#8203;bL34cHig0](https://github.com/bL34cHig0) made their first contribution in [#&#8203;8501](pydantic/pydantic#8501)
-   [@&#8203;tigeryy2](https://github.com/tigeryy2) made their first contribution in [#&#8203;8511](pydantic/pydantic#8511)
-   [@&#8203;geospackle](https://github.com/geospackle) made their first contribution in [#&#8203;8537](pydantic/pydantic#8537)
-   [@&#8203;Anvil](https://github.com/Anvil) made their first contribution in [#&#8203;8567](pydantic/pydantic#8567)
-   [@&#8203;hungtsetse](https://github.com/hungtsetse) made their first contribution in [#&#8203;8546](pydantic/pydantic#8546)
-   [@&#8203;StrawHatDrag0n](https://github.com/StrawHatDrag0n) made their first contribution in [#&#8203;8583](pydantic/pydantic#8583)

##### `pydantic-core`

-   [@&#8203;mariuswinger](https://github.com/mariuswinger) made their first contribution in [pydantic/pydantic-core#1087](pydantic/pydantic-core#1087)
-   [@&#8203;adamchainz](https://github.com/adamchainz) made their first contribution in [pydantic/pydantic-core#1090](pydantic/pydantic-core#1090)
-   [@&#8203;akx](https://github.com/akx) made their first contribution in [pydantic/pydantic-core#1123](pydantic/pydantic-core#1123)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://git.apud.pl/jacek/adventofcode/pulls/57
Co-authored-by: Renovate <renovate@apud.pl>
Co-committed-by: Renovate <renovate@apud.pl>
Labels
relnotes-fix Used for bugfixes.
Development

Successfully merging this pull request may close these issues.

BaseModel.__hash__ doesn't match __eq__
4 participants