Fix TypeAdapter to respect defer_build #8939
CodSpeed Performance Report: Merging #8939 will degrade performances by 12.43%.
please review |
With this, our FastAPI initialization drops from ~40s to ~10s, with core schema generation taking ~3-4s. This requires using |
Hmm, I'll take an in-depth look shortly, but my initial thought here was that we weren't planning on adding support for |
Thanks! That is interesting. I'm wondering why in the referenced issue it was suggested to use |
If there are reservations about TypeAdapter supporting lazy core schema building like BaseModel does, then you could add a similar environment variable as suggested in #6768 (comment) (which unfortunately didn't work). So eg You can see here how our service startup time keeps going up as time goes by (as additional data models and API models are added). It is now already over a minute, of which core schema building (in TypeAdapters) takes about a minute: With Pydantic V1 the startup took about 10 seconds, so the issue is getting worse with V2.
@sydney-runkle any chance of allowing TypeAdapter to respect deferred building? Could we allow it via an environment flag as I suggested above? The slowness caused by core schemas is getting out of hand.
Thanks for the ping, and sorry for the delay! I'll bring this up in our standup meeting on Monday and get back to you then! |
We'll discuss next week, but if it's helping you a lot and is opt in, I'm 👍. |
Thank you! This would indeed help, at least until optimizations/caching are added to the CoreSchema generation. But implementing those probably isn't happening in the very near future.
I have now made it an opt-in feature via the usual config object (instead of the magical global flags I originally suggested). See 7deb045. I also documented the current and the opt-in behaviour there.
@MarkusSintonen, we chatted this morning - let's move forward with this 🚀. I know you've implemented support for this as an opt in feature. I think we'll want to add some documentation explaining that this is experimental, and subject to change. Ultimately, a better solution will be to have TA build times improve significantly so that your changes aren't super necessary. I'll review thoroughly this afternoon :) |
I'm going to take a closer look at the logic changes in type_adapter.py, but here's some general feedback on the new API :).
Thanks for your great work on this thus far!
I still refactored it a bit into smaller property functions with |
…test. Fix with main change
…ing. Add more missing tests.
Thanks for your hard work on this.
Left some comments ranging from more broad to nitpicky change requests!
I think it makes sense for us to go ahead with the 2.7 beta release given that we still have more work to do on this PR (and there's lots of fixes in that release that we want to go ahead and get out), but I'm happy to continue to work closely with you to get this across the line!
pydantic/_internal/_mock_val_ser.py
Outdated
    def __contains__(self, key: Any) -> bool:
        return self._get_built().__contains__(key)

    def __getitem__(self, key: str) -> Any:
        return self._get_built().__getitem__(key)

    def __len__(self) -> int:
        return self._get_built().__len__()

    def __iter__(self) -> Iterator[str]:
        return self._get_built().__iter__()

    def _get_built(self) -> CoreSchema:
        if self._built_memo is not None:
            return self._built_memo

        if self._attempt_rebuild:
            schema = self._attempt_rebuild()
            if schema is not None:
                self._built_memo = schema
                return schema
        raise PydanticUserError(self._error_message, code=self._code)
Are these necessary?
What do you mean by necessary? For the abstract Mapping, __getitem__ / __len__ / __iter__ are required.
I'll remove the __contains__ as it doesn't need overriding.
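For readers following the thread, here is a minimal stdlib-only sketch of the point being made: a lazily built, memoized mapping only has to implement the three abstract methods of `collections.abc.Mapping`, and `__contains__` comes for free as a mixin. The class name `LazyMapping` and the `build_calls` counter are hypothetical and exist only for illustration; this is not the code from the PR.

```python
from collections.abc import Mapping
from typing import Any, Callable, Iterator, Optional


class LazyMapping(Mapping):
    """Read-only mapping that builds its backing dict on first access (illustrative only)."""

    def __init__(self, build: Callable[[], Optional[dict]]) -> None:
        self._build = build
        self._built_memo: Optional[dict] = None
        self.build_calls = 0  # hypothetical counter, only here to show memoization

    def _get_built(self) -> dict:
        # Return the memoized result if a previous build succeeded.
        if self._built_memo is not None:
            return self._built_memo
        self.build_calls += 1
        built = self._build()
        if built is None:
            raise RuntimeError("mapping could not be built yet")
        self._built_memo = built
        return built

    # The three abstract methods that Mapping requires.
    def __getitem__(self, key: str) -> Any:
        return self._get_built()[key]

    def __len__(self) -> int:
        return len(self._get_built())

    def __iter__(self) -> Iterator[str]:
        return iter(self._get_built())


lazy = LazyMapping(lambda: {"type": "int"})
```

`"type" in lazy`, `lazy.get(...)`, `keys()` and friends all work through the mixin methods that `Mapping` derives from `__getitem__`, `__len__` and `__iter__`, which is why a separate `__contains__` override is not needed.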
Done, removed the unneeded __contains__ override.
I guess I mean they aren't implemented for MockValSer, right? So why have them here?
CoreSchema is a dict, but e.g. SchemaValidator is an ordinary class.
pydantic/type_adapter.py
Outdated
def _frame_depth(depth: int) -> Callable[[Callable[..., R]], Callable[..., R]]:
    def wrapper(func: Callable[..., R]) -> Callable[..., R]:
        @wraps(func)
        def wrapped(self: TypeAdapter, *args: Any, **kwargs: Any) -> R:
            # depth + 1 for the wrapper function
            with self._with_frame_depth(depth + 1):
                return func(self, *args, **kwargs)

        return wrapped

    return wrapper
I think maybe it'd help to have a bit more documentation for this function and the frame depth function attached to the TypeAdapter class - it'd help me in my review as well!
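As a rough illustration of what the frame depth bookkeeping is for, the sketch below reads the caller's local namespace with `sys._getframe` and uses a decorator to compensate for the extra stack frames added by the public method and its wrapper. `NamespaceReader`, `lookup` and `_caller_locals` are made-up names, and the depth arithmetic is an assumption about the general technique, not a copy of the PR's implementation.

```python
import sys
from contextlib import contextmanager
from functools import wraps
from typing import Any, Callable, Iterator


def frame_depth(depth: int) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    def wrapper(func: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(func)
        def wrapped(self: "NamespaceReader", *args: Any, **kwargs: Any) -> Any:
            # depth + 1: the extra frame is `wrapped` itself.
            with self._with_frame_depth(depth + 1):
                return func(self, *args, **kwargs)

        return wrapped

    return wrapper


class NamespaceReader:
    """Hypothetical class that resolves names from the frame of its caller."""

    def __init__(self) -> None:
        # Depth 1 means "the frame that called _caller_locals".
        self._parent_depth = 1

    @contextmanager
    def _with_frame_depth(self, depth: int) -> Iterator[None]:
        # Temporarily look further up the stack, then restore the old depth.
        self._parent_depth += depth
        try:
            yield
        finally:
            self._parent_depth -= depth

    def _caller_locals(self) -> dict:
        return dict(sys._getframe(self._parent_depth).f_locals)

    @frame_depth(1)  # one frame for `lookup` itself
    def lookup(self, name: str) -> Any:
        return self._caller_locals().get(name)


def user_code() -> Any:
    secret = "found me"  # only visible in this function's locals
    return NamespaceReader().lookup("secret")


reader = NamespaceReader()
result = user_code()
```

The fragility discussed in this thread is visible here: if another layer of function calls is added between `user_code` and `_caller_locals` without adjusting the depth, the lookup silently reads the wrong frame.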
Done, added some comments to this.
FYI I'm wondering whether there is a better way than the _parent_depth handling. What about requiring the user to define a callback function that would resolve the forward refs from locals/globals? So replacing _parent_depth with something like resolve_namespace: Callable[[], dict[str, Any]], where the str is the name of the type. Then it wouldn't be as fragile as the parent depth handling, which could still be there, with the callable being preferred. Such a resolver would probably need to come via Config rather than from the constructor arg directly.
This is of course out of scope of this PR.
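To make the suggestion concrete, here is a small sketch of what a callback-based resolver could look like. Only the `resolve_namespace: Callable[[], dict[str, Any]]` shape comes from the comment above; the helper `resolve_forward_ref` and everything else are hypothetical, and nothing like this is claimed to exist in Pydantic.

```python
from typing import Any, Callable, Dict


def resolve_forward_ref(name: str, resolve_namespace: Callable[[], Dict[str, Any]]) -> Any:
    """Look up a forward-referenced type name in a user-supplied namespace."""
    # The callback is only invoked when the name actually needs resolving,
    # so it can run lazily, long after the defining function has returned.
    namespace = resolve_namespace()
    try:
        return namespace[name]
    except KeyError:
        raise NameError(f"cannot resolve forward reference {name!r}") from None


def build_types():
    class Item:  # a type that only exists in this local scope
        pass

    namespace = {"Item": Item}
    # The caller hands over an explicit resolver instead of relying on stack depth.
    return Item, (lambda: namespace)


Item, resolver = build_types()
resolved = resolve_forward_ref("Item", resolver)
```

Unlike stack inspection, this keeps working no matter how many intermediate calls sit between the definition site and the point where the name is resolved.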
I definitely wish we had a better way for users to specify/pass the namespace, but the reason for the parent depth thing was so that it would generally behave correctly without any extra work. At this point, I think it's probably not possible to get rid of _parent_depth unless we find a way to keep all existing code that currently relies on that from breaking. In v3 I think we could make a change to this if we felt it simplified things significantly or otherwise had (even indirect) import-time performance benefits.
Yeah, there's no way to get rid of the parent depth handling, but some better (optional) way to resolve the names could be offered via configs, since whether the parent depth handling works depends heavily on the context in which the model happens to be used.
This is quite impressive. I have the usual fear that there may be some surprises lurking but it generally seems quite solid. The "changes" to the behavior (such as not erroring when you access the core schema if it needs a rebuild, but only when you try to use it in some way) seem fine to me, so deep in the internals that I'm not very concerned. (Until someone reports a bug this causes... 🙂)
Thank you for the thorough review! Did the new round of changes |
Yes that was also my conclusion. It's so deeply internal that no one should rely on it (until they did). I also feel like it's now more consistent with rest of MockValSer behavior with lazy building. |
if not self._defer_build():
    # Immediately initialize the core schema, validator and serializer
    with self._with_frame_depth(1):  # +1 frame depth for this __init__
        # Model itself may be using deferred building. For backward compatibility we don't rebuild model mocks
        # here as part of __init__ even though TypeAdapter itself is not using deferred building.
        self._init_core_attrs(rebuild_mocks=False)
This is still a bit fuzzy to me. Core schema and friends are not initialized immediately (rebuild_mocks=False) when _type is a subclass of BaseModel, even though _defer_build_mode does not include type_adapter. This is also how it behaved previously, and we probably cannot really change it (cannot do rebuild_mocks=True here). Previously it built the core schema right away, but it was never used for validation/serialization.
Still, the case where it is very inconsistent is when BaseModel is inside Annotated[BaseModel, Field]. I feel like this is somewhat strange behavior even in the current state of the PR. The Annotated wrapping case suddenly requires including type_adapter in _defer_build_mode even though it already includes model. The behavior is quite a beast to document properly (as it's slightly weird). It also means the only reason for having the new _defer_build_mode is the Annotated-BaseModel type, which feels slightly off, because every other case has previously required passing the Config explicitly to TypeAdapter.__init__, possibly with deferred building enabled.
I would say the clearest way would be to just drop _defer_build_mode and keep only the existing boolean, which feels much simpler, as the only real reason for having it now is the Annotated case. But then again, there were some concerns with this. I'm not 100% convinced they are actually concerns for Annotated BaseModel use cases, unless I'm missing something, because previously the model was also defer built via TypeAdapter, but for some reason not the Annotated one. I.e. what is special about Annotated-BaseModel?
I don't follow, but I don't think it's a huge deal if the handling of Annotated[BaseModel, Field] is less performant. If it's invalid then I guess that's more of a concern.
I don't really understand the alternative you are proposing (i.e., dropping the _defer_build_mode); what is the consequence? If it's not hard to make that change on a separate branch that I could compare against this, that would probably help a lot. Also, relatedly, I don't really mind too much if it's not documented clearly why things work the way they do, as long as changes result in failing tests. If it's necessary to write things in a weird way, then ideally there would be a test included specifically for the sake of documenting why things are written that way, and the test could reference the code or vice versa.
"but I don't think it's a huge deal if the handling of Annotated[BaseModel, Field] is less performant"
The Annotated inconsistency here is actually the root of the issue 😄 This kind of pattern is heavily used by e.g. FastAPI with TypeAdapters. All of the Pydantic models there go through TypeAdapter(Annotated[BaseModel, Field]), so the defer_build=True param hasn't worked at all in that context (causing the mentioned performance issues, e.g. with auto scalers).
"If it's invalid then I guess that's more of a concern"
It's more about backward compatibility (but not in the FastAPI context), if someone happens to rely on the fact that a case like TypeAdapter(Annotated[BaseModel, Field]) gets built immediately even though the BaseModel used defer_build=True. Again, it feels like a rather deep internal detail of how it happened to work previously.
I don't think there are any concerns other than the Annotated case, because every other case follows a more explicit pattern like TypeAdapter(Dict[str, Any], config={"defer_build": True}) (which didn't work previously either).
"(i.e., dropping the _defer_build_mode), what is the consequence?"
The consequence is that TypeAdapter starts to "respect" defer_build=True without further action from the user, i.e. the lazy building behaviour matches that of BaseModel in the Annotated/"plain-type" case. Currently _defer_build_mode feels only useful for enabling the Annotated case to be defer built without the risk of someone happening to rely on the internal detail of how it previously worked.
This is how it would look without the additional config parameter: https://github.com/MarkusSintonen/pydantic/compare/type-adapter-defer-build...MarkusSintonen:pydantic:type-adapter-defer-build-actually?expand=1
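The eager versus deferred behaviour being debated can be pictured with a toy, stdlib-only stand-in. `LazyValidator`, `build_int_validator` and `build_count` are invented names for illustration, and the "expensive build" is just returning `int`; the sketch assumes only the general idea from this thread (build on construction unless deferral is requested, otherwise build once on first use) and does not reproduce Pydantic's internals.

```python
from typing import Any, Callable, Optional


class LazyValidator:
    """Toy adapter that builds its validator eagerly or on first use."""

    def __init__(self, build: Callable[[], Callable[[Any], Any]], defer_build: bool = False) -> None:
        self._build = build
        self._validator: Optional[Callable[[Any], Any]] = None
        self.build_count = 0  # hypothetical counter to observe when building happens
        if not defer_build:
            # Eager mode: pay the build cost at construction (import) time.
            self._ensure_built()

    def _ensure_built(self) -> Callable[[Any], Any]:
        if self._validator is None:
            self.build_count += 1
            self._validator = self._build()
        return self._validator

    def validate(self, value: Any) -> Any:
        # Deferred mode: the first call triggers the build, later calls reuse it.
        return self._ensure_built()(value)


def build_int_validator() -> Callable[[Any], Any]:
    return int  # stand-in for expensive schema generation


eager = LazyValidator(build_int_validator)
deferred = LazyValidator(build_int_validator, defer_build=True)
```

With many adapters created at import time, the deferred variant moves the build cost from startup to the first request that actually uses each adapter, which is the startup-time trade-off described earlier in the conversation.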
self._error_message = error_message
self._code: PydanticErrorCodes = code
self._attempt_rebuild = attempt_rebuild
self._built_memo: CoreSchema | None = None
I used the _built_memo memoizer here to avoid cases where a user could capture the internal mocker class and then keep using it. Each use would otherwise go again and again through the model rebuilder, which has the core schema etc. memoized deep inside once it is built. Should the existing MockValSer, for consistency, also have its own memoizer so it won't accidentally go again and again through the wrapped class's deep memoizer?
Going to do a 2.7.1 patch release soon (Friday or Monday), then would love to get this merged. Will ping you if we have any additional questions before merging :). |
Really appreciate your work on this - the thorough testing and many rounds of iteration are certainly quite appreciated!!
Going ahead and merging this now - I've acknowledged the one schema building benchmark that experienced a bit of a regression. I'd love to see that improve again before we release 2.8, but I'm not particularly worried about the magnitude of the change.
Great work!
We'd more than welcome more PRs from you in the future 🚀! I'm guessing that you have an awesome grasp on lots of the mock validator and serializer logic at this point 😁. |
Thanks @sydney-runkle for the thorough review! :)
Noticed that also and the very small difference is coming from the |
Change Summary
Makes TypeAdapter respect defer_build so that it constructs the core schema on first validation when _defer_build_mode is set to include type_adapter.
Related issue number
Partly related to #6768, but this does not fix the root performance issue. It does allow defer_build to work with FastAPI (which heavily relies on TypeAdapter + Annotated under the hood).
Checklist
Selected Reviewer: @samuelcolvin