Fix TypeAdapter to respect defer_build #8939

MarkusSintonen · 2024-03-04T07:50:35Z

Change Summary

Makes TypeAdapter to respect defer_build so it constructs the core schema on first validation when _defer_build_mode is set to include type_adapter.

Related issue number

Partly related to #6768 but this does not fix the root performance issue. But allows defer_build to work with FastAPI (which heavily relies on TypeAdapter+Annotated under the hood).

Checklist

The pull request title is a good summary of the changes - it will be used in the changelog
Unit tests for the changes exist
Tests pass on CI
Documentation reflects the changes where applicable
My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @samuelcolvin

codspeed-hq · 2024-03-04T08:45:12Z

CodSpeed Performance Report

Merging #8939 will degrade performances by 12.43%

_{Comparing MarkusSintonen:type-adapter-defer-build (8721c0c) with main (d77a940)}

Summary

❌ 1 (👁 1) regressions
✅ 12 untouched benchmarks

Benchmarks breakdown

	Benchmark	`main`	`MarkusSintonen:type-adapter-defer-build`	Change
👁	`test_efficiency_with_highly_nested_examples`	4.1 ms	4.7 ms	-12.43%

MarkusSintonen · 2024-03-04T08:56:16Z

please review

MarkusSintonen · 2024-03-05T10:41:36Z

With this our FastAPI initialization drops from ~40s to ~10s. Where core schema generation takes ~3-4s. This requires using defer_build in all the relevant places. This moves the core schema overhead to when API gets called. In real life its mostly fine as not all the APIs are always active. Its ofcourse much nicer that a single test starts up much faster.

sydney-runkle · 2024-03-05T14:15:58Z

Hmm, I'll take an in-depth look shortly, but my initial thought here was that we weren't planning on adding support for defer_build with TypeAdapter... #8716

MarkusSintonen · 2024-03-05T19:11:12Z

Hmm, I'll take an in-depth look shortly, but my initial thought here was that we weren't planning on adding support for defer_build with TypeAdapter... #8716

Thanks! That is interesting. I'm wondering why in the referenced issue it was suggested to use defer_build in all relevant models. But it doesn't have any chance at working with FastAPI based applications as the config gets ignored with TypeAdapters. It felt like a shortsight to me that TypeAdapter doesn't respect the lazy core schema building. Its a bit unfortunate that there is no way around the core schema performance issues when application starts. Currently it takes about 1minute in production to build the core schemas on startup and it greatly degrades Kubernetes auto scaler from working properly. Sitatution even gets worse as new features get added as we get more overhead from core schemas. 😔

MarkusSintonen · 2024-03-06T06:34:36Z

If there are some reservations about TypeAdapter supporting lazy core schema building like the BaseModels, then you could add a similar environment variable like suggested here #6768 (comment) (which unfortunately didn't work). So eg PYDANTIC_TYPE_ADAPTER_RESPECT_DEFER_BUILD=true which could be used to fix the performance issues related to TypeAdapter usage.

You can see here how our service startup time is going up as time goes by (additional data models and API models added). Now its already over a minute where the core schemas building (in TypeAdapters) takes about a minute:

With Pydantic V1 the startup took about 10 seconds so the issue is getting worse with V2.

MarkusSintonen · 2024-03-16T17:03:46Z

Hmm, I'll take an in-depth look shortly, but my initial thought here was that we weren't planning on adding support for defer_build with TypeAdapter... #8716

@sydney-runkle any chance on allowing TypeAdapter to respect deferred building? Could we allow it via an environment flag as I suggested above? The slowness caused by core schemas is getting out of hand.

cc @samuelcolvin

sydney-runkle · 2024-03-17T12:12:52Z

@MarkusSintonen,

Thanks for the ping, and sorry for the delay! I'll bring this up in our standup meeting on Monday and get back to you then!

samuelcolvin · 2024-03-17T12:41:59Z

We'll discuss next week, but if it's helping you a lot and is opt in, I'm 👍.

MarkusSintonen · 2024-03-17T16:25:09Z

Thank you! This would indeed help. Atleast until there are optimizations/caching added to the CoreSchema generation. But probably implementing those are not happening in very near future

MarkusSintonen · 2024-03-17T19:04:52Z

We'll discuss next week, but if it's helping you a lot and is opt in, I'm 👍.

I now made it an opt-in feature via the usual config object (instead of magical global flags as I originally suggested). See 7deb045

Also documented the current and the opt-in behaviour there.

sydney-runkle · 2024-03-18T14:11:44Z

We'll discuss next week, but if it's helping you a lot and is opt in, I'm 👍.

@MarkusSintonen, we chatted this morning - let's move forward with this 🚀. I know you've implemented support for this as an opt in feature. I think we'll want to add some documentation explaining that this is experimental, and subject to change. Ultimately, a better solution will be to have TA build times improve significantly so that your changes aren't super necessary.

I'll review thoroughly this afternoon :)

sydney-runkle

I'm going to take a closer look at the logic changes in type_adapter.py, but here's some general feedback on the new API :).

Thanks for your great work on this thus far!

pydantic/_internal/_config.py

pydantic/config.py

tests/test_type_adapter.py

pydantic/config.py

tests/test_type_adapter.py

pydantic/type_adapter.py

MarkusSintonen · 2024-03-19T20:22:06Z

I'm going to take a closer look at the logic changes in type_adapter.py, but here's some general feedback on the new API :).

I still refactored it a bit into a smaller property functions with @cached_property which is much cleaner. (Only now remembered it was available already in Python3.8 this supports)

…Model

…test. Fix with main change

…ing. Add more missing tests.

sydney-runkle

Thanks for your hard work on this.

Left some comments ranging from more broad to nitpicky change requests!

I think it makes sense for us to go ahead with the 2.7 beta release given that we still have more work to do on this PR (and there's lots of fixes in that release that we want to go ahead and get out), but I'm happy to continue to work closely with you to get this across the line!

pydantic/_internal/_generate_schema.py

pydantic/_internal/_mock_val_ser.py

sydney-runkle · 2024-04-02T18:59:38Z

pydantic/_internal/_mock_val_ser.py

+    def __contains__(self, key: Any) -> bool:
+        return self._get_built().__contains__(key)
+
+    def __getitem__(self, key: str) -> Any:
+        return self._get_built().__getitem__(key)
+
+    def __len__(self) -> int:
+        return self._get_built().__len__()
+
+    def __iter__(self) -> Iterator[str]:
+        return self._get_built().__iter__()
+
+    def _get_built(self) -> CoreSchema:
+        if self._built_memo is not None:
+            return self._built_memo
+
+        if self._attempt_rebuild:
+            schema = self._attempt_rebuild()
+            if schema is not None:
+                self._built_memo = schema
+                return schema
+        raise PydanticUserError(self._error_message, code=self._code)


Are these necessary?

What do you mean by necessary? For abstract Mapping the __getitem__ / __len__ / __iter__ are required

Ill remove the __contains__ as it doesnt need overriding.

Done, removed the unneeded __contains__ override

I guess I mean they aren't implemented for MockValSer, right? So why have them here?

CoreSchema is a dict but eg SchemaValidator is an ordinary class

tests/test_config.py

tests/test_main.py

tests/test_type_adapter.py

pydantic/type_adapter.py

sydney-runkle · 2024-04-02T19:20:39Z

pydantic/type_adapter.py

+def _frame_depth(depth: int) -> Callable[[Callable[..., R]], Callable[..., R]]:
+    def wrapper(func: Callable[..., R]) -> Callable[..., R]:
+        @wraps(func)
+        def wrapped(self: TypeAdapter, *args: Any, **kwargs: Any) -> R:
+            # depth + 1 for the wrapper function
+            with self._with_frame_depth(depth + 1):
+                return func(self, *args, **kwargs)
+
+        return wrapped
+
+    return wrapper
+
+


I think maybe it'd help to have a bit more documentation for this function and the frame depth function attached to the TypeAdapter class - it'd help me in my review as well!

Done, added some comment to this.

FYI Im wondering is there a better way than the _parent_depth handling. What about requiring user to define a callback function that would resolve the forward refs from locals/globals? So replacing _parent_depth with something like resolve_namespace: Callable[[], dict[str, Any]] where the str is the name of the type. Then it wouldnt be as fragile as the parent depth handling which could be still there but callable being preferred. Probably such resolver would need to come via Config instead than from the constructor arg directly.

This is ofcourse out of scope of this PR.

I definitely wish we had a better way for users to specify/pass the namespace, but the reason for the parent depth thing was so that it would generally behave correctly without any extra work. At this point, I think it's probably not possible to get rid of _parent_depth unless we find a way to keep all existing code that currently relies on that from breaking. In v3 I think we could make a change to this if we felt it simplified things significantly or otherwise had (even indirect) import-time performance benefits

Yeah no way to get rid of parent depth handling but some better (optional) way could be offered via configs to resolve the names. As working of the parent depth handling highly depends on the context on how model happens to be used

dmontagu

This is quite impressive. I have the usual fear that there may be some surprises lurking but it generally seems quite solid. The "changes" to the behavior (such as not erroring when you access the core schema if it needs a rebuild, but only when you try to use it in some way) seem fine to me, so deep in the internals that I'm not very concerned. (Until someone reports a bug this causes... 🙂)

MarkusSintonen · 2024-04-03T14:37:16Z

Thanks for your hard work on this.

Left some comments ranging from more broad to nitpicky change requests!

I think it makes sense for us to go ahead with the 2.7 beta release given that we still have more work to do on this PR (and there's lots of fixes in that release that we want to go ahead and get out), but I'm happy to continue to work closely with you to get this across the line!

Thank you for the throughout review! Did the new round of changes

MarkusSintonen · 2024-04-03T16:28:46Z

The "changes" to the behavior (such as not erroring when you access the core schema if it needs a rebuild, but only when you try to use it in some way) seem fine to me, so deep in the internals that I'm not very concerned. (Until someone reports a bug this causes... 🙂)

Yes that was also my conclusion. It's so deeply internal that no one should rely on it (until they did). I also feel like it's now more consistent with rest of MockValSer behavior with lazy building.

MarkusSintonen · 2024-04-04T05:20:27Z

pydantic/type_adapter.py

+        if not self._defer_build():
+            # Immediately initialize the core schema, validator and serializer
+            with self._with_frame_depth(1):  # +1 frame depth for this __init__
+                # Model itself may be using deferred building. For backward compatibility we don't rebuild model mocks
+                # here as part of __init__ even though TypeAdapter itself is not using deferred building.
+                self._init_core_attrs(rebuild_mocks=False)


This is a bit fuzzy to me still. Core schema and friends are not initialized immediately (rebuild_mocks=False) when _type is subclass of BaseModel even though _defer_build_mode does not include type_adapter. This is how it also behaved previously and we probably can not really change it (can not do rebuild_mocks=True here). Previously it built the core schema right away but it was never used for validation/serialization.

Still the case where it is very inconsistent is when BaseModel is inside Annotated[BaseModel, Field]. I feel like this is a bit strange behavior even in current state of PR. So the Annotated wrapping case suddenly requires including the type_adapter to _defer_build_mode even though it includes model. The behavior is quite a beast to document properly (as its slight weird). It also means the only reasoning for having the new _defer_build_mode is for the Annotated-BaseModel type which feels slightly off. Because any other case has previously required passing the Config explicitly to the TypeAdapter.__init__ with possible defer built enabled.

I would say most clear way would be to just drop the _defer_build_mode and have just the existing boolean, which feels much simpler. As the only real reason for having it now is for the Annotated case. But then again there were some concerns with this. Although I'm not 100% convinced they are actually concerns for Annotated BaseModel usage cases unless I'm missing something. Because also previously the model was defer built via TypeAdapter but not Annotated one for some reason. Ie what is special about Annotated-BaseModel?

I don't follow, but I don't think it's a huge deal if the handling of Annotated[BaseModel, Field] is less performant. If it's invalid then I guess that's more of a concern.

I don't really understand the alternative you are proposing (i.e., dropping the _defer_build_mode), what is the consequence? If it's not hard to make that change on a separate branch that I could compare against this that would probably help a lot. Also, related, I think I don't really mind too much if it's not documented clearly why things work the way they do as long as changes result in failing tests, if necessary to write things in a weird way then ideally there would be a test included specifically for the sake of documenting why things seem to be written in that weird way, and the test could reference the code or vice versa.

but I don't think it's a huge deal if the handling of Annotated[BaseModel, Field] is less performant

The Annotated inconsistency here is actually the root of the issue 😄 As this kind of pattern is heavily used by eg FastAPI with TypeAdapters. All of the Pydantic models there go through this TypeAdapter(Annotated[BaseModel, Field]). So the defer_builld=True param hasnt worked at all in that context. (Causing the mentioned performance issues eg with auto scalers)

If it's invalid then I guess that's more of a concern

More about backward compatibility (but not in FastAPI context) if someone happens to rely on the fact that case like TypeAdapter(Annotated[BaseModel, Field]) gets immediately built eventhough BaseModel used defer_build=True. It feels again kinda deep internal detail how it happened to work previously.
I dont think there are any other concerns than the Annotated case. Because every other case follows more explicit pattern like TypeAdapter(Dict[str, Any], config={"defer_build": True)) (which didnt either work previously)

(i.e., dropping the _defer_build_mode), what is the consequence?

Consequence is that the TypeAdapter starts to "respect" the defer_build=True without further action from the user. Ie the lazy building behaviour matches that of BaseModel in Annotated/"plain-type" case. Currently _defer_build_mode feels like only useful for enabling the Annotated case to be deferred built without the risk of someone happening to rely on the internal detail on how previously worked.

This is how it would look without the additional config parameter: https://github.com/MarkusSintonen/pydantic/compare/type-adapter-defer-build...MarkusSintonen:pydantic:type-adapter-defer-build-actually?expand=1

MarkusSintonen · 2024-04-04T05:30:01Z

pydantic/_internal/_mock_val_ser.py

+        self._error_message = error_message
+        self._code: PydanticErrorCodes = code
+        self._attempt_rebuild = attempt_rebuild
+        self._built_memo: CoreSchema | None = None


I used _built_memo memoizer here to avoid cases where user could capture the internal mocker class and then use it. That would again and again go through the model rebuilder which has deeply the core schema etc memoized after its built. Should the existing MockValSer for consistency also have its memoizer so it wont accidentally go again and again through the wrapped class deep memoizer?

pydantic/_internal/_mock_val_ser.py

pydantic/type_adapter.py

sydney-runkle · 2024-04-19T01:01:07Z

@MarkusSintonen,

Going to do a 2.7.1 patch release soon (Friday or Monday), then would love to get this merged. Will ping you if we have any additional questions before merging :).

sydney-runkle

@MarkusSintonen,

Really appreciate your work on this - the thorough testing and many rounds of iteration are certainly quite appreciated!!

Going ahead and merging this now - I've acknowledged the one schema building behchmark that experienced a bit of a regression. I'd love to see that improve again before we release 2.8, but I'm not particularly worried about the magnitude of the change.

Great work!

sydney-runkle · 2024-04-28T21:57:50Z

@MarkusSintonen,

We'd more than welcome more PRs from you in the future 🚀! I'm guessing that you have an awesome grasp on lots of the mock validator and serializer logic at this point 😁.

MarkusSintonen · 2024-04-29T05:45:39Z

We'd more than welcome more PRs from you in the future 🚀! I'm guessing that you have an awesome grasp on lots of the mock validator and serializer logic at this point 😁.

Thanks @sydney-runkle for the throughout review! :)

I've acknowledged the one schema building behchmark that experienced a bit of a regression. I'd love to see that improve again before we release 2.8, but I'm not particularly worried about the magnitude of the change.

Noticed that also and the very small difference is coming from the _parent_depth handling that goes through _with_frame_depth. But the difference is so tiny and the benchmark doesn't seem to be about measuring TypeAdapter itself. The benchmark could be updated so it doesnt measure the TypeAdapter internals. Maybe TypeAdapter itself could have its own benchmark (but it would mostly just test how the _parent_depth works).

MarkusSintonen force-pushed the type-adapter-defer-build branch 2 times, most recently from a48b614 to 845780d Compare March 4, 2024 08:42

MarkusSintonen force-pushed the type-adapter-defer-build branch 2 times, most recently from 2a1a443 to ce7909e Compare March 4, 2024 08:55

MarkusSintonen marked this pull request as ready for review March 4, 2024 08:56

pydantic-hooky bot added the ready for review label Mar 4, 2024

pydantic-hooky bot assigned samuelcolvin Mar 4, 2024

MarkusSintonen force-pushed the type-adapter-defer-build branch from 8a10e17 to 38dc679 Compare March 4, 2024 09:33

sydney-runkle added the relnotes-feature label Mar 17, 2024

MarkusSintonen force-pushed the type-adapter-defer-build branch 3 times, most recently from f8d413f to 3316871 Compare March 17, 2024 20:37

sydney-runkle requested changes Mar 19, 2024

View reviewed changes

pydantic-hooky bot added awaiting author revision and removed ready for review labels Mar 19, 2024

pydantic-hooky bot assigned MarkusSintonen and unassigned samuelcolvin Mar 19, 2024

MarkusSintonen added 4 commits March 27, 2024 22:23

Do not build duplicated core schemas. Check _defer_build_mode in Base…

f529434

…Model

Memoize in mock to avoid repeated deep calls to deeper memoizer

dc9b069

Remove dead code. Check core schema directly now

ca57009

Move frame depth handling to internal decorator. Better parametrized …

d6420f4

…test. Fix with main change

MarkusSintonen force-pushed the type-adapter-defer-build branch from 56f309d to d6420f4 Compare March 27, 2024 20:37

MarkusSintonen requested a review from sydney-runkle March 31, 2024 11:22

MarkusSintonen force-pushed the type-adapter-defer-build branch 2 times, most recently from 02ba59f to b1a0518 Compare April 1, 2024 10:30

Do not expose MockValSer from TypeAdapter. Simplify frame depth handl…

44d8b42

…ing. Add more missing tests.

MarkusSintonen force-pushed the type-adapter-defer-build branch from c90363d to 44d8b42 Compare April 1, 2024 18:20

sydney-runkle reviewed Apr 2, 2024

View reviewed changes

dmontagu approved these changes Apr 2, 2024

View reviewed changes

Code review fixes. Use common rebuild funcs, move helper to helpers side

855a3f9

MarkusSintonen force-pushed the type-adapter-defer-build branch from cd18f3f to 855a3f9 Compare April 3, 2024 14:34

MarkusSintonen requested a review from sydney-runkle April 3, 2024 14:36

MarkusSintonen commented Apr 4, 2024

View reviewed changes

Fix comments. Add TODO. Better fn name

8721c0c

sydney-runkle reviewed Apr 4, 2024

View reviewed changes

pydantic/_internal/_mock_val_ser.py Show resolved Hide resolved

sydney-runkle reviewed Apr 4, 2024

View reviewed changes

pydantic/type_adapter.py Show resolved Hide resolved

MarkusSintonen requested a review from sydney-runkle April 9, 2024 11:41

sydney-runkle approved these changes Apr 28, 2024

View reviewed changes

sydney-runkle reviewed Apr 28, 2024

View reviewed changes

sydney-runkle merged commit 44815be into pydantic:main Apr 28, 2024
53 checks passed

Viicos mentioned this pull request Apr 30, 2024

Fix typing of _frame_depth #9353

Merged

Fix TypeAdapter to respect defer_build #8939

Fix TypeAdapter to respect defer_build #8939

Conversation

MarkusSintonen commented Mar 4, 2024 • edited

Change Summary

Related issue number

Checklist

codspeed-hq bot commented Mar 4, 2024 • edited

Merging #8939 will degrade performances by 12.43%

Summary

Benchmarks breakdown

MarkusSintonen commented Mar 4, 2024

MarkusSintonen commented Mar 5, 2024 • edited

sydney-runkle commented Mar 5, 2024

MarkusSintonen commented Mar 5, 2024 • edited

MarkusSintonen commented Mar 6, 2024 • edited

MarkusSintonen commented Mar 16, 2024 • edited

sydney-runkle commented Mar 17, 2024

samuelcolvin commented Mar 17, 2024

MarkusSintonen commented Mar 17, 2024

MarkusSintonen commented Mar 17, 2024 • edited

sydney-runkle commented Mar 18, 2024

sydney-runkle left a comment

Choose a reason for hiding this comment

MarkusSintonen commented Mar 19, 2024 • edited

sydney-runkle left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MarkusSintonen Apr 3, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dmontagu left a comment

Choose a reason for hiding this comment

MarkusSintonen commented Apr 3, 2024

MarkusSintonen commented Apr 3, 2024

MarkusSintonen Apr 4, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MarkusSintonen Apr 4, 2024 • edited

Choose a reason for hiding this comment

MarkusSintonen Apr 4, 2024 • edited

Choose a reason for hiding this comment

sydney-runkle commented Apr 19, 2024

sydney-runkle left a comment

Choose a reason for hiding this comment

sydney-runkle commented Apr 28, 2024

MarkusSintonen commented Apr 29, 2024

MarkusSintonen commented Mar 4, 2024 •

edited

codspeed-hq bot commented Mar 4, 2024 •

edited

MarkusSintonen commented Mar 5, 2024 •

edited

MarkusSintonen commented Mar 5, 2024 •

edited

MarkusSintonen commented Mar 6, 2024 •

edited

MarkusSintonen commented Mar 16, 2024 •

edited

MarkusSintonen commented Mar 17, 2024 •

edited

MarkusSintonen commented Mar 19, 2024 •

edited

MarkusSintonen Apr 3, 2024 •

edited

MarkusSintonen Apr 4, 2024 •

edited

MarkusSintonen Apr 4, 2024 •

edited

MarkusSintonen Apr 4, 2024 •

edited