add implementation details and more suggestions
samuelcolvin committed Jul 7, 2022
1 parent 5bbea1f commit f0043eb
Showing 1 changed file with 128 additions and 48 deletions.
176 changes: 128 additions & 48 deletions docs/blog/pydantic-v2.md
Expand Up @@ -37,61 +37,52 @@ like [Salesforce did](https://twitter.com/samuel_colvin/status/15012882476700631
This is not charity, recruitment or marketing - the argument should be about how much the company will save if
pydantic is 10x faster, more stable and more powerful - it would be worth paying me 10% of that to make it happen.

The plan is to have pydantic V2 released within 3 months of full-time work
(again, that'll be sooner if I can continue to work on it full-time :face_with_raised_eyebrow:).

Before pydantic V2 can be released, we need to release pydantic V1.10 - there are lots of changes in the main
branch of pydantic contributed by the community, it's only fair to provide a release including those changes,
many of them will remain unchanged for V2, the rest will act as a requirement to make sure pydantic V2 includes
the capabilities they implemented.

The basic road map for me is as follows:

1. Implement a few more critical features in pydantic-core
2. Release V0.1 of pydantic-core
3. Work on getting pydantic V1.10 out - basically merge all open PRs that are finished
4. Release pydantic V1.10
5. Delete all stale PRs which didn't make it into V1.10, apologise profusely to their authors who put their valuable
1. Implement a few more critical features in pydantic-core, see [below](#motivation-pydantic-core)
2. Work on getting pydantic V1.10 out - basically merge all open PRs that are finished
3. Release pydantic V1.10
4. Delete all stale PRs which didn't make it into V1.10, apologise profusely to their authors who put their valuable
time into pydantic only to have their PRs closed :pray:
(and explain when and how they can rebase and recreate their PR)
7. Rename `master` to `main`, seems like a good time to do this
8. Change the main branch of pydantic to target V2
9. Start tearing pydantic code apart and see how many existing tests can be made to pass
10. Rinse, repeat
11. Release pydantic V2 :tada:
5. Rename `master` to `main`, seems like a good time to do this
6. Change the main branch of pydantic to target V2
7. Start tearing pydantic code apart and see how many existing tests can be made to pass
8. Rinse, repeat
9. Release pydantic V2 :tada:

Plan is to have all this done by the end of October, definitely by the end of the year.

## Introduction

Pydantic began life as an experiment in some code in a long dead project.
I ended up making the code into a package and releasing it.
It got a bit of attention on hacker news when it was first released, but started to get really popular when
Sebastián Ramírez used it in [FastAPI](https://fastapi.tiangolo.com/).
## Motivation & `pydantic-core`

Since then, with the help of wonderful contributors like
Since pydantic's initial release, with the help of wonderful contributors
[Eric Jolibois](https://github.com/PrettyWood),
[Sebastián](https://github.com/tiangolo),
and [David Montague](https://github.com/dmontagu) the package and its usage have grown enormously.
[Sebastián Ramírez](https://github.com/tiangolo),
[David Montague](https://github.com/dmontagu) and many others, the package and its usage have grown enormously.
The core logic however has remained relatively unchanged since the initial experiment.
It's old, it smells, it needs to be rebuilt.

The release of version 2 is an opportunity to rebuild pydantic and correct many things that don't make sense -
**to make pydantic amazing :rocket:**.

Much of the work on V2 is already done, but there's still a lot to do.
Now seems a good opportunity to explain what V2 is going to look like and get feedback from users.

## Headlines

For good and bad, here are some of the biggest changes expected in V2.

The core validation logic of pydantic V2 will be performed by a separate package
[pydantic-core](https://github.com/samuelcolvin/pydantic-core) which I've been building over the last few months.

*pydantic-core* is written in Rust using the excellent [pyo3](https://pyo3.rs) library which provides Rust bindings
for python.

The motivation for building pydantic-core in Rust is as follows:
1. **Performance**, see [below](#performance)
2. **Recursion and code separation** - with no stack and little or no overhead for extra function calls,
Rust allows pydantic-core to be implemented as a tree of small validators which call each other, without harming performance
3. **Safety and complexity** - pydantic-core is a fairly complex piece of code which has to draw distinctions
between many different errors, Rust is great in situations like this,
it should minimise bugs (:fingers_crossed:) and allow the codebase to be extended for a long time to come
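
To make the "tree of small validators" idea concrete, here is a rough pure-Python sketch. The class names (`IntValidator`, `ListValidator`) are made up for illustration and are not the pydantic-core API; the real implementation is in Rust, where the extra calls between validators cost little or nothing.

```python
from typing import Any, Protocol


class Validator(Protocol):
    """Anything with a `validate` method can be a node in the tree."""

    def validate(self, value: Any) -> Any: ...


class IntValidator:
    """Leaf validator (hypothetical): accepts exactly `int`."""

    def validate(self, value: Any) -> int:
        if type(value) is not int:
            raise ValueError(f'expected int, got {type(value).__name__}')
        return value


class ListValidator:
    """Branch validator (hypothetical): checks the container, delegates items."""

    def __init__(self, item_validator: Validator) -> None:
        self.item_validator = item_validator

    def validate(self, value: Any) -> list:
        if not isinstance(value, list):
            raise ValueError(f'expected list, got {type(value).__name__}')
        # each item is handed to the child validator, which may itself be a branch
        return [self.item_validator.validate(item) for item in value]


# list[list[int]] expressed as a tree of three small validators
nested = ListValidator(ListValidator(IntValidator()))
result = nested.validate([[1, 2], [3]])
```

Each node only knows about its own type and its direct children, which is what keeps the individual validators small.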

!!! note
The python interface to pydantic shouldn't change as a result of using pydantic-core, instead
pydantic will use type annotations to build a schema for pydantic-core to use.
Expand All @@ -100,11 +91,18 @@ pydantic-core is usable now, albeit with a fairly unintuitive API, if you're int

pydantic-core provides validators for all common data types,
[see a list here](https://github.com/samuelcolvin/pydantic-core/blob/main/pydantic_core/_types.py#L291).
Other, less commonly used data types will be supported via validator functions.
Other, less commonly used data types will be supported via validator functions implemented in pydantic.

See [pydantic-core#153](https://github.com/samuelcolvin/pydantic-core/issues/153)
for a summary of what needs to be completed before its first release.

## Headlines

Here are some of the biggest changes expected in V2.

### Performance :thumbsup:

As a result of the move to rust for the validation logic
As a result of the move to Rust for the validation logic
(and significant improvements in how validation objects are structured) pydantic V2 will be significantly faster
than pydantic V1.

Expand All @@ -124,10 +122,10 @@ pydantic-core comes with "strict mode" built in. With this only the exact data t

This will allow pydantic V2 to offer a `strict` switch which can be set on either a model or a field.
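
As a rough illustration of the difference between the two modes, here is a pure-Python sketch. `validate_int` is a hypothetical helper, not part of pydantic or pydantic-core, and the choice of which inputs lax mode accepts (numeric strings only) is an assumption made for the example.

```python
from typing import Any


def validate_int(value: Any, *, strict: bool = False) -> int:
    """Hypothetical helper showing strict vs. lax handling of an `int` field."""
    # the exact data type is accepted in both modes
    if type(value) is int:
        return value
    if strict:
        raise ValueError(f'strict mode: expected int, got {type(value).__name__}')
    # lax mode (assumption for this sketch): also accept numeric strings
    if isinstance(value, str):
        return int(value)  # raises ValueError for non-numeric strings
    raise ValueError(f'cannot convert {type(value).__name__} to int')


lax_result = validate_int('123')
strict_result = validate_int(123, strict=True)
```

With `strict=True` the call `validate_int('123', strict=True)` raises instead of converting the string.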

#### `IsInstance` checks :thumbsup:
#### `is_instance` checks :thumbsup:

Strict mode also means it makes sense to provide an `is_instance` method on validators which effectively run
validation then throw away the result while avoiding the (admittedly small) overhead of creating and raising
validation then throws away the result while avoiding the (admittedly small) overhead of creating and raising
an error or returning the validation result.
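
A minimal sketch of how such a check could sit next to validation, assuming a strict validator. `ToyStrictInt` and its method names are hypothetical stand-ins, not the pydantic-core API.

```python
from typing import Any


class ToyStrictInt:
    """Hypothetical strict validator used only to illustrate `is_instance`."""

    def _matches(self, value: Any) -> bool:
        # strict mode: only the exact type is accepted
        return type(value) is int

    def validate(self, value: Any) -> int:
        if not self._matches(value):
            # building and raising the error is the overhead is_instance avoids
            raise ValueError(f'expected int, got {type(value).__name__}')
        return value

    def is_instance(self, value: Any) -> bool:
        # same check as validate, but nothing is raised and no result is returned
        return self._matches(value)


validator = ToyStrictInt()
checks = [validator.is_instance(1), validator.is_instance('1')]
```

Here `checks` ends up as `[True, False]`, without an exception ever being constructed.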

### Formalised Conversion Table :thumbsup:
Expand All @@ -154,7 +152,7 @@ Some examples of what that means in practice:
| `int` | `b"1"` | :material-close: | :material-close: | Error |

(For the last case converting `bytes` to an `int` could reasonably mean `int(bytes_data.decode())` or
`int.from_bytes(b'1', 'big')`, hence an error)
`int.from_bytes(b'1', 'big/little')`, hence an error)
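
The ambiguity is easy to see with the standard library alone. The snippet below uses only built-in Python calls and shows that the two readings give different answers for the same bytes.

```python
raw = b"1"

# reading 1: the bytes are text containing digits
as_text = int(raw.decode())  # 1

# reading 2: the bytes are a binary integer, b"1" is the single byte 0x31
as_big_endian = int.from_bytes(raw, "big")  # 49
as_little_endian = int.from_bytes(raw, "little")  # 49

# with more than one byte the byte order matters too
wide = b"12"
wide_text = int(wide.decode())  # 12
wide_big = int.from_bytes(wide, "big")  # 12594
wide_little = int.from_bytes(wide, "little")  # 12849
```

Since there is no obviously correct choice between these, raising an error is the least surprising behaviour.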

In addition to the general rule, we'll provide a conversion table which defines exactly what data will be allowed
to which field types. See [the table below](#conversion-table) for a start on this.
Expand Down Expand Up @@ -225,7 +223,7 @@ class Foo(BaseModel):
### Validator Function Improvements :thumbsup: :thumbsup: :thumbsup:

This is one of the changes in pydantic V2 that I'm most excited about, I've been talking about something
like this for a long time, see [#1984](https://github.com/samuelcolvin/pydantic/issues/1984), but couldn't
like this for a long time, see [pydantic#1984](https://github.com/samuelcolvin/pydantic/issues/1984), but couldn't
find a way to do this until now.

Fields which use a function for validation can be any of the following types:
Expand Down Expand Up @@ -308,15 +306,15 @@ I think all 4 modes can be supported in a single implementation, with a kind of
is used to convert the data as the user wishes.

The current `include` and `exclude` logic is extremely complicated, but hopefully it won't be too hard to
translate it to rust.
translate it to Rust.

We should also add support for `validate_alias` and `dump_alias` as well as the standard `alias`
to allow for customising field keys.

### Model namespace cleanup :thumbsup:

For years I've wanted to clean up the model namespace,
see [#1001](https://github.com/samuelcolvin/pydantic/issues/1001). This would avoid confusing gotchas when field
see [pydantic#1001](https://github.com/samuelcolvin/pydantic/issues/1001). This would avoid confusing gotchas when field
names clash with methods on a model, it would also make it safer to add more methods to a model without risking
new clashes.

Expand Down Expand Up @@ -423,7 +421,7 @@ Other binaries can be added provided they can be (cross-)compiled on github acti
If no binary is available from PyPI, pydantic-core can be compiled from source if Rust stable is available.

The only place where I know this will cause problems is Raspberry Pi, which is a
[mess](https://github.com/piwheels/packages/issues/254) when it comes to packages written in rust for python.
[mess](https://github.com/piwheels/packages/issues/254) when it comes to packages written in Rust for python.
Effectively, until that's fixed you'll likely have to install pydantic with
`pip install -i https://pypi.org/simple/ pydantic`.

Expand Down Expand Up @@ -498,6 +496,8 @@ which is a type defining the schema for validation schemas.
pydantic-core schema has full type definitions although since the type is recursive,
mypy can't provide static type analysis, pyright however can.

We can probably provide one or more helper functions to make `__pydantic_schema__` easier to generate.

## Other Improvements :thumbsup:

Some other things which will also change, IMHO for the better:
Expand All @@ -507,18 +507,31 @@ Some other things which will also change, IMHO for the better:
and a validation error is raised
2. The reason I've been so keen to get pydantic-core to compile and run with wasm is that I want all examples
in the docs of pydantic V2 to be editable and runnable in the browser
3. Full (pun intended) support for `TypedDict`, including `full=False` - e.g. omitted keys
3. Full (pun intended) support for `TypedDict`, including `full=False` - e.g. omitted keys,
providing validation schema to a `TypedDict` field/item will use `Annotated`, e.g. `Annotated[str, Field(strict=True)]`
4. `from_orm` has become `from_attributes` and is now defined at schema generation time
(either via model config or field config)
5. `input_value` has been added to each line error in a `ValidationError`, making errors easier to understand
and allowing more comprehensive details of errors to be provided to end users,
[#784](https://github.com/samuelcolvin/pydantic/issues/784)
7. `on_error` logic in a schema which allows either a default value to be used in the event of an error,
[pydantic#784](https://github.com/samuelcolvin/pydantic/issues/784)
6. `on_error` logic in a schema which allows either a default value to be used in the event of an error,
or that value to be omitted (in the case of a `full=False` `TypedDict`),
[#151](https://github.com/samuelcolvin/pydantic-core/issues/151)
8. `datetime`, `date`, `time` & `timedelta` validation is improved, see the
[speedate] rust library I built specifically for this purpose for more details
9. Powerful "priority" system for optionally merging or overriding config in sub-models for nested schemas
[pydantic-core#151](https://github.com/samuelcolvin/pydantic-core/issues/151)
7. `datetime`, `date`, `time` & `timedelta` validation is improved, see the
[speedate] Rust library I built specifically for this purpose for more details
8. Powerful "priority" system for optionally merging or overriding config in sub-models for nested schemas
9. pydantic will support [annotated-types](https://github.com/annotated-types/annotated-types),
so you can do stuff like `Annotated[set[int], Len(0, 10)]`
10. A single decorator for general usage - we should add a `validate` decorator which can be used:
    * on functions (replacing `validate_arguments`)
    * on dataclasses, `pydantic.dataclasses.dataclass` will become an alias of this
    * on `TypedDict`s
    * on any supported type, e.g. `Union[...]`, `Dict[str, Thing]`
    * on custom field types - e.g. anything with a `__pydantic_schema__` attribute
11. Easier validation error creation, I've often found myself wanting to raise `ValidationError`s outside
models, particularly in FastAPI
([here](https://github.com/samuelcolvin/foxglove/blob/a4aaacf372178f345e5ff1d569ee8fd9d10746a4/foxglove/exceptions.py#L137-L149)
is one method I've used), we should provide utilities
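
For item 9, here is a sketch of how constraints attached via `Annotated` can be read back at runtime using only the standard library (Python 3.9+). The `Len` class below is a hypothetical stand-in defined for this example, not the class from annotated-types, and `check_lengths` is an illustrative helper rather than anything pydantic will ship.

```python
from dataclasses import dataclass
from typing import Annotated, Any, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class Len:
    """Stand-in length constraint (hypothetical, defined only for this sketch)."""

    min_length: int
    max_length: int


class Example:
    tags: Annotated[set[int], Len(0, 10)]


def check_lengths(obj_type: type, data: dict[str, Any]) -> None:
    """Illustrative helper: enforce any `Len` metadata found on annotations."""
    # include_extras=True keeps the Annotated wrapper and its metadata
    hints = get_type_hints(obj_type, include_extras=True)
    for name, hint in hints.items():
        if get_origin(hint) is not Annotated:
            continue
        _base_type, *metadata = get_args(hint)
        for meta in metadata:
            if isinstance(meta, Len):
                size = len(data[name])
                if not meta.min_length <= size <= meta.max_length:
                    raise ValueError(f'{name}: length {size} is out of bounds')


check_lengths(Example, {'tags': {1, 2, 3}})  # within bounds, no error
```

Passing a set with 11 items for `tags` would raise `ValueError` in this sketch.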

## Removed Features :neutral_face:

Expand All @@ -528,13 +541,80 @@ Some other things which will also change, IMHO for the better:
3. `TypeError`s are no longer considered as validation errors, but rather as internal errors, this is to better
catch errors in argument names in function validators.
4. Subclasses of builtin types like `str`, `bytes` and `int` are coerced to their parent builtin type,
this is a limitation of how pydantic-core converts these types to rust types during validation, if you have a
this is a limitation of how pydantic-core converts these types to Rust types during validation, if you have a
specific need to keep the type, you can use wrap validators or custom type validation as described above
5. [Settings Management](https://pydantic-docs.helpmanual.io/usage/settings/) ??? - I definitely don't want to
remove the functionality, but it's something of a historical curiosity that it lives within pydantic,
perhaps it should move to a separate package, perhaps installable alongside pydantic with
`pip install pydantic[settings]`?

## Features Remaining :neutral_face:

The following features will remain (mostly) unchanged:

* Generics
* JSONSchema, internally this will need to change a lot, but hopefully the external interface will remain unchanged
* `dataclass` support, again internals might change, but not the external interface
* `validate_arguments`, might be renamed, but otherwise remain

## Questions :question:

I hope the explanation above is useful. I'm sure people will have questions and feedback; I'm aware
I've skipped over some features with limited detail (this post is already fairly long :sleeping:).

To allow feedback without being overwhelmed, I've created a "Pydantic V2" category for
[discussions on github](https://github.com/samuelcolvin/pydantic/discussions/categories/pydantic-v2) - please
feel free to create a discussion if you have any questions or suggestions.
We will endeavour to read and respond to everyone.

---

## Implementation Details :nerd:

(This is yet to be built, so these are nascent ideas which might change)

At the center of pydantic v2 will be a `PydanticValidator` class which looks roughly like this:

```py title="PydanticValidator"
# Function, Type, TypingConstruct and Config are placeholder names,
# the __future__ import stops python evaluating them at definition time
from __future__ import annotations

from typing import Any, Callable, Literal, TypeVar

# type identifying data which has been validated,
# as per pydantic-core, this can include "fields_set" data
ValidData = TypeVar('ValidData')

class PydanticValidator:
    def __init__(self, output_type: Function | Type | TypingConstruct, config: Config):
        ...
    def validate(self, input_data: Any) -> ValidData:
        ...
    def validate_json(self, input_data: str | bytes | bytearray) -> ValidData:
        ...
    def is_instance(self, input_data: Any) -> bool:
        ...
    def is_instance_json(self, input_data: str | bytes | bytearray) -> bool:
        ...
    def json_schema(self) -> dict:
        ...
    def dump(
        self,
        data: ValidData,
        include: ... = None,
        exclude: ... = None,
        by_alias: bool = False,
        exclude_unset: bool = False,
        exclude_defaults: bool = False,
        exclude_none: bool = False,
        mode: Literal['unchanged', 'dicts', 'json-compliant'] = 'unchanged',
        converter: Callable[[Any], Any] | None = None
    ) -> str:
        ...
```

This could be used directly, but more commonly will be used by the following:

* `BaseModel`
* the `validate` decorator described above
* `pydantic.dataclasses.dataclass` (which might be an alias of `validate`)
* generics

## Conversion Table :material-table:

The table below provisionally defines what input value types are allowed to which field types.
Expand All @@ -544,7 +624,7 @@ An updated and complete version of this table will be included in the docs for V
!!!note
Some type conversions shown here are a significant departure from existing behavior, we may have to provide a config
flag for backwards compatibility for a few of them, however pydantic V2 cannot be entirely backward compatible,
see [#152](https://github.com/samuelcolvin/pydantic-core/issues/152).
see [pydantic-core#152](https://github.com/samuelcolvin/pydantic-core/issues/152).

| Field Type | Input | Mode | Input Source | Conditions |
|---------------|-------------|--------|--------------|-----------------------------------------------------------------------------|
Expand Down
