Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harmonize annotations with python/typeshed#8608 #1027

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

sirosen
Copy link
Member

@sirosen sirosen commented Dec 7, 2022

I haven't had time to focus on jsonschema typing much recently, but I want to get things back on track. The first thing to do is to correct any drift between jsonschema and typeshed.
I'm going to try to produce a steady stream of very small PRs which get things back in sync.

This is therefore hopefully the first of a few changes.


python/typeshed#8608 introduced annotations for create which are not fully reflected here.

In order to reflect that state into jsonschema, a new module, jsonschema._typing is introduced. The purpose of _typing is to be a singular place for the library to define type aliases and any typing-related utilities which may be needed. This will let us use aliases like _typing.JsonValue in many locations where any valid JSON datatype is accepted.
The definitions can be refined over time as necessary.

Initial definitions in _typing are:

  • Schema (any JSON object)
  • JsonObject (any JSON object)
  • JsonValue (any JSON value, including objects or sequences)

Schema is just another name for JsonObject. Perhaps it is not needed, but the name may help clarify things to a reader. It is not obvious at present whether or not it is a good or bad idea to notate it as such, but a similar Schema alias is defined in typeshed and seems to be working there to annotate things accurately.

These types are using Mapping and Sequence rather than dict or list. The rationale is that jsonschema's logic does not dictate that the objects used must be defined in stdlib types or subtypes thereof. For example, a collections.UserDict could be used and should be accepted by the library (UserDict wraps but does not inherit from dict.)


📚 Documentation preview 📚: https://python-jsonschema--1027.org.readthedocs.build/en/1027/

python/typeshed#8608 introduced annotations for `create` which are not
fully reflected here.

In order to reflect that state into `jsonschema`, a new module,
`jsonschema._typing` is introduced. The purpose of `_typing` is to be
a singular place for the library to define type aliases and any
typing-related utilities which may be needed. This will let us use
aliases like `_typing.JsonValue` in many locations where any valid
JSON datatype is accepted.
The definitions can be refined over time as necessary.

Initial definitions in `_typing` are:
- Schema (any JSON object)
- JsonObject (any JSON object)
- JsonValue (any JSON value, including objects or sequences)

`Schema` is just another name for `JsonObject`. Perhaps it is not
needed, but the name may help clarify things to a reader. It is not
obvious at present whether or not it is a good or bad idea to notate
it as such, but a similar Schema alias is defined in typeshed and
seems to be working there to annotate things accurately.

These types are using `Mapping` and `Sequence` rather than `dict` or
`list`. The rationale is that `jsonschema`'s logic does not dictate
that the objects used *must* be defined in stdlib types or subtypes
thereof. For example, a `collections.UserDict` could be used and
should be accepted by the library (`UserDict` wraps but does not
inherit from `dict`.)
@sirosen sirosen requested a review from Julian December 7, 2022 15:57
@Julian
Copy link
Member

Julian commented Dec 7, 2022

Thanks! No worries on the time shortage, I hear you!

Very much appreciate the PR, this is a great plan overall. Will leave some minor comments.

(There's also #1022 from @DanielNoord outstanding to look at, which is driven by typing-related concerns, just in case you haven't seen it).

@@ -0,0 +1,14 @@
"""
Type aliases and utilities to help manage `typing` usage and type annotations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Definitely support adding this, thanks!)

from collections.abc import Mapping, Sequence
import typing

Schema = Mapping[str, typing.Any]
Copy link
Member

@Julian Julian Dec 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Schema is tricky to nail down for multiple reasons. Its precise type depends on the specific dialect of JSON Schema involved, and (speaking off the top of my head without double checking, which counts for lots of the comments I'm about to make --)

  • applicable_validators means that technically its type is literally Any (or some dependent type), because someone can decide they're inventing a JSON Schema dialect where the number 17 is a schema and means something, and applicable_validators will give the library what it needs to deal with that as a schema. I'm honestly not sure what to do about that.
  • Ignoring the above, the "internal" type to all JSON Schema "official" drafts is -- for later drafts, bool | Mapping[str, typing.Any], and for earlier ones Mapping[str, typing.Any]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm definitely going to have to circle back to try to better grasp what's sensible here.

In line with my other comment about maybe accepting some technical inaccuracy in the annotations as a pragmatic approach, I'm instinctively inclined towards bool | Mapping[...].


JsonObject = Mapping[str, typing.Any]

JsonValue = typing.Union[
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incomplete -- someone can come along and invent validator callbacks which only work if someone deserialized into Decimal, and as long as they ensure they've done so in their loading of JSON, their code works. In other words, I'm somewhat afraid this is really just Any in "real life".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might have a tension here between annotations which are useful vs ones which are completely accurate.

If we annotate things as Any, we ensure that nothing which is correct at runtime would be flagged by a type checker. But I'm concerned that making this accurate with Any will also mean that we can't catch common errors.
My example would be some flask-like framework, and checking validate(request) vs validate(request.json).

Is it okay to suggest that anyone using type checking needs to use typing.cast or type: ignore in these cases?

The type really is Any for a lot of methods like json.load, but I hope my desire to provide a more tightly scoped type is understandable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to think about what I think honestly.

To me, all I am "convinced" is useful of at the minute (and I'm quite convinced it's very useful indeed) is that typing forces documentation to have correctly documented types. That's the reason I'm quite picky about making sure that things which work "accidentally" because we abuse some type internally do not in fact get represented as the type -- because later down the line I will not support those internal abuses if we need to change them for whatever reason. They're not part of the public API, they don't have stability guarantees.

Here, the opposite is true -- not documenting the type here actually means "regressing" something that is indeed public API, and which I very much do intend folks to rely on (by which I don't mean I have no reservations that it will bite me in the ass, but I do mean it's something I think people should consider public API at this point, type annotation or not, and I don't intend to break it).

And I know we don't actually regress, since I'm pretty sure there are tests covering the Any behavior at least in part -- but if the goal (at least the goal in my head) is "be able to say every object in jsonschema has its type documented explicitly", it complicates that goal if the documentation again needs tweaking to broaden types into their real public API.

(So... "I don't know yet" :D)

@@ -114,13 +120,15 @@ def _store_schema_list():


def create(
meta_schema,
validators=(),
meta_schema: _typing.Schema,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to double check (which I will be lazy and not do here) but Unlike _typing.Schema, meta-schemas (in the library, but possibly also according to the spec, I forget) must be objects I think, not bool | thing, so you'll need two types I suspect?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that makes sense. I need to think harder about what to do with schemas more broadly, but we now have a clean place where we can define _typing.MetaSchema. I'll make that tweak.

meta_schema,
validators=(),
meta_schema: _typing.Schema,
validators: Mapping[str, _ValidatorCallback] | tuple[()] = (),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The public API is just Mapping[str, _ValidatorCallback], not tuple. I know mypy or whatever is unhappy with that, so if that means just adding type: ignore to get it to shut up about the default arg let's do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I'm cool with that way of handling these!

I had some other ideas which I like less, but another option just occured to me.

With _typing we could do this:

# in the definition
validators: Mapping[str, _ValidatorCallback] = _typing.EmptySequence

# in _typing
EmptySequence = typing.cast(typing.Any, ())

Because Any is a subtype of everything, the rules get wonky, and mypy and other checkers should be okay with this.
That way, _typing.EmptySequence can be used in any location where the empty tuple is used today.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems also fine with me (although I think the fact that () works is more a quirk of dict than a generic empty sequence type, but I'll leave pedantry out until/unless it comes back to rear its head I think).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it help that I also considered

EmptyMap = pyrsistent.freeze({})

?
😁

I just know that this pattern occurs several times in the code, with () as an empty default. So it seems like it might be nice to sweep any messiness associated with it into _typing.

format_checker: _format.FormatChecker = _format.draft202012_format_checker,
id_of: typing.Callable[[_typing.Schema], str] = _id_of,
applicable_validators: typing.Callable[
[_typing.Schema], typing.Iterable[tuple[str, _ValidatorCallback]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This has to do with the above, the type for applicable_validators is really Any -> tuple(...))

@DanielNoord
Copy link
Contributor

Nice to see you have found some more time this!

One question though, why did you choose to make the module private? A lot of the typing ends up in the public API as well, so users would then need to import from a private module to make their own annotations be correct.

@sirosen
Copy link
Member Author

sirosen commented Dec 7, 2022

One question though, why did you choose to make the module private? A lot of the typing ends up in the public API as well, so users would then need to import from a private module to make their own annotations be correct.

I appreciate the question -- and it's a bit of a balancing act, IMO.
It depends a great deal on what goes into that module, and how you want to make implementation vs interface decisions.

For example, we have in this PR

_ValidationCallback = typing.Callable[...]

being used in validators.py.

Is that particular type alias a public property of the library? I would say no -- and I think my inclination not to expose that particular type is agreeable to most developers. It's an internal convenience for authors here to be able to write things more clearly and succinctly, just like any other _-prefixed attribute or function in python.

What about _typing.JsonValue or _typing.Schema? That's more debatable. It might eventually be desirable to pull names into jsonschema.typing, so that we can write client code like

validate(typing.cast(jsonschema.typing.JsonValue, my_object))

But that grows the scope of this changeset! Suddenly, I'm not just trying to make some annotations accurate, but I'm also proposing a new public API which jsonschema publishes and maintains! I didn't feel comfortable with that scope creep. I think the question of what types and type aliases, protocols, generics, etc jsonschema publishes should be kept separate from how its own annotations are written.

I hope that makes sense -- and I do intend to get you a good reply on some other threads (notably #1017)! But until I do, thanks for the interest in and contributions towards a well-typed jsonschema! 😸

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants