Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type annotate format checker methods #958

Merged
merged 3 commits into from Jun 2, 2022

Conversation

sirosen
Copy link
Member

@sirosen sirosen commented May 28, 2022

The goal of this work is to apply annotations in jsonschema._format which match the new additions to the type stubs recently added to typeshed.

In short, it defines a _FormatCheckCallable as a callable -- typically function -- which takes any object and returns a bool. All of the is_* functions are _FormatCheckCallables, and the decorators are carefully annotated as being type-preserving using a typevar.

Because the returned objects from these functions are not always bools, this changeset now calls bool() on those which are not (e.g. ipaddress.IPv4Address). This is really just an explicit form of the check which is going to happen in conforms and check anyway, so there's no significant new cost. The advantage of this is that we have documented (via the annotations) what a format check function is supposed to do: it returns a bool. We could equally well return Any from these functions, relying on __bool__, but this could confuse new contributors and users.

One unfortunate side effect of these changes is that FormatChecker.cls_checks needs to be expanded into a full duplicate of FormatChecker.checks. mypy isn't able to properly understand what cls_checks = classmethod(checks) does -- this is part of a class of semi-sophisticated callable and method manipulations that are known to be problematic for mypy. In order to get the annotations correct, the simplest solutions are either to annotate it explicitly (cast or type comment) or to expand it as this changeset has done.


When I first saw return ipaddress.IPv4Address(...), I thought it was a bug. Because all of these methods "look like" they return bools, it was confusing to see one which returns anything else. I was excited for a hot second that we'd have our first type-annotation driven bugfix! But then, of course, I realized that __bool__ behavior protects us from any issues. I think that does give some credence to the idea that this clarifies the behavior, FWIW.

@codecov-commenter
Copy link

Codecov Report

Merging #958 (00b33bb) into main (acc3ed2) will increase coverage by 0.00%.
The diff coverage is 95.74%.

@@           Coverage Diff           @@
##             main     #958   +/-   ##
=======================================
  Coverage   96.88%   96.89%           
=======================================
  Files          20       20           
  Lines        3276     3284    +8     
  Branches      449      449           
=======================================
+ Hits         3174     3182    +8     
  Misses         79       79           
  Partials       23       23           
Impacted Files Coverage Δ
jsonschema/_format.py 94.19% <95.74%> (+0.21%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update acc3ed2...00b33bb. Read the comment docs.

@sirosen
Copy link
Member Author

sirosen commented May 30, 2022

There are two failures I see in the build, one of which is tox -e style, which I will fix shortly. The other is harder to handle. If I'm reading the build logs correctly, all doc builds are now failing because sphinx is picking up on the "private" type var _F, trying to use that in generated doc, and then not being able to link it and failing nitpick.

I'm not sure how best to resolve. We need the type var to correctly annotate these decorators because that is how we instruct type-checkers that the decorator does not change the type of the decorated function. But the type var is not a symbol which we want to export and have jsonschema users using.

I think one option would be to rename the var to F and then tell nitpick to ignore the bare string F. That would put checks(...) -> Callable[[F], F] into the docs with no explanation of what F is.
I think sophisticated typing users (i.e. anyone who has used a TypeVar themselves) would quickly and easily recognize that a single capital letter is a type var.

The goal of this work is to apply annotations in `jsonschema._format`
which match the new additions to the type stubs recently added to
typeshed.

In short, it defines a `_FormatCheckCallable` as a callable --
typically function -- which takes any object and returns a bool.
All of the `is_*` functions are `_FormatCheckCallable`s, and the
decorators are carefully annotated as being type-preserving using a
typevar.

Because the returned objects from these functions are not always
bools, this changeset now calls `bool()` on those which are not (e.g.
`ipaddress.IPv4Address`). This is really just an explicit form of the
check which is going to happen in `conforms` and `check` anyway, so
there's no significant new cost. The advantage of this is that we have
documented (via the annotations) what a format check function is
supposed to do: it returns a bool. We could equally well return `Any`
from these functions, relying on `__bool__`, but this could confuse
new contributors and users.

One unfortunate side effect of these changes is that
`FormatChecker.cls_checks` needs to be expanded into a full duplicate
of `FormatChecker.checks`. `mypy` isn't able to properly understand
what `cls_checks = classmethod(checks)` does -- this is part of a
class of semi-sophisticated callable and method manipulations that are
known to be problematic for `mypy`. In order to get the annotations
correct, the simplest solutions are either to annotate it explicitly
(cast or type comment) or to expand it as this changeset has done.
@Julian
Copy link
Member

Julian commented May 31, 2022

Hey! Thanks. Will have a look in another day or two, been a bit busy with other things, but appreciated! (And yeah will offer some more educated opinion on the Sphinx thing then hopefully).

@Julian
Copy link
Member

Julian commented Jun 2, 2022

Because the returned objects from these functions are not always bools, this changeset now calls bool() on those which are not (e.g. ipaddress.IPv4Address). This is really just an explicit form of the check which is going to happen in conforms and check anyway, so there's no significant new cost.

Bleh. OK, fair enough I guess. These functions, as you note, have 2 different signatures -- the "public API" version of what you give to checks is as you say, object -> bool, but the internal signature is object -> any (internal meaning anyone working on jsonschema itself may treat it as such as an implementation detail). These sorts of things (this and what you note about cls_checks) are precisely what I expect(ed) typing to make annoying -- cases where everything works fine and is well-tested, but the type checker isn't smart enough.

Willing still to ignore this as we continue but yeah hoping we don't end up with death by a thousand type checker dumbness paper cuts :) (honestly though I'm sort of expecting that -- the library is pretty well tested, so I'm not expecting type checking to reveal any bugs, I'm simply expecting it to help improve documentation, and the above signature for .checks is indeed poorly documented, so that's a small win).

@Julian
Copy link
Member

Julian commented Jun 2, 2022

I'm not sure how best to resolve.

Let's perhaps try sphinx-autodoc-typehints which seems to resolve the nitpick here. I pushed a commit, let's see if the rest of CI passes.

* main:
  Slightly speed up pip installs by skipping the version check in CI.
  Remove the now-unused MANIFEST.in.
  v4.6.0 -> CHANGELOG
  Ignore the badge URLs, they seem super unreliable from CI.
  Ignore a deprecation warning coming from pkg_resources on 3.11
  Add basic CONTRIBUTING guidelines.
  Make project.urls be valid URLs.
  Combine the CI and packaging workflows.
  Let RTD be authoritative about what the default doc version is.
  Re-enable Python 3.11 testing in CI.
  Modernize the packaging setup via PEP 621 and Hatch.
  Update various GHA versions.
  Update docs requirements.
  Revert "Merge pull request python-jsonschema#954 from ssbarnea/fix/py.typed"
@sirosen
Copy link
Member Author

sirosen commented Jun 2, 2022

These functions, as you note, have 2 different signatures -- the "public API" version of what you give to checks is as you say, object -> bool, but the internal signature is object -> any (internal meaning anyone working on jsonschema itself may treat it as such as an implementation detail). These sorts of things (this and what you note about cls_checks) are precisely what I expect(ed) typing to make annoying -- cases where everything works fine and is well-tested, but the type checker isn't smart enough.

FWIW, even as a big proponent of typing for new projects (where you suffer these issues very gradually in exchange for powerful linting on less thoroughly tested code), I find this stuff annoying.

Still, I want to try to push through. Having type annotations published by the library will be valuable if it's ever necessary to change the API in v5.0 or later. mypy can then flag a variety of signature changes for users at their usage sites.

There are two other solutions we can take for the bool() calls. One I dislike because I think it's wrong, and the other I am willing to pursue if it sounds better but I'd need to research a bit.

  1. Tell a white lie in the types with typing.cast (I do not like this)

Even though the functions are annotated as object -> bool, we can still return any but cast it with typing.cast(bool, ret_val).
This makes the types happy, but is just as in the way as the explicit bool call. Plus, it causes is_ipv4("...") is False to type-check when it will not behave correctly at runtime.

  1. Make the return type bool | _ImplementsBool

If we add some special type like

class _ImplementsBool(typing.Protocol):
    def __bool__(self) -> bool: ...

it might be possible to explain the situation to mypy and other checkers without any other changes.

Then we could remove the bool() calls and trust type checkers to reason that the only thing we care about is that the returned object can be bool()ed in a conditional.

This is a behavior I'd like typing to provide out of the box, and for all I know there's some special type somewhere like typing_extensions which can do this. However, I'm not sure that type checkers will understand _ImplementsBool because object.__bool__ is defined. I'd have to try this out.

@Julian
Copy link
Member

Julian commented Jun 2, 2022

Will the second one work? Don't forget there's other ways than implementing __bool__ to get if foo: to work, e.g. defining __len__ to return 0.

@Julian
Copy link
Member

Julian commented Jun 2, 2022

Agree and on-board with what you started with though btw! Appreciated.

@sirosen
Copy link
Member Author

sirosen commented Jun 2, 2022

Will the second one work? Don't forget there's other ways than implementing __bool__ to get if foo: to work, e.g. defining __len__ to return 0.

I think it might work! 🤣

I don't want to drag jsonschema through complicated typing-hacks and problems, so this sort of stuff is always going to be a bit of a balancing act between not touching any runtime logic -- on the assumption that it's all correct -- and doing weird complicated stuff.

I think I might come down, after seeing _ImplementsBool written out (it was only in my head before), on the side of just calling bool() like this PR does. It's just one fewer thing to maintain and worry about.


Looks like that autodoc-typehints extension worked! Or, at least, it passed the build.

@Julian
Copy link
Member

Julian commented Jun 2, 2022

OK cool sounds good to me :D

Let me give this a quick once over and then will merge then.

@Julian Julian merged commit b1c1d00 into python-jsonschema:main Jun 2, 2022
@sirosen sirosen deleted the bool-format-checks branch June 2, 2022 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants