Tests specifically for tricky, white-lie signatures #1339

Closed
gnprice opened this issue May 24, 2017 · 20 comments
Labels: project: infrastructure (typeshed build, test, documentation, or distribution related)

Comments

@gnprice
Contributor

gnprice commented May 24, 2017

We've had some useful, long discussions about writing tests for stubs (#754, #917). Most of the discussion has been around having tests for the bulk of the functionality in stubs and the libraries they describe. I think we're all broadly in agreement that

  • such comprehensive tests would be valuable; but
  • just writing out a test for each function and class stubbed isn't helpful, because it amounts to repeating what we're already saying in the stubs; and
  • a higher-tech approach to testing the bulk of the stubs' content won't be trivial to get to work with a good signal-to-noise ratio. (It'll be great if it can be done; some folks are working on it, and others are skeptical it can.)

This issue is for a narrower proposal: sometimes a library's real signature is complex or hard to express in the type system, so our stub signature ends up pretty tricky, or we end up telling the type-checkers a small white lie. For these, even with the signature in one window and the library's docs (or a REPL experiment) in the other, it can be hard to tell whether they match. In these cases, I think a hand-written test would be useful -- probably looking a lot like a mypy unit test: some code to be type-checked, plus an indication that certain lines should have errors and the rest shouldn't. We could then run mypy and pytype against it.

@JukkaL suggested just this in one bullet point of a comment on #754, and gave some examples:

Add a test that exercises particularly complex signatures. Some recent examples where this could have been helpful are dict.get (a recent change to it broke it on mypy) and requests.api (there was a bad signature).

A new example today is #1331: auto doesn't really inherit from IntFlag, and we may find a new way of refining how we describe that interaction in the stubs. When we do, it'll be useful to have a test for the functionality that that little lie in the stubs is intended to provide.
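For illustration, a test of this kind might look like the sketch below. The layout and the `# E` marker convention are hypothetical, not an agreed format; it uses dict.get, one of the signatures Jukka mentioned above.

```python
# A purely illustrative sketch (hypothetical format): plain code that a
# type checker runs, where a "# E" comment marks lines that must produce
# an error and every unmarked line must check cleanly.
from typing import Optional

d = {"a": 1}

maybe: Optional[int] = d.get("a")  # clean: .get may return None
value: int = d.get("a")            # E: must be rejected, the result may be None
fallback: int = d.get("b", 0)      # clean: the default rules out None
```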

@gnprice
Contributor Author

gnprice commented May 24, 2017

/cc @JukkaL @gvanrossum @vlasovskikh @matthiaskramm @JelleZijlstra @ambv from previous discussions

@JelleZijlstra
Member

I like this idea. This would interact nicely with python/typing#436: the same test runner proposed there could run tests in typeshed.

@matthiaskramm
Contributor

I'm ±0 on this.

Pro: Recently, we've had a number of pull requests with commit messages along the lines of
"Make [some example code] work with typeshed". Clearly, it would have been good to put [some example code] somewhere more meaningful.

Contra: It's a slippery slope. Testing leads to tests, which leads to more tests, which leads to complete test coverage, and look! We've just duplicated https://github.com/python/cpython/tree/master/Lib/test.

If we go down this road, we would need very clear wording that unit tests for pyi are reserved for special circumstances and not mandatory. That's a message that's really hard to convey, in a language culture as addicted to testing as Python.

@gvanrossum
Member

I agree with what Matthias said (and it's why previously I was against such tests). But there are only a few committers on the project and together we can set the bar high for when tests are accepted in PRs. It will help if there are clear words in README.md and CONTRIBUTING.md about this (reviewers can then refer contributors to those words).

@vlasovskikh
Member

I like the general idea of having tests for stubs. So having them at least for tricky signatures looks like a reasonable compromise to me.

@hauntsaninja
Collaborator

An easy and hopefully unobjectionable way to do this is to create a repo with whatever code you want to test and add the repo to mypy_primer.

@JukkaL
Contributor

JukkaL commented Feb 10, 2021

I think that now that we have the new directory structure, we can reopen this discussion.

One way to do this would be to support an optional @test directory under each stub directory. The @test directory could contain any number of .py files. The simplest approach would be to type check these files and fail the test if there are any errors. We can also support specifying that certain lines must generate errors, for example by using a magic comment (for example, # ERROR).

Here's an example of how this could work. To add tests for foobar, we can add the file stubs/foobar/@test/test.py:

from foobar import complex_function

complex_function(1, arg="x")  # On this line there must not be an error

complex_function(1, arg=["x"])  # ERROR  # On this line there must be an error

We'd run the tests in Python 2 and/or Python 3 modes, depending on what's supported by the stubs. It should be possible to use Python version checks to have different tests for Python 2 and Python 3.
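To make the mechanics concrete, a runner for such @test files could work roughly like the sketch below. This is purely illustrative rather than a proposed implementation; the only thing it relies on is mypy's standard `file:line: error:` output format.

```python
# Rough sketch (not typeshed's actual tooling): run mypy on a test file,
# collect the line numbers that produced errors, and compare them against
# the lines carrying an "# ERROR" comment.
import re
import subprocess
import sys

def check_test_file(path: str) -> bool:
    expected = {
        lineno
        for lineno, line in enumerate(open(path), start=1)
        if "# ERROR" in line
    }
    result = subprocess.run(
        [sys.executable, "-m", "mypy", path],
        capture_output=True,
        text=True,
    )
    actual = {
        int(m.group(1))
        for m in re.finditer(rf"{re.escape(path)}:(\d+): error:", result.stdout)
    }
    if actual != expected:
        print(f"{path}: expected errors on {sorted(expected)}, got {sorted(actual)}")
        return False
    return True
```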

In addition to optionally having tests for tricky signatures, I believe that there are a few other valid use cases for tests:

  1. For each package, we can have a small number of tests that verify some of the most common use cases. These can act as sanity checks, and they can protect against accidentally making the stubs completely unusable (after major refactorings or after changes to semantics of stubs, for example).
  2. Any PR that fixes some signature can add a test to validate the fix. This can prevent regressions and make it easier to review the fix. It can also make the contributor more confident that their fix works as expected.
  3. Non-trivial re-exported definitions can be useful to test, since these are easy to break.

As discussed above, we don't want exhaustive tests that cover every single signature in the stubs. Also, PRs that add dozens of annotations probably shouldn't add tests for all of the new signatures (though it may be fine to test a few of them). We'd need to make the conventions explicit and enforce them in code reviews. I believe that after the conventions have been established for some time, contributors will figure this out by looking at existing tests as examples. It will be important that we don't let poor-quality or overly broad tests through code review, as it's possible (even likely) that contributors will start emulating these examples.

@hauntsaninja
Collaborator

Now that we have a package version in METADATA.toml, we could also run stubtest. This would totally take care of 1.
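For context, here is a rough sketch of how that could be wired up: read the version pinned in METADATA.toml, install that runtime version, then run stubtest against it. The tomli usage, directory layout, and pip invocation are assumptions, not typeshed's actual CI.

```python
# Illustrative only: install the runtime version pinned in METADATA.toml
# and let stubtest (shipped with mypy) compare it against the stubs.
import subprocess
import sys

import tomli

def run_stubtest(distribution_dir: str, package: str) -> int:
    with open(f"{distribution_dir}/METADATA.toml", "rb") as f:
        version = tomli.load(f)["version"]
    subprocess.run(
        [sys.executable, "-m", "pip", "install", f"{package}=={version}"],
        check=True,
    )
    return subprocess.run([sys.executable, "-m", "mypy.stubtest", package]).returncode
```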

@srittau
Collaborator

srittau commented Apr 9, 2021

Both @JukkaL's and @hauntsaninja's suggestions sound good to me.

@AlexWaygood
Member

In December 2020, @hauntsaninja wrote:

An easy and hopefully unobjectionable way to do this is to create a repo with whatever code you want to test and add the repo to mypy_primer.

I would very much like to do something like this to add some regression tests for pow, which is exceedingly difficult to type. However, as @JelleZijlstra noted, it would probably be better for these mypy_primer-based regression tests to be put inside typeshed rather than maintained as an external repo.

Some questions, therefore:

  1. Do we agree that regression tests for the thorniest cases, such as pow, would help typeshed? I agree with the consensus that regression tests on the level of the cpython repo are wholly unnecessary, but it seems to me that the corner cases of pow just don't appear often enough in open-source code for mypy_primer/stubtest to be sufficient with this function, wonderful tools though they are.
  2. Would something along these lines be an acceptable starting point for some mypy_primer-based regression tests for pow?
  3. Do we agree that it would be better to have these tests inside typeshed, rather than in a separate repository?

If the answer to all of these is "yes", I would be happy to submit a PR, leading to...

  4. Where should these tests go in the typeshed repo? What should the folder be called? Etc., etc.

@JelleZijlstra
Member

JelleZijlstra commented Nov 19, 2021

Thank you! I agree on "yes" for these questions.

I don't feel strongly about the structure or naming, but here's a proposal:

  • Create a new top-level directory test_cases/
  • Within this directory, the directory structure mirrors the top level
  • Every stub file is tested in a file prefixed with test_
  • So, for example, the pow() tests would live in test_cases/stdlib/test_builtins.py.

@AlexWaygood
Member

One complication is that a regression test for pow (for example) won't work unless we have the absolute strictest mypy settings applied. That includes --disallow-any-expr. For some builtins tests, however, that might not be practical -- we might want the revealed type to be Any, or it might be impossible in some cases to write a test without a type being inferred by mypy as Any. I can think of two solutions to this problem:

  1. Have two sets of tests (in two separate directories?), with different levels of strictness applied.
  2. Have one set of tests, and apply the strictest mypy settings to all tests. If we want an inferred type to be Any, we add a # type: ignore comment accordingly. Then, if we see any mypy_primer output telling us about an unused # type: ignore comment in the tests directory, we "know" internally that this signifies a potential failing test case.

Option (2) probably won't work until #6307 is fixed, however.
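To illustrate option (2), a hypothetical test file checked under the strictest settings might look like the sketch below; the pow() return types assumed in the comments are for illustration only, not a claim about what the current stubs infer.

```python
# Hypothetical sketch of option (2): one set of tests, all checked under
# mypy's strictest settings (including --disallow-any-expr). Where an
# expression is expected to be Any, we acknowledge it with an explicit
# ignore; a later "unused ignore" warning would then signal that the
# inferred type has changed.
exact: int = pow(2, 3)        # assumed to check cleanly: non-negative int exponent gives int
promoted: float = pow(2, -3)  # assumed to check cleanly: negative int exponent gives float
maybe_any = pow(-2.0, 0.5)    # type: ignore  # assumed to be Any today, hence the explicit ignore
```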

@Akuli
Collaborator

Akuli commented Nov 19, 2021

Another problem is how to ensure the type matches. This doesn't really work:

def expects_float(arg: float) -> None: ...
expects_float(pow(5, -7))

If pow(5, -7) returns int, it would implicitly convert to float and cause no errors. This doesn't work either, because a list literal like [2] can be list[float] or list[int], depending on how it is used:

def expects_float(arg: list[float]) -> None: ...
expects_float([pow(5, -7)])

Maybe we should use mypy's test suite instead? It makes it possible to assert the output of reveal_type().

@AlexWaygood
Member

This doesn't really work:

I would say it "partially works", since we are at least verifying that the inferred type is not Any or complex. But yes, I agree that this is far from an ideal way of testing these things. Being able to assert the output of reveal_type sounds much more promising, though I don't really know anything about how mypy's test suite works.

@JelleZijlstra
Member

though I don't really know anything about how mypy's test suite works

It mostly works like this:

reveal_type(1)  # E: Revealed type is 'int'

And the framework complains if the revealed type is something different. I don't think it's easy to use that syntax outside of mypy itself, though there are apparently a few tools like pytest plugins doing that.

You can work around the float/int thing by taking advantage of list invariance. This errors for mypy:

from typing import List

def f(x: List[float]):
    pass

y = [3]  # inferred as list[int]; there is no expected-type context here
f(y)     # error: List is invariant, so list[int] is not accepted where list[float] is expected
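Applying that workaround to the earlier pow example might look roughly like this; whether pow(5, -7) is in fact inferred as float is an assumption here, used only to show the technique.

```python
from typing import List

def wants_exact_floats(xs: List[float]) -> None: ...

# The intermediate variable matters: its type is inferred without the
# List[float] context, so invariance then rejects list[int].
result = [pow(5, -7)]
wants_exact_floats(result)  # errors if pow(5, -7) were inferred as int; clean if it is float
```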

@AlexWaygood
Member

I think we shouldn't let the perfect be the enemy of the good. If nobody objects, I'll go ahead and draft a PR with some mypy_primer-based regression tests.

If somebody else drafts a PR proposing superior tests based on mypy's testing framework, I'll happily abandon my PR (/revert my PR if it's already merged) 🙂


You can work around the float/int thing by taking advantage of list invariance.

Very nice!

@sobolevn
Member

If somebody else drafts a PR proposing superior tests based on mypy's testing framework

I cannot refrain from a shameless plug: https://github.com/typeddjango/pytest-mypy-plugins
It is way more advanced. It supports parametrization, regex matchers (to avoid problems with numeric typevar ids), etc.

@AlexWaygood
Member

I cannot refrain from a shameless plug: https://github.com/typeddjango/pytest-mypy-plugins

This looks great! If others also agree that this would be a better way to go, I'll hold off making a PR and let @sobolevn propose a PR using this framework.

@sobolevn
Member

I'll hold off making a PR and let @sobolevn propose a PR using this framework.

I was not really paying attention to this thread 😆
So, I just saw a keyword combination: tests + types. And here I am!

@srittau removed the project: policy (Organization of the typeshed project) label Jan 17, 2022
@srittau added the project: infrastructure (typeshed build, test, documentation, or distribution related) label Jan 17, 2022
@JelleZijlstra
Member

We now have such tests: #7663.
