Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor assertions in test_creation.py #32

Merged
merged 29 commits into from Oct 29, 2021

Conversation

honno
Copy link
Member

@honno honno commented Oct 22, 2021

In #30 I'm experimenting with the ph.assert_dtype() assertion util. As well as those promoted dtype checks, I realise a lot of dtype assertions relate to the default int and float, so I wanted to experiment with refactoring those too... I didn't get round to that today, seeing as how tricky just understanding test_arange was (I think it covers more now at least).

@honno honno force-pushed the creation-refactor branch 2 times, most recently from 3992a6e to 1b30535 Compare October 27, 2021 11:08
@honno honno marked this pull request as ready for review October 27, 2021 17:33
@honno
Copy link
Member Author

honno commented Oct 27, 2021

I unexpectedly spent a lot of time extending/modifying namely the arange and linspace tests. On that front I believe they should cover more things, and run much faster as I've managed to avoid filtering/throwing Hypothesis examples. Especially interested if you have thoughts on the specified_kwargs() strategy I wrote in the spirit of #22 . A rough array equals is something I could get round to as well.

In terms of refactoring assertions, I've actually found the process useful in making sure to test all the appropriate things in each test method. It also saves time in creating user-friendly error messages, especially if you want to fix/modify the message. I was thinking of moving these into say pytest_helpers.py and using them in test_elementwise.py, as I'm certain I missed things in those methods—and generally I'll use them going forward.

Tomorrow I'll have to fix a bug I just caught (see comment), but otherwise I'm happy for a review.

@asmeurer
Copy link
Member

Is the point of specified_kwargs that you can't use kwargs because the tests use data?

A rough array equals is something I could get round to as well.

I expect this to be pretty thorny. The spec doesn't require any level of precision for output values, and different libraries may return different results if they use different implementations. There have been some ideas (e.g., #7), but none of it is simple. I'd personally rather see at least basic coverage of all the functions before we start to worry about testing floating-point correctness. Right now there are still too many functions that aren't tested at all, or are only tested to a bare minimum in test_signatures.

@asmeurer
Copy link
Member

Also, you probably figured this out already, but we don't need to generalize the argument handling of arange too much. arange's signature is quite unique in the specification. It's the only function with a signature as complicated as it is.

@honno
Copy link
Member Author

honno commented Oct 28, 2021

Is the point of specified_kwargs that you can't use kwargs because the tests use data?
Also, you probably figured this out already, but we don't need to generalize the argument handling of arange too much. arange's signature is quite unique in the specification. It's the only function with a signature as complicated as it is.

Yep, although specified_kwargs() was helpful for test_linspace too. Fairly simple implementation, and I imagine it might come up again when test method signatures cannot realistically mimic the corresponding method signatures.

@honno honno requested a review from asmeurer October 28, 2021 10:31
@honno
Copy link
Member Author

honno commented Oct 28, 2021

Also meshgrid sometimes kills pytest runs, like the test suite has now. Found this tricky to debug and thought it was just an error in my implementation (like when I was extending test_arange), but it's come up again so clearly not. Can explore NumPy's issue tracker tomorrow, otherwise try and identify the issue.

@asmeurer
Copy link
Member

Looking at the code for test_meshgrid, my guess is that you are not limiting the number of input arrays, so the result is huge. This causes NumPy to try to allocate an array that doesn't fit in memory and it crashes Python. Note that the result shape from meshgrid is the product of the input shapes. You should filter any input where the resulting output would be larger than MAX_ARRAY_SIZE. It's also reasonable to not send more than some smallish number of total inputs. This is another example where the input strategy is complicated, so we want to just test the promotion in the normal test. If anything, I'm surprised this doesn't crash every time.

Also meshgrid should only really be supporting 1-d inputs. That looks like a bug in the numpy implementation.

("dtype", dtype, None),
),
label="kw",
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing a reason why this wouldn't work, but can all the drawing stuff in this test be factored into a @composite test strategy. The logic here is complicated it is probably a good idea to have some meta tests for it to make sure it doesn't accidentally omit things it shouldn't.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now moved specified_kwargs() to hypothesis_helpers.py and wrote a test method for it. I also introduced a named tuple which I think helps to clarify what the arguments mean.

assert list(a) == list(r), "arange() produced incorrect values"
assert out.dtype == dtype
assert out.ndim == 1, f"{out.ndim=}, but should be 1 [linspace()]"
f_func = f"[linspace({start=}, {stop=}, {step=})]"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f_func = f"[linspace({start=}, {stop=}, {step=})]"
f_func = f"[arange({start=}, {stop=}, {step=})]"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be confusing (or wrong really) if only a single argument is passed because arange(n) is the "start" argument (the first argument) but n is actually the stop. In fact, that's why start is positional-only, so that arange(start=n) is impossible. In general I would writing x=val in error messages if x is a positional-only argument because that makes it not copy-pastable and potentially confusing if a library doesn't use the same positional-only name as the spec.

size <= hh.MAX_ARRAY_SIZE
), f"{size=} should be no more than {hh.MAX_ARRAY_SIZE}" # sanity check

kw = data.draw(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not test stop and step as keyword-arguments. arange is a little special. We really want it to be a function with three signatures

arange(stop, /, dtype=None)
arange(start, stop, /, dtype=None)
arange(start, stop, step, /, dtype=None)

(c.f. help(range)) but there's no way to represent that as a single signature in Python. The closest we can do is to make two of the arguments optional by making them keyword arguments. The complication here is first that if only one argument is passed, it is actually the stop (even though it is still called start, and second, things like arange(n, step=k) and arange(n, stop=None) are meaningless. IMO we should make this clearer in the spec, but really arange only ought to support start, stop, and step as positional, i.e., as if it were the above 3 signatures. This was discussed some in data-apis/array-api#107 and data-apis/array-api#85.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean. So now I test the following argument combinations:

  • (start,) (only if stop is None)
  • (start, stop)
  • (start, stop, step)

Note step must be used as a keyword argument when the argument combination isn't (start, stop, step), which test_arange indeed passes it as (unless sometimes when step == 1).

@honno
Copy link
Member Author

honno commented Oct 29, 2021

Looking at the code for test_meshgrid, my guess is that you are not limiting the number of input arrays, so the result is huge. This causes NumPy to try to allocate an array that doesn't fit in memory and it crashes Python. Note that the result shape from meshgrid is the product of the input shapes. You should filter any input where the resulting output would be larger than MAX_ARRAY_SIZE. It's also reasonable to not send more than some smallish number of total inputs. This is another example where the input strategy is complicated, so we want to just test the promotion in the normal test. If anything, I'm surprised this doesn't crash every time.

Also meshgrid should only really be supporting 1-d inputs. That looks like a bug in the numpy implementation.

Fixed every issue you identified, thanks! I'm not familiar enough with meshgrid() and memory things generally to dynamically limit the inputs, so I've just hardcoded them in, ii.e. no more than 5 arrays and array sizes cannot exceed 5. Not ideal for a "fully-fledged" meshgrid() test, but probably covers enough in regards to testing type promotion.

@honno honno requested a review from asmeurer October 29, 2021 14:51
@honno honno mentioned this pull request Oct 29, 2021
@asmeurer asmeurer merged commit 035e3f3 into data-apis:master Oct 29, 2021
@asmeurer
Copy link
Member

I've merged this. We might still need to do some work on the arange test, but for now I think it's OK. The meshgrid change is fine. We can expand it when we convert it into the actual meshgrid test.

@honno honno deleted the creation-refactor branch February 8, 2022 10:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants