TST: adjust other very slow tests #20487
Conversation
Thanks for working on this Matt.
I think to make this work, we should have one CI job running xslow tests, and we should add it as a proper test mode (python dev.py test -m xslow). Right now xslow tests are effectively never run (maybe by a dev a few times per release cycle).

A few of these seem fine to move into xslow, but you're also effectively removing test coverage for all our Cython APIs and for import cycles here (putting the former in 'fast' was a very recent decision in gh-20422) - that cannot be right.

The criterion here is not only "how long does a single test take" but more "is this test worth this much time in every fast/full run".
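For context, an xslow test mode like the one proposed is usually wired up with a pytest collection hook. A minimal sketch, assuming a hypothetical RUN_XSLOW environment variable and the standard marker machinery (not SciPy's actual configuration):

```python
# conftest.py -- illustrative sketch only; RUN_XSLOW and the wiring here
# are assumptions, not SciPy's real setup.
import os
import pytest

def pytest_collection_modifyitems(config, items):
    if os.environ.get("RUN_XSLOW", "0") == "1":
        return  # xslow mode: run everything, including xslow-marked tests
    skip_xslow = pytest.mark.skip(reason="set RUN_XSLOW=1 to run xslow tests")
    for item in items:
        if "xslow" in item.keywords:
            item.add_marker(skip_xslow)
```

A dedicated CI job would then export the variable so that xslow tests are exercised at least once per run rather than effectively never.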
@@ -28,7 +28,7 @@ def run(self):
         return WorkerThread()

-@pytest.mark.slow
+@pytest.mark.xslow
This one seems fine to mark as xslow indeed. Kinda pointless to run really slow tests that are marked as xfail.

Worth trying, but I wouldn't be surprised if that would require e.g. an xxslow marker to exclude tests with very extravagant requirements (memory, CPU, etc.). Also, this job should probably only run xslow tests, not everything including xslow.
@rgommers I adjusted the ones you mentioned and the test_cython tests in linalg and optimize. The rest look ok to you? Any I should check with someone else about?
[skip cirrus] [skip circle]
Made the requested changes. Included some comments here to justify why it's probably ok to run the tests less often.
@@ -342,6 +342,7 @@ def simpfunc(z, y, x, t):  # Note order of arguments.
                          (2.,)),
                         2*8/3.0 * (b**4.0 - a**4.0))

+@pytest.mark.xslow
There are plenty of tests for 2D improper integrals, and there is other coverage of 3D integrals. Both of these functions just call nquad, so I think it's safe enough to run these only selectively, like when work is done on the functions (rare).
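For reference, nquad reduces an N-dimensional integral to nested 1-D quadrature calls. A quick hedged example (the integrand and limits here are illustrative, not taken from the tests in question):

```python
from scipy.integrate import nquad

# Integrate f(x, y) = x * y over the unit square [0, 1] x [0, 1].
# The exact answer is (1/2) * (1/2) = 0.25.
# nquad returns a (result, estimated_error) pair.
result, abserr = nquad(lambda x, y: x * y, [[0, 1], [0, 1]])
print(result)  # close to 0.25
```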
@@ -476,7 +476,7 @@ def f(x, y):
     ])
     def test_descending_points_nd(self, method, ndims, func):

-        if ndims == 5 and method in {"cubic", "quintic"}:
+        if ndims >= 4 and method in {"cubic", "quintic"}:
Plenty of other coverage.
@@ -1339,7 +1339,7 @@ def c1(x):
         assert_(np.all(res.x >= np.array(bounds)[:, 0]))
         assert_(np.all(res.x <= np.array(bounds)[:, 1]))

-    @pytest.mark.slow
+    @pytest.mark.xslow
This is just one of many tests. This one happens to be very slow, and it fails on some platforms, so probably no harm in making it xslow.
        # max iter
        if result.status in (0, 3):
            raise RuntimeError("Invalid termination condition.")
class TestTrustRegionConstr:
Simply converted the for loops to parametrize.
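For readers unfamiliar with the conversion, a minimal before/after sketch (the test body is a toy stand-in, not the actual TestTrustRegionConstr code):

```python
import pytest

# Before: one monolithic test; the first failing case aborts the loop,
# and pytest reports a single test either way.
def test_squares_loop():
    for n in [1, 2, 3]:
        assert n * n == n ** 2

# After: each case is collected as its own test with a readable ID, so
# failures are isolated and per-case timings (--durations) work.
@pytest.mark.parametrize("n", [1, 2, 3])
def test_squares(n):
    assert n * n == n ** 2
```

This is why such a diff looks large even though the total runtime is unchanged: every loop body becomes a decorated test signature.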
Fine to merge this, but in the future I'd avoid this since it's a very large diff and doesn't actually change the total runtime.
It's much smaller and easier to read with "Hide whitespace changes" (mostly indentation changes), and it allowed me to see that no tests are particularly slow. Most are under 0.05s; there are just a lot of them. If we want to speed things up, how about flipping a coin like in the lobpcg tests?
            pytest.skip("Numerical Hessian needs analytical gradient")
        if prob.grad is True and grad in {'3-point', False}:
            pytest.skip("prob.grad incompatible with grad in {'3-point', False}")
        sensitive = (isinstance(prob, BoundedRosenbrock) and grad == '3-point'
Started getting a failure in just one case on just the Accelerate platform. I thought it was just a matter of the guess x0 changing, but that doesn't seem to be the case. Not sure what's happening, but it's not too concerning, so I'll just open an issue when this merges.
CI was skipped on the last push, so I'm not sure if this will start to fail if we merge now. That would not be helpful - can you check it's fixed, and if not either drop this whole change or try to resolve it?
This part of the diff skips the problematic test case so that CI would not fail anymore. The proposal is to merge the PR with the problematic test skipped, then I'll open a separate issue about the failure.
It doesn't seem like something that should hold up this PR - the failure appears to be caused by a minor change in how the test is configured, and it seems to be exposing an issue that has always been here; we might have just been lucky/unlucky not to see it before.
This sounds good to me.
@@ -89,6 +89,7 @@ def test_svdp(ctor, dtype, irl, which):
     check_svdp(n, m, ctor, dtype, k, irl, which)

+@pytest.mark.xslow
I think I wrote this test, and I don't think there's anything very special about it; it's just one that happened to be distributed with PROPACK. We have other tests that don't take as long.
scipy/spatial/tests/test_qhull.py (outdated)
@@ -344,7 +344,7 @@ def test_degenerate_barycentric_transforms(self):
         # Check the transforms
         self._check_barycentric_transforms(tri)

-    @pytest.mark.slow
+    @pytest.mark.xslow
The inclusion of "more" in the name suggests that it's not essential.
slow seems like the right call to me instead of xslow here. Checking locally:

python dev.py test -m full -t scipy/spatial/tests/test_qhull.py::TestUtilities::test_more_barycentric_transforms -- --durations=2
0.67s call build-install/lib/python3.11/site-packages/scipy/spatial/tests/test_qhull.py::TestUtilities::test_more_barycentric_transforms

That seems reasonable, under a second, and only when full is requested.
That's fine. Note that the criterion for inclusion here was that these tests took >5s on my machine.
@@ -533,7 +533,7 @@ def test_maxit():
     assert_allclose(np.shape(r_h), np.shape(np.asarray(r_h)))

-@pytest.mark.slow
+@pytest.mark.xslow
Another option would be to @parametrize rather than looping and reduce the number of cases.
Can also just reduce the number of cases if that helps.
The current selection (after applying the proposed changes for …
[skip ci]
Let's revert the …
[skip ci]
Reverted the …
Thanks Matt. Almost there, but CI shouldn't start failing if we merge this, so I'd like to double check on the TestTrustRegionConstr thing.
Responded to comments; all tests pass.
-# list_sparse_format = ['bsr', 'coo', 'csc', 'csr', 'dia', 'dok', 'lil']
-list_sparse_format = ['coo']
 sparse_formats = len(list_sparse_format)
+list_sparse_format = ['bsr', 'coo', 'csc', 'csr', 'dia', 'dok', 'lil']
Given the strategy for skipping tests (flip a coin), I figured I might as well bring these back in.
-        # This is one of the slower tests because there are >1,000 configs
-        # to test here, instead of checking product of all input, output types
-        # test each configuration for the first sparse format, and then
-        # for one additional sparse format. this takes 2/7=30% as long as
-        # testing all configurations for all sparse formats.
-        if s_f_i > 0:
-            tests = tests[s_f_i - 1::sparse_formats-1]

     for A, B, M, X, Y in tests:
+        # This is one of the slower tests because there are >1,000 configs
+        # to test here. Flip a biased coin to decide whether to run each
+        # test to get decent coverage in less time.
+        if rnd.random() < 0.95:
+            continue  # too many tests
I don't know which tests are appropriate to skip, so I decided to flip a coin. Before, only the tests in a small corner of the space of possible tests were run; now they are distributed throughout. So hopefully the coverage is a bit stronger than before, and the time is cut down by ~60%.
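The key property is that the "coin flips" come from a seeded generator, so the same subset of cases is selected on every run. A sketch with illustrative numbers (not the lobpcg test's actual setup):

```python
import numpy as np

def select_cases(cases, keep_prob=0.05, seed=1234):
    # Fixed seed: the generator produces the same draws on every run, so
    # the chosen subset is deterministic and any failure is reproducible,
    # even though the selection is spread uniformly over the whole space.
    rng = np.random.default_rng(seed)
    return [case for case in cases if rng.random() < keep_prob]

subset = select_cases(list(range(1000)))
assert subset == select_cases(list(range(1000)))  # identical across runs
```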
This breaks a hard requirement that we have: tests should be reproducible. So this change cannot be right. Why not mark some subset as slow (or even xslow)? Or, if achieving good test coverage in a reasonable amount of time really is that problematic, then this may be a fit for hypothesis. We default to fixed seeds, but it's possible with hypothesis to randomly sample the subspace in a more controlled way (plus if it fails, it tells you the reproducer).
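If hypothesis were adopted for this, controlled sampling of the parameter space might look roughly like the sketch below. The strategies and the trivial test body are illustrative only; derandomize=True is one way to keep the runs deterministic, in the spirit of the fixed-seed default mentioned here.

```python
from hypothesis import given, settings, strategies as st

# Sample (format, dtype) pairs from the full space instead of looping over
# every combination; max_examples bounds the runtime, and hypothesis reports
# a minimal reproducer for any failing example.
@settings(max_examples=25, deadline=None, derandomize=True)
@given(fmt=st.sampled_from(['bsr', 'coo', 'csc', 'csr', 'dia', 'dok', 'lil']),
       dtype=st.sampled_from(['float32', 'float64']))
def test_sparse_combo(fmt, dtype):
    # Stand-in assertion; a real test would build the matrix and run lobpcg.
    assert fmt in {'bsr', 'coo', 'csc', 'csr', 'dia', 'dok', 'lil'}
```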
This has a fixed seed. The point is that instead of exhaustively checking a small corner of the space (e.g. with one type of matrix), it is a (deterministic) uniform sample of the space.

If the choice of the name rnd is confusing, I had changed it to the more familiar rng, but then changed it back because it is named that throughout the file. I can change all 20 occurrences if you prefer.
Ah, sorry about the noise then. And yes indeed, you guessed right - I read this wrong because of rnd (and maybe "flip a coin" suggested randomness). But it's fine to leave as is I'd say.
Okay, let's give this a go, it's been reviewed in enough detail. Thanks Matt!
Reference issue
Followup to gh-20468

What does this implement/fix?
This adjusts some very slow (>5s) tests, either marking them xslow or skipping parts of them.

Additional information
Code-owners, LMK if the adjustment in your code looks OK.