
TST: adjust other very slow tests #20487

Merged
merged 10 commits on Apr 30, 2024

Conversation

@mdhaber (Contributor) commented Apr 16, 2024

Reference issue

Follow-up to gh-20468.

What does this implement/fix?

This adjusts some very slow (>5s) tests, either marking them xslow or skipping parts of them.

Additional information

Code owners, LMK if the adjustments in your code look OK.

@rgommers (Member) left a comment

Thanks for working on this, Matt.

I think to make this work, we should have one CI job running xslow tests. And we should add it as a proper test mode (python dev.py test -m xslow). Right now xslow tests are effectively never run (maybe by a dev a few times per release cycle).

A few of these seem fine to move into xslow, but you're also effectively removing test coverage for all our Cython APIs and for import cycles here (putting the former in 'fast' was a very recent decision in gh-20422) - that cannot be right.

The criterion here is not only "how long does a single test take" but more "is this test worth this much time in every fast/full run".
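For context: xslow tests are opted into via an environment variable rather than a regular pytest -m selection. A minimal sketch of how such a gate can be wired up in a conftest.py (the SCIPY_XSLOW variable name follows SciPy's convention; the exact implementation here is illustrative, not SciPy's actual code):

import os
import pytest

def pytest_collection_modifyitems(config, items):
    # Skip tests marked xslow unless SCIPY_XSLOW is set in the environment.
    try:
        run_xslow = int(os.environ.get("SCIPY_XSLOW", "0"))
    except ValueError:
        run_xslow = 0
    if run_xslow:
        return
    skip_mark = pytest.mark.skip(reason="very slow; set SCIPY_XSLOW=1 to run")
    for item in items:
        if item.get_closest_marker("xslow"):
            item.add_marker(skip_mark)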

scipy/special/tests/test_extending.py (outdated, resolved)
@@ -28,7 +28,7 @@ def run(self):

return WorkerThread()

-@pytest.mark.slow
+@pytest.mark.xslow
Member

This one seems fine to mark as xslow indeed. Kinda pointless to run really slow tests that are marked as xfail.

scipy/_lib/tests/test_import_cycles.py (outdated, resolved)
@ev-br (Member) commented Apr 16, 2024

we should have one CI job running xslow tests.

Worth trying, but I wouldn't be surprised if that required, e.g., an xxslow marker to exclude tests with very extravagant requirements (memory, CPU, etc.).

Also, this job should probably only run xslow tests, not everything including xslow.

@mdhaber (Contributor Author) left a comment

@rgommers I adjusted the ones you mentioned and the test_cython tests in linalg and optimize. Do the rest look OK to you? Is there anything I should check with someone else about?

scipy/_lib/tests/test_import_cycles.py (outdated, resolved)
scipy/optimize/tests/test_extending.py (outdated, resolved)
scipy/linalg/tests/test_extending.py (outdated, resolved)
scipy/special/tests/test_extending.py (outdated, resolved)
@mdhaber (Contributor Author) left a comment

Made the requested changes, and included some comments here to justify why it's probably OK to run the tests less often.

@@ -342,6 +342,7 @@ def simpfunc(z, y, x, t): # Note order of arguments.
(2.,)),
2*8/3.0 * (b**4.0 - a**4.0))

+@pytest.mark.xslow
Contributor Author

There are plenty of tests for 2D improper integrals, and there is other coverage of 3D integrals. Both of these functions just call nquad, so I think it's safe enough to run these only selectively, like when work is done on the functions (rare).

@@ -476,7 +476,7 @@ def f(x, y):
])
def test_descending_points_nd(self, method, ndims, func):

-if ndims == 5 and method in {"cubic", "quintic"}:
+if ndims >= 4 and method in {"cubic", "quintic"}:
Contributor Author

Plenty of other coverage.

@@ -1339,7 +1339,7 @@ def c1(x):
assert_(np.all(res.x >= np.array(bounds)[:, 0]))
assert_(np.all(res.x <= np.array(bounds)[:, 1]))

-@pytest.mark.slow
+@pytest.mark.xslow
Contributor Author

This is just one of many tests; it happens to be very slow and fails on some platforms, so there's probably no harm in making it xslow.

scipy/optimize/tests/test_extending.py (outdated, resolved)
scipy/optimize/tests/test_minimize_constrained.py (outdated, resolved)
# max iter
if result.status in (0, 3):
raise RuntimeError("Invalid termination condition.")
class TestTrustRegionConstr:
Contributor Author

Simply converted the for loops to parametrize.
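For reference, a minimal sketch of this kind of conversion (the test body here is illustrative, not the PR's actual code):

# Before: one monolithic test; pytest reports a single pass/fail and timing.
def test_squares_loop():
    for n in [1, 2, 3]:
        assert n * n == n ** 2

# After: pytest generates one test per case, so slow or failing cases show
# up individually (e.g. in --durations output) and can be marked or skipped
# on their own.
import pytest

@pytest.mark.parametrize("n", [1, 2, 3])
def test_squares(n):
    assert n * n == n ** 2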

Member

Fine to merge this, but in the future I'd avoid this since it's a very large diff and doesn't actually change the total runtime.

Contributor Author

It's much smaller and easier to read with "Hide whitespace changes" (mostly indentation changes), and it allowed me to see that no tests are particularly slow. Most are under 0.05s; there are just a lot of them. If we want to speed things up, how about flipping a coin like in the lobpcg tests?

pytest.skip("Numerical Hessian needs analytical gradient")
if prob.grad is True and grad in {'3-point', False}:
pytest.skip("prob.grad incompatible with grad in {'3-point', False}")
sensitive = (isinstance(prob, BoundedRosenbrock) and grad == '3-point'
Contributor Author

Started getting a failure in just one case, and only on the Accelerate platform. I thought it was a matter of the guess x0 changing, but that doesn't seem to be the case. Not sure what's happening, but it's not too concerning, so I'll just open an issue when this merges.

Member

CI was skipped on the last push, so I'm not sure if this will start to fail if we merge now. That would not be helpful - can you check it's fixed, and if not either drop this whole change or try to resolve it?

Contributor Author

This part of the diff skips the problematic test case so that CI will not fail anymore. The proposal is to merge the PR with the problematic test skipped; then I'll open a separate issue about the failure.
It doesn't seem like something that should hold up this PR: the failure appears to be caused by a minor change in how the test is configured, and it seems to be exposing an issue that has always been here; we might have just been lucky/unlucky not to see it before.
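For reference, a minimal sketch of this kind of targeted skip; the condition is paraphrased from the truncated diff above and the problem class is a stand-in, so treat the details as illustrative:

import pytest

class BoundedRosenbrock:  # stand-in for the real problem class in the test file
    pass

@pytest.mark.parametrize("grad", ['3-point', True])
def test_constrained_problem(grad):
    prob = BoundedRosenbrock()
    # Skip only the known-flaky combination instead of the whole test.
    sensitive = isinstance(prob, BoundedRosenbrock) and grad == '3-point'
    if sensitive:
        pytest.skip("flaky on some platforms (Accelerate); tracked in a follow-up issue")
    assert prob is not None  # real assertions would go here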

Member

This sounds good to me.

@@ -89,6 +89,7 @@ def test_svdp(ctor, dtype, irl, which):
check_svdp(n, m, ctor, dtype, k, irl, which)


+@pytest.mark.xslow
Contributor Author

I think I wrote this test, and I don't think there's anything very special about it; it's just one that happened to be distributed with PROPACK. We have other tests that don't take as long.

@@ -344,7 +344,7 @@ def test_degenerate_barycentric_transforms(self):
# Check the transforms
self._check_barycentric_transforms(tri)

-@pytest.mark.slow
+@pytest.mark.xslow
Contributor Author

The inclusion of "more" in the name suggests that it's not essential.

Contributor

slow seems like the right call to me instead of xslow here. Checking locally:

python dev.py test -m full -t scipy/spatial/tests/test_qhull.py::TestUtilities::test_more_barycentric_transforms -- --durations=2

0.67s call build-install/lib/python3.11/site-packages/scipy/spatial/tests/test_qhull.py::TestUtilities::test_more_barycentric_transforms

That seems reasonable: under a second, and only run when full is requested.

Contributor Author

That's fine. Note that the criterion for inclusion here was that these tests took >5s on my machine.

@@ -533,7 +533,7 @@ def test_maxit():
assert_allclose(np.shape(r_h), np.shape(np.asarray(r_h)))


-@pytest.mark.slow
+@pytest.mark.xslow
Contributor Author

Another option would be to @parametrize rather than loop, and to reduce the number of cases.

Member

Can also just reduce the number of cases if that helps.

@rgommers (Member)

The current selection (after applying the proposed changes for test_extending.py) seems fine to me.

@tylerjereddy added this to the 1.14.0 milestone Apr 19, 2024
@tylerjereddy (Contributor)

Let's revert the spatial one before merging IMO, as noted above.

@mdhaber (Contributor Author) commented Apr 21, 2024

Reverted the spatial one, and I'll give it extra time when we add fail_slow (gh-20480).
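For context, gh-20480 is about enforcing per-test time budgets; a minimal sketch of how the pytest-fail-slow plugin expresses such an allowance, assuming that plugin is what gets adopted:

# Command line: fail any test that takes longer than 5 seconds:
#   pytest --fail-slow 5s
import pytest

@pytest.mark.fail_slow("10s")  # give this known-slower test a larger budget
def test_more_barycentric_transforms():
    ...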

@mdhaber requested a review from rgommers April 23, 2024 17:32
@rgommers (Member) left a comment

Thanks Matt. Almost there, but CI shouldn't start failing if we merge this, so I'd like to double-check on the TestTrustRegionConstr thing.

@mdhaber (Contributor Author) left a comment

Responded to comments; all tests pass.

-# list_sparse_format = ['bsr', 'coo', 'csc', 'csr', 'dia', 'dok', 'lil']
-list_sparse_format = ['coo']
-sparse_formats = len(list_sparse_format)
+list_sparse_format = ['bsr', 'coo', 'csc', 'csr', 'dia', 'dok', 'lil']
Contributor Author

Given the strategy for skipping tests (flip a coin), I figured I might as well bring these back in.

Comment on lines -632 to +636:

-# This is one of the slower tests because there are >1,000 configs
-# to test here, instead of checking product of all input, output types
-# test each configuration for the first sparse format, and then
-# for one additional sparse format. this takes 2/7=30% as long as
-# testing all configurations for all sparse formats.
-if s_f_i > 0:
-    tests = tests[s_f_i - 1::sparse_formats-1]

 for A, B, M, X, Y in tests:
+    # This is one of the slower tests because there are >1,000 configs
+    # to test here. Flip a biased coin to decide whether to run each
+    # test to get decent coverage in less time.
+    if rnd.random() < 0.95:
+        continue  # too many tests
Contributor Author

I don't know which tests are appropriate to skip, so I decided to flip a coin. Before, only tests in a small corner of the space of possible tests were run; now they are distributed throughout it. So hopefully the coverage is a bit stronger than before, while the time is cut down by ~60%.

Member

This breaks a hard requirement that we have: tests should be reproducible. So this change cannot be right. Why not mark some subset as slow (or even xslow)? Or, if achieving good test coverage in a reasonable amount of time really is that problematic, then this may be a fit for hypothesis. We default to fixed seeds, but it's possible with hypothesis to randomly sample the subspace in a more controlled way (plus, if it fails, it tells you the reproducer).
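For reference, a minimal sketch of the hypothesis approach suggested here (the strategy and test body are illustrative, not from the PR):

from hypothesis import given, settings, strategies as st

sparse_formats = ['bsr', 'coo', 'csc', 'csr', 'dia', 'dok', 'lil']

@settings(max_examples=50, deadline=None)  # cap the number of sampled cases
@given(fmt=st.sampled_from(sparse_formats),
       n=st.integers(min_value=2, max_value=20))
def test_sparse_config(fmt, n):
    # hypothesis samples the (format, size) subspace and prints a minimal
    # reproducer on failure; settings(derandomize=True) would make the runs
    # fully deterministic.
    assert fmt in sparse_formats
    assert 2 <= n <= 20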

@mdhaber (Contributor Author) commented Apr 30, 2024

This has a fixed seed. The point is that instead of exhaustively checking a small corner of the space (e.g. with one type of matrix), it is a (deterministic) uniform sample of the space.

If the choice of the name rnd is confusing, I had changed it to the more familiar rng, but then changed it back because it is named that throughout the file. I can change all 20 occurrences if you prefer.
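For reference, a minimal sketch of why the fixed seed keeps the subsample reproducible (names and numbers here are illustrative):

import numpy as np

cases = [(fmt, i) for fmt in ['bsr', 'coo', 'csc'] for i in range(500)]

rnd = np.random.default_rng(0)  # fixed seed
subset = [c for c in cases if rnd.random() >= 0.95]  # keep ~5% of cases

# Re-creating the generator with the same seed selects the exact same subset:
rnd2 = np.random.default_rng(0)
assert subset == [c for c in cases if rnd2.random() >= 0.95]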

Member

Ah, sorry about the noise then. And yes indeed, you guessed right: I read this wrong because of rnd (and maybe "flip a coin" suggested randomness). But it's fine to leave as-is, I'd say.

pytest.skip("Numerical Hessian needs analytical gradient")
if prob.grad is True and grad in {'3-point', False}:
pytest.skip("prob.grad incompatible with grad in {'3-point', False}")
sensitive = (isinstance(prob, BoundedRosenbrock) and grad == '3-point'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part of the diff skips the problematic test case so that CI would not fail anymore. The proposal is to merge the PR with the problematic test skipped, then I'll open a separate issue about the failure.
It doesn't seem like something that should hold up this PR - the failure appears to be caused by a minor change in how the test is configured, and it seems to be exposing an issue that has always been here; we might have just been lucky/unlucky not to see before.

@mdhaber requested a review from rgommers April 29, 2024 05:53
@rgommers (Member) left a comment

Okay, let's give this a go; it's been reviewed in enough detail. Thanks, Matt!

@rgommers merged commit 713bce9 into scipy:main Apr 30, 2024
30 checks passed
@mdhaber mentioned this pull request Apr 30, 2024
Labels: maintenance (items related to regular maintenance tasks)
5 participants