
TST: adjust other very slow tests #20487

Merged
merged 10 commits on Apr 30, 2024

Conversation

@mdhaber (Contributor) commented Apr 16, 2024

Reference issue

Follow-up to gh-20468.

What does this implement/fix?

This adjusts some very slow (>5s) tests, either marking them xslow or skipping parts of them.

Additional information

Code owners, LMK if the adjustments in your code look OK.

@rgommers (Member) left a comment

Thanks for working on this, Matt.

I think to make this work, we should have one CI job running xslow tests. And we should add it as a proper test mode (python dev.py test -m xslow). Right now xslow tests are effectively never run (maybe by a dev a few times per release cycle).

A few of these seem fine to move into xslow, but you're also effectively removing test coverage for all our Cython APIs and for import cycles here (putting the former in 'fast' was a very recent decision in gh-20422) - that cannot be right.

The criterion here is not only "how long does a single test take" but more "is this test worth this much time in every fast/full run".
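For context: xslow tests are opted into via an environment variable rather than a regular pytest -m selection. A minimal sketch of how such a gate can be wired up in a conftest.py (the SCIPY_XSLOW variable name follows SciPy's convention; the exact implementation here is illustrative, not SciPy's actual code):

import os
import pytest

def pytest_collection_modifyitems(config, items):
    # Skip tests marked xslow unless SCIPY_XSLOW is set in the environment.
    try:
        run_xslow = int(os.environ.get("SCIPY_XSLOW", "0"))
    except ValueError:
        run_xslow = 0
    if run_xslow:
        return
    skip_mark = pytest.mark.skip(reason="very slow; set SCIPY_XSLOW=1 to run")
    for item in items:
        if item.get_closest_marker("xslow"):
            item.add_marker(skip_mark)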

scipy/special/tests/test_extending.py (outdated, resolved)
@@ -28,7 +28,7 @@ def run(self):

return WorkerThread()

-@pytest.mark.slow
+@pytest.mark.xslow
Member

This one seems fine to mark as xslow indeed. Kinda pointless to run really slow tests that are marked as xfail.

scipy/_lib/tests/test_import_cycles.py (outdated, resolved)
@ev-br (Member) commented Apr 16, 2024

we should have one CI job running xslow tests.

Worth trying, but I wouldn't be surprised if that required, e.g., an xxslow marker to exclude tests with very extravagant requirements (memory, CPU, etc.).

Also, this job should probably only run xslow tests, not everything including xslow.

@mdhaber (Contributor Author) left a comment

@rgommers I adjusted the ones you mentioned and the test_cython tests in linalg and optimize. Do the rest look OK to you? Is there anything I should check with someone else about?

scipy/_lib/tests/test_import_cycles.py (outdated, resolved)
scipy/optimize/tests/test_extending.py (outdated, resolved)
scipy/linalg/tests/test_extending.py (outdated, resolved)
scipy/special/tests/test_extending.py (outdated, resolved)
@mdhaber (Contributor Author) left a comment

Made the requested changes, and included some comments here to justify why it's probably OK to run the tests less often.

@@ -342,6 +342,7 @@ def simpfunc(z, y, x, t): # Note order of arguments.
(2.,)),
2*8/3.0 * (b**4.0 - a**4.0))

+@pytest.mark.xslow
Contributor Author

There are plenty of tests for 2D improper integrals, and there is other coverage of 3D integrals. Both of these functions just call nquad, so I think it's safe enough to run these only selectively, like when work is done on the functions (rare).

@@ -476,7 +476,7 @@ def f(x, y):
])
def test_descending_points_nd(self, method, ndims, func):

-if ndims == 5 and method in {"cubic", "quintic"}:
+if ndims >= 4 and method in {"cubic", "quintic"}:
Contributor Author

Plenty of other coverage.

@@ -1339,7 +1339,7 @@ def c1(x):
assert_(np.all(res.x >= np.array(bounds)[:, 0]))
assert_(np.all(res.x <= np.array(bounds)[:, 1]))

-@pytest.mark.slow
+@pytest.mark.xslow
Contributor Author

This is just one of many tests; it happens to be very slow and fails on some platforms, so there's probably no harm in making it xslow.

scipy/optimize/tests/test_extending.py (outdated, resolved)
scipy/optimize/tests/test_minimize_constrained.py (outdated, resolved)
# max iter
if result.status in (0, 3):
raise RuntimeError("Invalid termination condition.")
class TestTrustRegionConstr:
Contributor Author

Simply converted the for loops to parametrize.
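For reference, a minimal sketch of this kind of conversion (the test body here is illustrative, not the PR's actual code):

# Before: one monolithic test; pytest reports a single pass/fail and timing.
def test_squares_loop():
    for n in [1, 2, 3]:
        assert n * n == n ** 2

# After: pytest generates one test per case, so slow or failing cases show
# up individually (e.g. in --durations output) and can be marked or skipped
# on their own.
import pytest

@pytest.mark.parametrize("n", [1, 2, 3])
def test_squares(n):
    assert n * n == n ** 2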

Member

Fine to merge this, but in the future I'd avoid this since it's a very large diff and doesn't actually change the total runtime.

Contributor Author

It's much smaller and easier to read with "Hide whitespace changes" (mostly indentation changes), and it allowed me to see that no tests are particularly slow. Most are under 0.05s; there are just a lot of them. If we want to speed things up, how about flipping a coin like in the lobpcg tests?

pytest.skip("Numerical Hessian needs analytical gradient")
if prob.grad is True and grad in {'3-point', False}:
pytest.skip("prob.grad incompatible with grad in {'3-point', False}")
sensitive = (isinstance(prob, BoundedRosenbrock) and grad == '3-point'
Contributor Author

Started getting a failure in just one case, and only on the Accelerate platform. I thought it was a matter of the guess x0 changing, but that doesn't seem to be the case. Not sure what's happening, but it's not too concerning, so I'll just open an issue when this merges.

Member

CI was skipped on the last push, so I'm not sure if this will start to fail if we merge now. That would not be helpful - can you check it's fixed, and if not either drop this whole change or try to resolve it?

Contributor Author

This part of the diff skips the problematic test case so that CI will not fail anymore. The proposal is to merge the PR with the problematic test skipped; then I'll open a separate issue about the failure.
It doesn't seem like something that should hold up this PR: the failure appears to be caused by a minor change in how the test is configured, and it seems to be exposing an issue that has always been here; we might have just been lucky/unlucky not to see it before.
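For reference, a minimal sketch of this kind of targeted skip; the condition is paraphrased from the truncated diff above and the problem class is a stand-in, so treat the details as illustrative:

import pytest

class BoundedRosenbrock:  # stand-in for the real problem class in the test file
    pass

@pytest.mark.parametrize("grad", ['3-point', True])
def test_constrained_problem(grad):
    prob = BoundedRosenbrock()
    # Skip only the known-flaky combination instead of the whole test.
    sensitive = isinstance(prob, BoundedRosenbrock) and grad == '3-point'
    if sensitive:
        pytest.skip("flaky on some platforms (Accelerate); tracked in a follow-up issue")
    assert prob is not None  # real assertions would go here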

Member

This sounds good to me.

@@ -89,6 +89,7 @@ def test_svdp(ctor, dtype, irl, which):
check_svdp(n, m, ctor, dtype, k, irl, which)


+@pytest.mark.xslow
Contributor Author

I think I wrote this test, and I don't think there's anything very special about it; it's just one that happened to be distributed with PROPACK. We have other tests that don't take as long.

@@ -344,7 +344,7 @@ def test_degenerate_barycentric_transforms(self):
# Check the transforms
self._check_barycentric_transforms(tri)

-@pytest.mark.slow
+@pytest.mark.xslow
Contributor Author

The inclusion of "more" in the name suggests that it's not essential.

Contributor

slow seems like the right call to me instead of xslow here. Checking locally:

python dev.py test -m full -t scipy/spatial/tests/test_qhull.py::TestUtilities::test_more_barycentric_transforms -- --durations=2

0.67s call build-install/lib/python3.11/site-packages/scipy/spatial/tests/test_qhull.py::TestUtilities::test_more_barycentric_transforms

That seems reasonable: under a second, and only run when full is requested.

Contributor Author

That's fine. Note that the criterion for inclusion here was that these tests took >5s on my machine.

@@ -533,7 +533,7 @@ def test_maxit():
assert_allclose(np.shape(r_h), np.shape(np.asarray(r_h)))


-@pytest.mark.slow
+@pytest.mark.xslow
Contributor Author

Another option would be to @parametrize rather than loop, and to reduce the number of cases.

Member

Can also just reduce the number of cases if that helps.

@rgommers (Member)

The current selection (after applying the proposed changes for test_extending.py) seems fine to me.

@tylerjereddy added this to the 1.14.0 milestone Apr 19, 2024
@tylerjereddy (Contributor)

Let's revert the spatial one before merging IMO, as noted above.

@mdhaber (Contributor Author) commented Apr 21, 2024

Reverted the spatial one, and I'll give it extra time when we add fail_slow (gh-20480).
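For context, gh-20480 is about enforcing per-test time budgets; a minimal sketch of how the pytest-fail-slow plugin expresses such an allowance, assuming that plugin is what gets adopted:

# Command line: fail any test that takes longer than 5 seconds:
#   pytest --fail-slow 5s
import pytest

@pytest.mark.fail_slow("10s")  # give this known-slower test a larger budget
def test_more_barycentric_transforms():
    ...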

@mdhaber requested a review from rgommers April 23, 2024 17:32
@rgommers (Member) left a comment

Thanks Matt. Almost there, but CI shouldn't start failing if we merge this, so I'd like to double-check on the TestTrustRegionConstr thing.

@mdhaber (Contributor Author) left a comment

Responded to comments; all tests pass.

-# list_sparse_format = ['bsr', 'coo', 'csc', 'csr', 'dia', 'dok', 'lil']
-list_sparse_format = ['coo']
-sparse_formats = len(list_sparse_format)
+list_sparse_format = ['bsr', 'coo', 'csc', 'csr', 'dia', 'dok', 'lil']
Contributor Author

Given the strategy for skipping tests (flip a coin), I figured I might as well bring these back in.

Comment on lines -632 to +636:

-# This is one of the slower tests because there are >1,000 configs
-# to test here, instead of checking product of all input, output types
-# test each configuration for the first sparse format, and then
-# for one additional sparse format. this takes 2/7=30% as long as
-# testing all configurations for all sparse formats.
-if s_f_i > 0:
-    tests = tests[s_f_i - 1::sparse_formats-1]

 for A, B, M, X, Y in tests:
+    # This is one of the slower tests because there are >1,000 configs
+    # to test here. Flip a biased coin to decide whether to run each
+    # test to get decent coverage in less time.
+    if rnd.random() < 0.95:
+        continue  # too many tests
Contributor Author

I don't know which tests are appropriate to skip, so I decided to flip a coin. Before, only tests in a small corner of the space of possible tests were run; now they are distributed throughout it. So hopefully the coverage is a bit stronger than before, while the time is cut down by ~60%.

Member

This breaks a hard requirement that we have: tests should be reproducible. So this change cannot be right. Why not mark some subset as slow (or even xslow)? Or, if achieving good test coverage in a reasonable amount of time really is that problematic, then this may be a fit for hypothesis. We default to fixed seeds, but it's possible with hypothesis to randomly sample the subspace in a more controlled way (plus, if it fails, it tells you the reproducer).
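For reference, a minimal sketch of the hypothesis approach suggested here (the strategy and test body are illustrative, not from the PR):

from hypothesis import given, settings, strategies as st

sparse_formats = ['bsr', 'coo', 'csc', 'csr', 'dia', 'dok', 'lil']

@settings(max_examples=50, deadline=None)  # cap the number of sampled cases
@given(fmt=st.sampled_from(sparse_formats),
       n=st.integers(min_value=2, max_value=20))
def test_sparse_config(fmt, n):
    # hypothesis samples the (format, size) subspace and prints a minimal
    # reproducer on failure; settings(derandomize=True) would make the runs
    # fully deterministic.
    assert fmt in sparse_formats
    assert 2 <= n <= 20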

@mdhaber (Contributor Author) commented Apr 30, 2024

This has a fixed seed. The point is that instead of exhaustively checking a small corner of the space (e.g. with one type of matrix), it is a (deterministic) uniform sample of the space.

If the choice of the name rnd is confusing, I had changed it to the more familiar rng, but then changed it back because it is named that throughout the file. I can change all 20 occurrences if you prefer.
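For reference, a minimal sketch of why the fixed seed keeps the subsample reproducible (names and numbers here are illustrative):

import numpy as np

cases = [(fmt, i) for fmt in ['bsr', 'coo', 'csc'] for i in range(500)]

rnd = np.random.default_rng(0)  # fixed seed
subset = [c for c in cases if rnd.random() >= 0.95]  # keep ~5% of cases

# Re-creating the generator with the same seed selects the exact same subset:
rnd2 = np.random.default_rng(0)
assert subset == [c for c in cases if rnd2.random() >= 0.95]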

Member

Ah, sorry about the noise then. And yes indeed, you guessed right: I read this wrong because of rnd (and maybe "flip a coin" suggested randomness). But it's fine to leave as-is, I'd say.

pytest.skip("Numerical Hessian needs analytical gradient")
if prob.grad is True and grad in {'3-point', False}:
pytest.skip("prob.grad incompatible with grad in {'3-point', False}")
sensitive = (isinstance(prob, BoundedRosenbrock) and grad == '3-point'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part of the diff skips the problematic test case so that CI would not fail anymore. The proposal is to merge the PR with the problematic test skipped, then I'll open a separate issue about the failure.
It doesn't seem like something that should hold up this PR - the failure appears to be caused by a minor change in how the test is configured, and it seems to be exposing an issue that has always been here; we might have just been lucky/unlucky not to see before.

@mdhaber requested a review from rgommers April 29, 2024 05:53
@rgommers (Member) left a comment

Okay, let's give this a go; it's been reviewed in enough detail. Thanks, Matt!

@rgommers merged commit 713bce9 into scipy:main Apr 30, 2024
30 checks passed
@mdhaber mentioned this pull request Apr 30, 2024
Labels: maintenance (items related to regular maintenance tasks)
5 participants