New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CI, TST: pre-release failures for scipy/interpolate/tests/test_rbfinterp.py #14732
Comments
@treverhines @serge-sans-paille could you help here? I am not sure if these are related to Cython or Pythran. |
I'm confused about why it's just the pre-release jobs. Could be an interaction between NumPy master and Pythran 0.10.0. It's crashing on the last line of: def test_pickleable(self):
# Make sure we can pickle and unpickle the interpolant without any
# changes in the behavior.
seq = Halton(1, scramble=False, seed=np.random.RandomState())
x = 3*seq.random(50)
xitp = 3*seq.random(50)
y = _1d_test_function(x)
interp = self.build(x, y)
yitp1 = interp(xitp)
yitp2 = pickle.loads(pickle.dumps(interp))(xitp) |
Is |
It's not, thanks for pointing that out. That won't be the root cause of the segfaults, but we should fix it. |
These are the two failing jobs: They're very similar, the only differences are:
I'm inclined to get rid of the 32-bit BLAS with fast tests, it doesn't really add anything. That cuts things down to a single failure and makes the test suite more orthogonal. |
See scipygh-14732 for context. The similarly named job `prerelease_deps_64bit_blas` covers what we remove here. There are other jobs that test with 32-bit BLAS on Linux. The job we leave runs the full test suite, the one that is removed here the fast test suite. This cuts down on CI time and also on duplicate failures. [skip github]
xref scipygh-14732. This test is crashing most of the time in two CI jobs, but not always. This doesn't fix the root cause, but should make the test more reproducible. [skip github]
It's a leftover from switching to NumPy 1.17. It was needed to initialize I was too slow 😅 just saw it's merged. But it's fine. |
Still crashing on one CI job, and I haven't yet been able to reproduce it locally with numpy master and pythran 0.10.0 |
* for about 3 weeks pre-release CI has been failing as described in scipygh-14732 * I bisected the pre-release versions of Cython `3.x` that are available on PyPI locally and found that the first version to reproduced a sefault for the command below is `3.0a5` `python runtests.py -t "scipy/interpolate/tests/test_rbfinterp.py" -- -n 2` * so, pin the max Cython version to `<3.0a5` for now (pip should still install stable release when `--pre` is not used in other Azure CI entries * obviously this is just a workaround--I'll open an upstream issue to see if there are any suspicions about a possible bug or something else we might do differently to avoid the segfault
Looks like it is the pre-release version of Cython causing the problem--I did a careful bisection locally (clean build each time with different Cython version). For more details: |
Avoided the failure now in CI, but leaving this open since we need to make sure it's fixed in Cython before the next release (or otherwise check the pinning for releases better, not just CI). |
Wait, We can probably fix the bug by getting rid of Once we fix up the test, we can drop the Cython pin. |
@rgommers anything I need to do there? |
Maybe checking if there's a pickle-related change in Pythran 0.10.0 which could possibly choke on Cython-generated |
To be fair, I was fully aware of the timeline when I was bisecting--my objective was to identify the alpha release of Cython that contained a feature that, when interacting with everything else, resulted in a segfault. I figured it would be helpful to pin below the first segfaulting alpha release of Cython so that SciPy or upstream developers could have access to the subset of Cython changes involved in the interaction. |
Okay good, I had completely missed that when merging your pinning PR. I do think filing a Cython issue may have been premature, since the issue is at least as likely to be in Pythran 0.10.0? |
@rgommers @serge-sans-paille and maybe @peterbell10 (?) I did some more debugging, this time using the workflow below, and it points to a problem happening inside the Cython-generated
|
I reproduced segfault with a clean build of SciPy with |
Well, in my hands |
Let me see if I can find something in a recent NumPy change that would explain the timing... |
Here is the result of a careful/painful bisection on NumPy commits with Pythran pinned to
|
For reference it looks like the crash is happening on scipy/scipy/spatial/ckdtree.pyx Line 1575 in 81e7c04
while handling The contents of I haven't investigated in detail but two relevant things may have happened around Cython 3.0a5:
|
Hmm, a few debug prints on
|
Next thought - does If so it probably exposes a (minor) Cython bug - Cython now tries to auto-add Edit: I'm increasingly sure that's it - Cython uses I'll submit a fix to Cython. But in principle almost anywhere Scipy uses |
@charris seems like enough of a reason to revert the guard cleanup in that particular case (with a comment to fix it in 1-3 years)? Anyone wants to make a PR ;)? |
I'll submit a PR to Scipy adding |
@seberg If this is only Cython 3 prereleases I'd be more inclined to fix Cython first, it just needs an |
Yeah, sounds good. I assume(d) older cython versions will also run into this issue and the |
Is there any other define we could look for that is part of a public API. We were just trying to look for some distinctive define to distinguish "real Numpy" from a user that's helpfully decided to call their own module "numpy" |
@da-woods maybe |
It should only be the Cython alpha releases (3.0a5 - whatever the next release is). Personally I'm fairly neutral on whether you should keep |
See numpy/numpy#20042 for a NumPy fix. |
@da-woods I think the new header guard is probably safer to use than the old one, the new name is less likely to be defined by random users. I made a PR, but if the problem only affects alpha Cython releases, I'd rather see it fixed upstream. |
Fixed by gh-14813, I'll try to backport parts of that instead of the Cython version pin |
This failure is showing up all over the place for Azure Pipelines pre-release runs. I don't know if this is just a blip because of a bleeding edge dependency, but it seems like the test suite produces a segfault after getting to 10-20 % progress and produces a traceback like the one below the fold. The traceback is so ridiculously long that I'll just mention that
scipy/interpolate/tests/test_rbfinterp.py", line 375 in test_pickleable
shows up in there.The text was updated successfully, but these errors were encountered: