
CI Failures #9793

Closed

mrocklin opened this issue Dec 30, 2022 · 16 comments
Labels: tests (Unit tests and/or continuous integration)

@mrocklin
Member

These two tests have started failing semi-regularly.

dask/array/tests/test_array_core.py::test_setitem_extended_API_2d_mask[index1-value1]
dask/tests/test_distributed.py::test_blockwise_dataframe_io[False-False-hdf] 

https://github.com/dask/dask/actions/runs/3796303134/jobs/6467508525
https://github.com/dask/dask/actions/runs/3796303134/jobs/6467508474

They're failing on Mac, so they should be easy to reproduce. I haven't been able to just yet, though.

cc @jrbourbeau when you get back. Maybe also @rjzamora and @fjetter (and team) can take a look when they get back?

@github-actions bot added the "needs triage" (Needs a response from a contributor) label on Dec 30, 2022
@fjetter
Member

fjetter commented Jan 2, 2023

I haven't been able to reproduce the failures yet on any Python version (tried 3.9, 3.10, and 3.11 on macOS 12.6, M1). @hendrikmakait can you try on your machine as well?

@mrocklin
Member Author

mrocklin commented Jan 2, 2023

I also tried on my MBP a few days ago and couldn't reproduce.

@fjetter
Member

fjetter commented Jan 3, 2023

Looks like the failure in dask/tests/test_distributed.py::test_blockwise_dataframe_io[False-False-hdf] has either already been fixed in an upstream library or it is extremely rare. I couldn't find a second failure and couldn't reproduce it myself.

I believe we're not generating a test report for dask/dask as we are for dask/distributed, are we?

@bnavigator
Contributor

bnavigator commented Jan 4, 2023

I am getting the test_setitem_extended_API_2d_mask error occasionally on the build servers for openSUSE Tumbleweed but can't reproduce locally either.

dask-issue9793.txt

@mrocklin
Member Author

mrocklin commented Jan 5, 2023

I believe we're not generating a test report for dask/dask as we are for dask/distributed, are we?

I don't know

@mrocklin
Member Author

mrocklin commented Jan 9, 2023

It looks like test_setitem_extended_API_2d_mask[index1-value1] is still failing in the wild. @fjetter can your team help here?

https://github.com/dask/dask/actions/runs/3875100237/jobs/6607170855

@mrocklin
Member Author

mrocklin commented Jan 9, 2023

Another failure here: dask/tests/test_threaded.py::test_interrupt

https://github.com/dask/dask/actions/runs/3875100237/jobs/6607171855

@fjetter
Member

fjetter commented Jan 12, 2023

@hendrikmakait will have a look

@hendrikmakait
Member

I couldn't reproduce test_setitem_extended_API_2d_mask[index1-value1] locally in 10,000 runs. @jrbourbeau, would you happen to have an idea before I start debugging in GH Actions?

@hendrikmakait self-assigned this on Jan 12, 2023
@jrbourbeau
Member

Yeah, my guess is that the "RuntimeWarning: invalid value encountered in cast" warning is coming from this change in the latest numpy=1.24 release: https://numpy.org/doc/stable/release/1.24.0-notes.html#numpy-now-gives-floating-point-errors-in-casts. Just double-checking: what version of numpy do you have locally?
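For reference, here's a minimal sketch of the upstream numpy 1.24 behavior change described in those release notes (illustrative only, not dask's actual code path): casting non-finite floats to integers now emits a RuntimeWarning that older numpy versions passed over silently.

```python
import numpy as np  # assumes numpy >= 1.24

# Since numpy 1.24, invalid float-to-int casts go through the normal
# floating-point error machinery (np.errstate) instead of being silent.
arr = np.array([np.nan, 1.5])
ints = arr.astype(np.int64)  # RuntimeWarning: invalid value encountered in cast
```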

@hendrikmakait
Member

@jrbourbeau: I have numpy==1.24.1, which seems to be the same as in https://github.com/dask/dask/actions/runs/3875100237/jobs/6607170855#step:5:221

@jrbourbeau
Member

Hmm, thanks for checking -- I'm not able to reproduce it either. I'm suggesting we temporarily ignore the warning in #9828. I've also filed an upstream issue in numpy, numpy/numpy#23000, since the issue appears to originate there.
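For illustration, one way to scope such a temporary ignore to a single test is pytest's filterwarnings marker (a hypothetical sketch, not necessarily how #9828 implements it):

```python
import numpy as np
import pytest

# Hypothetical example: ignore only the specific cast warning for this
# test so that any other RuntimeWarning still surfaces as usual.
@pytest.mark.filterwarnings("ignore:invalid value encountered in cast:RuntimeWarning")
def test_nan_to_int_cast_example():
    result = np.array([np.nan]).astype(np.int64)  # would otherwise warn on numpy >= 1.24
    assert result.dtype == np.int64
```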

@hendrikmakait
Member

dask/dataframe/tests/test_groupby.py::test_dataframe_aggregations_multilevel fails with multiple parametrizations.

xref: #9701

@hendrikmakait
Member

@rjzamora: I see you linked a (now closed) PR regarding dask/tests/test_distributed.py::test_blockwise_dataframe_io[False-False-hdf]. Are you taking this one on?

@rjzamora
Member

@rjzamora: I see you linked a (now closed) PR regarding dask/tests/test_distributed.py::test_blockwise_dataframe_io[False-False-hdf].

This failure has certainly been annoying me for a while now. However, I have never been able to reproduce locally. One theory I had was that we were simply reading back a cached version of the h5 file (one that was still missing the data for that last partition), because the file hadn't been fully flushed to disk between the write and read. My assumption was that my local SSD is just much faster than whatever disk is being used by CI. Therefore, in #9829, I tried running os.sync on the workers between the write and the read operations. Unfortunately, the failure still showed up, so I'm not entirely sure what to try next.

Are you taking this one on?

Not being able to reproduce the error locally makes the failure a bit tricky to understand. I will probably continue trying to figure out the root cause, but I can't say it will be much of a priority.
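For context, here's a rough, self-contained sketch of the flush-between-write-and-read experiment described above (illustrative file names and data; the actual change is in #9829): use Client.run to call os.sync on every worker after the HDF5 write and before the read.

```python
import os

import dask.dataframe as dd
import pandas as pd
from distributed import Client

client = Client()  # assumes a local distributed cluster, for illustration

# Write a small dask DataFrame to HDF5, one file per partition.
ddf = dd.from_pandas(pd.DataFrame({"x": range(10)}), npartitions=2)
ddf.to_hdf("blockwise-*.h5", "/data")

# Ask every worker to flush OS write buffers before reading back, to
# rule out a stale or partially written file as the cause.
client.run(os.sync)

roundtrip = dd.read_hdf("blockwise-*.h5", "/data").compute()
assert len(roundtrip) == 10
```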

@jrbourbeau added the "tests" (Unit tests and/or continuous integration) label and removed the "needs triage" (Needs a response from a contributor) label on Jan 13, 2023
@fjetter
Member

fjetter commented Jun 13, 2023

Closed by #9983

@fjetter fjetter closed this as completed Jun 13, 2023