What should our test matrix look like? #6085
repost: #6073 (comment) Do we need CUDA/pynvml everywhere? Isn't this what gpuCI is for?
Another thing we might want to consider is to mark the few test modules that are likely to be OS-sensitive as such, and only run those on Windows and OSX, which would cut the runtime significantly. Relevant modules would be
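The selection logic suggested above could be sketched roughly as follows. This is an illustrative sketch only; the module names and the `select_modules` helper are hypothetical, not anything in distributed's test suite.

```python
# Hedged sketch: run the full suite on Linux, but only OS-sensitive
# modules on Windows/OSX. Module names below are made up for illustration.
import sys

OS_SENSITIVE = {"test_worker_disk", "test_comm_tcp"}  # hypothetical list

def select_modules(all_modules, os_sensitive=OS_SENSITIVE, platform=sys.platform):
    """Return the set of test modules to run on this platform."""
    if platform.startswith("linux"):
        return set(all_modules)               # full suite on Linux
    return set(all_modules) & os_sensitive    # trimmed suite elsewhere

mods = ["test_scheduler", "test_worker_disk", "test_comm_tcp"]
trimmed = select_modules(mods, platform="darwin")
```

In practice this would more likely be expressed as pytest markers plus a `-m` filter in the CI config, but the gating idea is the same.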
I think most of the logic we care about in our tests is entirely unrelated to the OS.
I would actually like to see us drop OSX coverage to either a subset of the test suite or to one Python version only. The entire org-level constraint is at most 5 concurrent OSX builds, and particularly during US hours we have PRs waiting for hours before they can even pick up an OSX job.
Everything around spilling is potentially impacted by whether lz4 is installed.
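The kind of optional-dependency sensitivity described here can be sketched as follows. This is a minimal illustration of the pattern, not distributed's actual spilling code; the `compress`/`decompress` names are made up.

```python
# Hedged sketch: behaviour silently changes depending on whether lz4
# happens to be importable. Function names are illustrative only.
try:
    import lz4.frame

    def compress(data: bytes) -> bytes:
        return lz4.frame.compress(data)

    def decompress(data: bytes) -> bytes:
        return lz4.frame.decompress(data)
except ImportError:
    # Without lz4, "compression" is a no-op, so spilled bytes differ.
    def compress(data: bytes) -> bytes:
        return data

    def decompress(data: bytes) -> bytes:
        return data

payload = b"spilled-value " * 100
assert decompress(compress(payload)) == payload  # roundtrip holds either way
```

This is why a build with and a build without the optional package exercise genuinely different code paths.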
Windows is pickier when it comes to network and disk. I've found this to be valuable in surprising situations. OSX and Linux are correlated enough that I agree we don't necessarily need to duplicate across them. I'd love to see us run the entire test suite on every OS for at least one configuration (but not all).
cc @dask/gpu
Should add that I don't really understand the question. I do see
I don't think we need CUDA/pynvml everywhere. I believe we are only running on Linux/Python 3.9/CUDA 11.5: |
CUDA-related packages, meaning pynvml, pytorch, and torchvision.
So pynvml is definitely CUDA-related. My guess is it's not needed or used on CPU, but I could be wrong (we could be testing that pynvml support fails gracefully on CPU-only systems, for example); I would ask @jacobtomlinson and @rjzamora to confirm. This may come as a surprise, but pytorch and torchvision are not being used on GPU currently. They are only used on CPU for testing this serialization logic; also please see this torchvision test. These libraries can in fact be used on CPU only, and that seems to be what is happening here. Should add this is exactly why I asked this question: I was concerned there may be misconceptions about what is being used where. To the second question, is there an issue with having these tested? What brought this up?
I would prefer keeping pynvml in all testing environments - like @jakirkham said, it is relatively lightweight, and I think it's important to cover the (rare, but plausible) case where a user has pynvml installed but no GPUs on their system to make sure that monitoring still works properly. My interpretation of the proposed testing matrix is that we would like to expand the current gpuCI matrix in Distributed to include additional Python versions / OSes - some comments on this:
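The "pynvml installed but no GPUs" case mentioned above could be covered by logic along these lines. A minimal sketch only, assuming a hypothetical `gpu_count` helper; it is not distributed's actual monitoring code.

```python
# Hedged sketch: monitoring should degrade gracefully when pynvml is
# importable but no GPU/driver is present, rather than crash the worker.
def gpu_count():
    """Return the number of GPUs, or None if GPU monitoring is unavailable."""
    try:
        import pynvml
    except ImportError:
        return None
    try:
        pynvml.nvmlInit()
        return pynvml.nvmlDeviceGetCount()
    except Exception:
        # pynvml raises NVMLError subclasses on GPU-less hosts.
        return None

count = gpu_count()
assert count is None or isinstance(count, int)
```

A CPU-only CI job with pynvml installed would exercise exactly the `None`-returning branch that a GPU-only matrix never touches.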
I just want to clarify why I opened this ticket, why I'm looking for a "minimal dependency" build, and why I'm suggesting to drop some packages from certain builds. I have the feeling that people feel the need to "defend" pynvml; this was not my intention.
High level motivation
OS jobs
If we could reduce what we test on these OSes, we'd get faster iteration times for everyone.
Optional dependencies
In my experience it's not that rare. Particularly on HPC environments where folks have one conda environment and select different hardware nodes for different tasks.
Thanks Florian! 🙏 So originally That all being said, these may wind up getting copied over to new environments again unintentionally. So it might be worth thinking of a better system to manage optional-dependency testing than what we have today, one less prone to this kind of issue. For example, having a separate requirement list of optional dependencies that we install after creating the Conda environment, and only installing them when some CI environment variable is set (
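The env-var-gated install suggested here could work roughly as follows. A hedged sketch: the variable name `TEST_OPTIONAL_DEPS`, the package list, and the `optional_install_cmd` helper are all hypothetical, chosen for illustration.

```python
# Hedged sketch: keep optional test dependencies in a separate list and
# only install them when the CI job explicitly opts in via an env var.
import os

OPTIONAL_TEST_DEPS = ["pynvml", "torch", "torchvision"]  # illustrative

def optional_install_cmd(env=None):
    """Return the pip command a CI step would run, or None when not opted in."""
    env = os.environ if env is None else env
    if env.get("TEST_OPTIONAL_DEPS", "").lower() not in {"1", "true", "yes"}:
        return None
    return ["pip", "install", *OPTIONAL_TEST_DEPS]

# Opted-out job installs nothing extra; opted-in job gets the full list.
assert optional_install_cmd({}) is None
cmd = optional_install_cmd({"TEST_OPTIONAL_DEPS": "true"})
```

Keeping the list in one place means copying an environment file to a new build no longer silently drags the optional packages along.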
There have been a few conversations about what our test matrix should look like. There are clearly costs and benefits to all configurations, and I think we should settle on a strategy so we don't keep having individual discussions on tickets.
Related tickets
- bokeh=3
- #5648

One proposal
Currently we have:
I think we should change it to:
That's 9 workflows before and after, but with one fewer Mac workflow and a lot of added value.
I expect the first two workflows to require a lot of effort before they become green.
Originally posted by @crusaderky in #6073 (comment)