How should our test matrix look like? #6085

Open · fjetter opened this issue Apr 7, 2022 · 15 comments
Labels: discussion (Discussing a topic with no specific actions yet)

@fjetter (Member) commented Apr 7, 2022

There have been a few conversations about how our test matrix should look. There are clearly costs and benefits to every configuration, and I think we should settle on a strategy so we don't have to rehash this in individual tickets.

Related tickets


One proposal

Currently we have:

Python version | OS                     | Which deps                  | Deps version
3.8            | Linux, MacOSX, Windows | mandatory, optional, Cython | latest
3.9            | Linux, MacOSX, Windows | mandatory, optional, CUDA   | latest
3.10           | Linux, MacOSX, Windows | mandatory, optional         | latest + git tip

I think we should change it to:

Python version | OS                     | Which deps                  | Deps version
3.8            | Linux                  | mandatory, optional, CUDA   | pinned to minimum
3.8            | Linux                  | mandatory only (no numpy!)  | latest
3.8            | Linux, MacOSX, Windows | mandatory, optional, CUDA   | latest
3.9            | Linux                  | mandatory, optional, CUDA   | latest
3.10           | Linux, MacOSX, Windows | mandatory, optional, CUDA   | latest + git tip

That's 9 workflows before and after, but with one less Mac workflow and a lot of added value.
I expect the first two workflows to require a lot of effort before they become green.

Originally posted by @crusaderky in #6073 (comment)

@fjetter added the discussion label on Apr 7, 2022
@fjetter (Member, Author) commented Apr 7, 2022

repost: #6073 (comment)

Do we need CUDA/pynvml everywhere? Isn't this what gpuCI is for?

@fjetter (Member, Author) commented Apr 7, 2022

Another thing we might want to consider is marking the few test modules that are likely to be OS-sensitive and running only those on Windows and macOS, which would cut the runtime significantly (see the sketch at the end of this comment).

Relevant modules would be

  • Everything around comms
  • deploy (process heavy)
  • CLI
  • Wherever we fiddle with low-level asyncio code (I think asyncio_comm is the only module that currently qualifies, from my POV)
  • Maybe protocol (memory allocation tricks)
  • Maybe there are a few special cases?

I think most of the logic we care about in our tests is entirely unrelated to the OS.

@fjetter (Member, Author) commented Apr 7, 2022

I would actually like to see us drop macOS coverage to either a subset of the test suite or to one Python version only. The org-level constraint is at most 5 concurrent macOS builds, and particularly during US hours we have PRs waiting for hours before they can even pick up a macOS job.

@crusaderky (Collaborator) commented:
Everything around spilling is potentially impacted by whether or not lz4 is installed.

@mrocklin (Member) commented:
> Do we need CUDA/pynvml everywhere? Isn't this what gpuCI is for?

cc @jakirkham @quasiben @charlesbluca

@mrocklin (Member) commented:
> Another thing we might want to consider is marking the few test modules that are likely to be OS-sensitive and running only those on Windows and macOS, which would cut the runtime significantly.

Windows is pickier when it comes to network and disk. I've found this to be valuable in surprising situations. OSX and Linux are correlated enough that I agree we don't necessarily need to duplicate across them.

I'd love to see us run the entire test suite on every OS for at least one configuration (but not all).

@jakirkham (Member) commented:
cc @dask/gpu

@jakirkham (Member) commented:
> Do we need CUDA/pynvml everywhere? Isn't this what gpuCI is for?

I should add that I don't really understand the question. I do see pynvml in the CI requirements, which is pretty lightweight. Nothing else stuck out to me as "CUDA". What is meant by that? When asking for help, it would be useful to formulate the question clearly and explain the underlying problem; neither is clear to me.

@quasiben (Member) commented:
I don't think we need CUDA/pynvml everywhere. I believe we are only running on Linux/Python 3.9/CUDA 11.5:
https://github.com/dask/distributed/blob/main/continuous_integration/gpuci/axis.yaml . @charlesbluca has been a great steward of updating gpuCI for CUDA/RAPIDS updates as new releases come in.

@crusaderky (Collaborator) commented:
> > Do we need CUDA/pynvml everywhere? Isn't this what gpuCI is for?
>
> I should add that I don't really understand the question. I do see pynvml in the CI requirements, which is pretty lightweight. Nothing else stuck out to me as "CUDA". What is meant by that? When asking for help, it would be useful to formulate the question clearly and explain the underlying problem; neither is clear to me.

By CUDA-related packages I mean pynvml, pytorch, and torchvision.
They are currently installed in selected GitHub CI environments, even though those hosts don't have NVIDIA GPUs.

@jakirkham (Member) commented:
So pynvml is definitely CUDA-related. My guess is it is not needed or used on CPU-only systems, but I could be wrong (we could be testing that pynvml support fails gracefully on CPU-only machines, for example). I would ask @jacobtomlinson and @rjzamora to confirm.

This may come as a surprise, but pytorch and torchvision are not currently being used on GPU. They are only used on CPU to test this serialization logic (also see this torchvision test). These libraries can indeed be used on CPU only, and that seems to be what is happening here.

I should add that this is exactly why I asked the question: I was concerned there may be misconceptions about what is being used where.

On the second question: is there an issue with having these tested? What brought this up?

@charlesbluca (Member) commented:
> Do we need CUDA/pynvml everywhere? Isn't this what gpuCI is for?

I would prefer keeping pynvml in all testing environments. Like @jakirkham said, it is relatively lightweight, and I think it's important to cover the (rare, but plausible) case where a user has pynvml installed but no GPUs on their system, to make sure that monitoring still works properly.

My interpretation of the proposed testing matrix is that we would like to expand the current gpuCI matrix in Distributed to include additional Python versions / OSes - some comments on this:

  • Python 3.10 testing support would be limited, if possible at all, as RAPIDS currently only supports 3.8 / 3.9
  • in terms of OS, gpuCI is limited to Linux, though I think Windows support through WSL2 is being explored; I don't see much value in testing on pure Windows / macOS as RAPIDS currently doesn't support them

@fjetter (Member, Author) commented Apr 14, 2022

I just want to clarify why I opened this ticket, why I'm looking for a "minimal dependency" build, and why I'm suggesting to drop some packages from certain builds. I have the feeling that people feel the need to "defend" pynvml; that was not my intention.

High level motivation

  • CI runtimes are quite long and this lengthens iteration cycles. That's not the fault of the dependencies themselves, but I'm wondering if there is anything we could "not test" and still be confident we're covered for what matters.
  • CI is often bottlenecked by macOS jobs because the entire dask GitHub organization shares five concurrent runners (Drop OSX builds for python 3.9 #6073)
  • When dropping py3.7 and introducing py3.10 we had discussions in both cases about what the test matrix should look like. I noticed that nobody knew exactly why it is the way it is, or where and why we made a decision about it. I am looking for clarification.

OS jobs

  • macOS jobs are limited to five concurrent runners for the entire dask organization. Nowadays, many more people are working on the project and this limitation is starting to hurt.
  • Windows jobs are known to be slower than Ubuntu jobs.
  • Most of our tests are not particularly sensitive to OS specifics, with the exception of networking, processes, etc. Can we leverage this knowledge somehow?

If we could reduce what we test on these OSs we'd get faster iteration times for everyone.

Optional dependencies

  • Installing optional dependencies has a cost: we need to solve the environment, download and install the packages, we run additional tests, etc. It's not a huge thing, but it adds up.
  • For example, do we need scikit-learn and scipy in all environments (that's already 100 MB)? Do we really want to install s3fs in every environment?
  • I like the idea of testing a library with its bare minimal runtime dependencies. I consider this a best practice to ensure that we're not accidentally hitting an import error or something like that. I don't think we have this problem right now, but it's a nice-to-have nonetheless (see the sketch after this list).
  • I'm sorry if I offended anyone by suggesting to drop CUDA/pynvml. I barely know what pynvml does. Thanks @charlesbluca for pointing out that there is even some non-GPU monitoring going on and that we should ensure it doesn't crash when GPUs are not around. This is the feedback I was looking for. If you tell me "we need this everywhere", I'll ask why because I want to understand the motivation, but I will obviously trust the experts on this.
  • It appears "CUDA" was the wrong label and we're really talking about "ML stuff"? I see a few different categories, e.g. ml (pytorch, scikit-learn, scipy, keras, ...), gpu (maybe only pynvml?), io (s3fs, fsspec, h5py, netcdf4), ux/misc (prometheus, bokeh, compression libraries, SSH, crick).

@jacobtomlinson (Member) commented May 6, 2022

> I think it's important to cover the (rare, but plausible) case where a user has pynvml installed but no GPUs on their system

In my experience it's not that rare. Particularly on HPC environments where folks have one conda environment and select different hardware nodes for different tasks.

@jakirkham (Member) commented:
Thanks Florian! 🙏

So originally pytorch and torchvision were in a single CI job. In fact, that may be the case for several of these packages. My guess is that as we dropped and added Python versions in the test matrix, things got unintentionally copied over, which has resulted in these showing up in several jobs. It probably makes sense to limit them back to one environment.

That all being said, these may wind up getting copied over to new environments again unintentionally. So it might be worth thinking of a system for managing optional-dependency testing that is less prone to this kind of issue than what we have today. For example, we could keep a separate requirements list of optional dependencies that is installed after creating the conda environment, and only when some CI environment variable is set (CI_OPTIONAL_REQS or similar). We could also break it into smaller groups if that is helpful.
