
Test summary with previous PyTorch/TensorFlow versions #18181

Open
ydshieh opened this issue Jul 18, 2022 · 11 comments
Labels
Tests Related to tests WIP Label your PR/Issue with WIP for some long outstanding Issues/PRs that are work in progress

Comments

@ydshieh
Collaborator

ydshieh commented Jul 18, 2022

At the initiative of @LysandreJik, we ran the tests with previous PyTorch/TensorFlow versions. The goal is to determine whether we should drop support for (some) earlier PyTorch/TensorFlow versions.

  • This is not exactly the same environment as the scheduled daily CI (torch-scatter and accelerate are not installed, etc.)
  • Currently we only have the global summary (i.e. there is no per-model count of test failures)

Here are the results (from a run around June 20, 2022):

  • PyTorch testing has ~27100 tests
  • TensorFlow testing has ~15700 tests
| Framework | No. of failures |
| --- | --- |
| PyTorch 1.10 | 50 |
| PyTorch 1.9 | 710 |
| PyTorch 1.8 | 1301 |
| PyTorch 1.7 | 1567 |
| PyTorch 1.6 | 2342 |
| PyTorch 1.5 | 3315 |
| PyTorch 1.4 | 3949 |
| TensorFlow 2.8 | 118 |
| TensorFlow 2.7 | 122 |
| TensorFlow 2.6 | 122 |
| TensorFlow 2.5 | 128 |
| TensorFlow 2.4 | 167 |

It looks like the number of failures in TensorFlow testing doesn't increase much across versions.

So far my thoughts:

  • All TF versions >= 2.4 should still be kept in the list of supported versions

Questions

  • What's your opinion on which versions to drop support for?
  • Would you like to see the number of test failures per model?
  • TensorFlow 2.3 needs CUDA 10.1 and requires building a special Docker image. Do you think we should make the effort to get results for TF 2.3?
@ydshieh ydshieh added bug Tests Related to tests and removed bug labels Jul 18, 2022
@ydshieh
Collaborator Author

ydshieh commented Jul 18, 2022

cc @LysandreJik @sgugger @patrickvonplaten @Rocketknight1 @gante @anton-l @NielsRogge @amyeroberts @alaradirik @stas00 @hollance to have your comments

@Rocketknight1
Member

TF 2.3 is quite old by now, and I wouldn't make a special effort to support it. Several nice TF features (like the NumPy-like API) only arrived in TF 2.4, and we're likely to use those a lot in the future.

@LysandreJik
Member

Hey @ydshieh, would you have a summary of the failing tests handy? I'm curious to see the reason why there are so many failures for PyTorch as soon as we leave the latest version. I'm quite confident that it's an issue in our tests rather than in our internal code, so seeing the failures would help. Thanks!

@ydshieh
Collaborator Author

ydshieh commented Jul 19, 2022

@LysandreJik I will re-run it. The previous run(s) produced huge tables in the reports, and sending them to Slack failed (3001 character limit). I finally got it through by disabling those blocks.

Before re-running it, I need an approval for #17921

@ydshieh
Collaborator Author

ydshieh commented Aug 1, 2022

I ran the past CI again, which now returns more information. Looking quickly at the report for PyTorch 1.4, here are some observations:

There is one error occurring in almost all models:

  • from_pretrained: OSError: Unable to load weights from pytorch checkpoint file for ...
    • torch.load: Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2. Your PyTorch installation may be too old.
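
For context: PyTorch 1.6 switched `torch.save` to a zip-based container format, which `torch.load` in older releases cannot read (hence the "version 3, but the maximum supported version for reading is 2" message). A framework-free way to check which format a checkpoint file uses (a sketch; `is_new_style_checkpoint` is a hypothetical helper, not a transformers API):

```python
import zipfile

def is_new_style_checkpoint(path):
    """Return True if `path` looks like a PyTorch >= 1.6 checkpoint.

    Checkpoints written by torch.save on PyTorch >= 1.6 are zip archives;
    the legacy format (the only one PyTorch < 1.6 can read) is not.
    """
    return zipfile.is_zipfile(path)
```

On PyTorch >= 1.6, `torch.save(obj, f, _use_new_zipfile_serialization=False)` writes the legacy format that older versions can still load.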

Another error also occurs a lot (in the torchscript tests):

  • (line 625) AttributeError: module 'torch.jit' has no attribute '_state'

One error occurs specifically for vision models (probably due to the convolution layers):

  • (line 97) RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.
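
A non-contiguous input typically comes from a view such as a transpose or a slice; newer PyTorch versions appear to handle this before calling cuDNN, while 1.4 did not. The memory-layout issue can be illustrated with NumPy, which exposes the same contiguity flag (illustration only; in PyTorch the usual workaround is calling `.contiguous()` on the tensor before the convolution):

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
t = x.T  # transposing creates a non-contiguous view; no data is copied
assert not t.flags["C_CONTIGUOUS"]

c = np.ascontiguousarray(t)  # copy into contiguous memory, like tensor.contiguous()
assert c.flags["C_CONTIGUOUS"]
```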

BART has 108/106 failures:

  • (line 240) RuntimeError: CUDA error: device-side assert triggered
    • Don't know what's wrong here yet

Others

  • Other AttributeError occurrences (not exhaustive):
    • AttributeError: module 'torch' has no attribute 'minimum'
    • AttributeError: 'builtin_function_or_method' object has no attribute 'fftn'
    • AttributeError: module 'torch' has no attribute 'square'
    • AttributeError: module 'torch.nn' has no attribute 'Hardswish'
    • AttributeError: module 'torch' has no attribute 'logical_and'
    • AttributeError: module 'torch' has no attribute 'pi'
    • AttributeError: module 'torch' has no attribute 'multiply'
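
These errors come from ops that simply don't exist in older torch releases (`torch.multiply`, `torch.square`, `torch.pi`, etc. were only added in later versions). A common compatibility pattern, shown here as a generic sketch rather than what transformers actually does, is to resolve each op once with a fallback:

```python
import operator

def resolve_op(module, name, fallback):
    # Use the native op when this framework version provides it,
    # otherwise fall back to an equivalent implementation.
    return getattr(module, name, fallback)

class _OldTorch:
    """Stand-in for an old torch module that lacks `multiply` (illustration only)."""

multiply = resolve_op(_OldTorch, "multiply", operator.mul)
```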

@LysandreJik
Member

Thanks for the report! Taking a look at the PyTorch versions, here are the dates at which they were released:

Most of the errors in from_pretrained seem to come from the zipfile format introduced by PyTorch 1.6. I think this is the most annoying one to patch by far.

From a first look, I'd offer to drop support for all PyTorch versions below 1.6, as these were released more than two years ago.

Do you have a link to a job containing all these failures? I'd be interested in seeing if the 2342 errors in PyTorch 1.6 are solvable simply or if they will require a significant refactor.

@ydshieh
Collaborator Author

ydshieh commented Aug 9, 2022

The link is here. But since it contains too many jobs (all models x all versions ~= 3200 jobs), it just shows [Unicorn!] This page is taking too long to load.

I can re-run specifically for PyTorch 1.6 only, and will post a link later.

@stas00
Contributor

stas00 commented Aug 9, 2022

> From a first look, I'd offer to drop support for all PyTorch versions below 1.6, as these were released more than two years ago.

I second that.

While we are at it, do we want to establish an official sliding window for how far back we support PyTorch versions? As in, at minimum we support at least 2 years of PyTorch? If it's easy to support longer we would, but it'd be easy to cut off if need be.

Users always have older transformers releases that they can pin to if they really need support for a very old PyTorch.
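
The sliding-window idea above could be expressed as a simple date check (an illustrative sketch; the helper name and the two-year default are assumptions from this thread, not project policy):

```python
from datetime import date, timedelta

def is_supported(release_date, today=None, window_years=2):
    # A framework release stays supported while it is younger than the window.
    today = today or date.today()
    return today - release_date <= timedelta(days=365 * window_years)
```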

@LysandreJik
Member

Yes, that would work fine with me. If I understand correctly, that's how libraries in the PyData ecosystem (scikit-learn, numpy) manage the support of Python versions: they drop support for versions older than 2 years (scikit-learn/scikit-learn#20965, scikit-learn/scikit-learn#20084, the scipy toolchain, scipy/scipy#14655).

Dropping support for PyTorch/Flax/TensorFlow versions released more than two years ago sounds good to me. That is somewhat already the case (see the failing tests), but we're just not aware of it.

@ydshieh
Collaborator Author

ydshieh commented Aug 10, 2022

Hi, I am wondering what it means for a PyTorch/TensorFlow/Flax version to be supported. I guess it doesn't imply that all models work under those framework versions, but I would like to know if there is a more explicit definition (for transformers or, more generally, in open source projects).

@sgugger
Collaborator

sgugger commented Aug 10, 2022

Ideally it should mean that all models work and all tests pass, apart from functionality with explicit version gates (like CUDA bfloat16 or torch FX, where we test against a specific PyTorch version).

@huggingface huggingface deleted a comment from github-actions bot Sep 5, 2022
@ydshieh ydshieh added the WIP Label your PR/Issue with WIP for some long outstanding Issues/PRs that are work in progress label Sep 5, 2022