float16 normalization problem with numpy >= 1.21 #21559

Closed
jmhessel opened this issue Nov 4, 2021 · 8 comments

jmhessel commented Nov 4, 2021

Describe the bug

Hi!

I think sklearn.preprocessing.normalize doesn't work properly on float16 for the latest versions of numpy. I am not sure if this is a bug with numpy or sklearn, but figured folks might want to know. I think for dense matrices, np.einsum is used to compute norms, so the instability might be there. Regardless...

Steps/Code to Reproduce

import numpy as np
import sklearn
print(sklearn.__version__)

## if numpy is 1.20.3 or below, this works
## if numpy is 1.21.X or above, this breaks
print(np.__version__)
import sklearn.preprocessing

np.random.seed(1)
test = np.random.random((1,512)).astype(np.float16)
test_norm = sklearn.preprocessing.normalize(test)
print(np.linalg.norm(test_norm))

#output for sklearn 1.0.1 and numpy 1.21.2 (anything 1.21.+)
#1.0.1
#1.21.2
#13.25

#output for sklearn 1.0.1 and numpy 1.20.3
#1.0.1
#1.20.3
#1.0

Expected Results

Regardless of the numpy version, np.linalg.norm of the output should be close to 1 after normalization.
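To make "close to 1" concrete, here is a small NumPy-only sketch of such a check. The helper name `rows_have_unit_norm` and the tolerance `atol=1e-2` are assumptions chosen for illustration (float16 carries roughly three decimal digits), not anything taken from scikit-learn. The reference normalization is done by hand in float32 so that it does not depend on the code path under suspicion.

```python
import numpy as np


def rows_have_unit_norm(X, atol=1e-2):
    """Return True if every row of X has an L2 norm within atol of 1.

    Hypothetical helper: norms are measured in float32 so the check itself
    does not depend on float16 arithmetic.
    """
    X32 = np.asarray(X, dtype=np.float32)
    norms = np.sqrt((X32 ** 2).sum(axis=1))
    return bool(np.all(np.abs(norms - 1.0) <= atol))


# Same kind of input as in the report: one row of 512 float16 values.
rng = np.random.RandomState(1)
test = rng.random_sample((1, 512)).astype(np.float16)

# Reference normalization computed by hand in float32, then cast back.
test32 = test.astype(np.float32)
row_norms = np.sqrt((test32 ** 2).sum(axis=1, keepdims=True))
reference = (test32 / row_norms).astype(np.float16)

print(rows_have_unit_norm(reference))  # the expected behaviour
print(rows_have_unit_norm(test))       # the raw, unnormalized input
```

The same check could be applied to the output of `sklearn.preprocessing.normalize(test)` to tell the two numpy versions apart.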

Actual Results

The norm is 13.25 with any numpy 1.21.x.

Versions

See above. I'm using the latest sklearn, but if I also use the latest numpy, float16 normalization no longer seems to work.

@ogrisel
Member

ogrisel commented Nov 5, 2021

It's bad that we did not have a test to catch this regression earlier. I confirm I can reproduce the exact same results on macos/arm64, so this is not platform specific.

It seems that with numpy 1.21.2 sklearn.preprocessing.normalize(test) returns test without any modification...

@ogrisel
Member

ogrisel commented Nov 5, 2021

On numpy 1.21.2:

>>> np.einsum("ij,ij->i", test, test)
array([0.], dtype=float16)

while on previous numpy:

>>> np.einsum("ij,ij->i", test, test)
array([175.6], dtype=float16)

and on both numpy versions:

>>> (test ** 2).sum()
175.6

so there is a regression in np.einsum.
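A possible link between the zero returned by np.einsum and the earlier observation that normalize returns its input unchanged: if the normalizer replaces zero norms with 1 to avoid dividing by zero (an assumption about scikit-learn's behaviour, not checked against its source here), then a squared norm of 0 means every row is divided by 1. The sketch below uses a hypothetical stand-in, `normalize_rows`, and feeds it both a correct squared norm and the zero that the regressed einsum produced.

```python
import numpy as np


def normalize_rows(X, squared_norms):
    """Hypothetical stand-in for an L2 row normalizer (not scikit-learn's code)."""
    norms = np.sqrt(squared_norms.astype(np.float32))
    # Assumed guard: rows with a zero norm are divided by 1 instead of 0.
    norms[norms == 0.0] = 1.0
    return (X / norms[:, np.newaxis]).astype(X.dtype)


rng = np.random.RandomState(1)
test = rng.random_sample((1, 512)).astype(np.float16)

# Correct squared norms, computed in float32 without einsum.
good = (test.astype(np.float32) ** 2).sum(axis=1)
# What the regressed einsum reported: a single zero.
bad = np.zeros(1, dtype=np.float16)

fixed = normalize_rows(test, good)
unchanged = normalize_rows(test, bad)

print(np.array_equal(unchanged, test))            # input comes back as-is
print(np.linalg.norm(fixed.astype(np.float32)))   # close to 1
```

With the zero squared norm, the guard turns the division into a no-op, which would explain a result whose norm equals that of the raw input.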

ogrisel added the Bug label and removed the Bug: triage label on Nov 5, 2021
@ogrisel
Member

ogrisel commented Nov 5, 2021

I found a minimal reproducer; I will report it upstream.

@ogrisel
Member

ogrisel commented Nov 5, 2021

This seems to have already been reported as numpy/numpy#20305.

@jmhessel
Author

jmhessel commented Nov 5, 2021

Thanks @ogrisel ! Should I close this issue out because it's numpy? or keep it open ?

@glemaitre
Member

> Thanks @ogrisel ! Should I close this issue out because it's numpy? or keep it open ?

We can keep it open just to track the regression.

@thomasjpfan
Member

Looks like the bug was fixed in NumPy and the fix will be backported to NumPy 1.21.5.

XREF: numpy/numpy#20462
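For anyone who cannot upgrade NumPy right away, one possible stopgap is to probe np.einsum at runtime and upcast float16 input to float32 when the probe fails. This is only a sketch: `einsum_float16_looks_sane` and `prepare_for_normalize` are hypothetical helpers, the loose tolerance `rtol=5e-2` is an arbitrary choice, and whether this particular probe trips on every affected build is an assumption. It also assumes the float32 path is unaffected, which this thread does not confirm.

```python
import numpy as np


def einsum_float16_looks_sane():
    """Compare a float16 einsum row norm against a float32 reference."""
    probe = np.random.RandomState(0).random_sample((1, 512)).astype(np.float16)
    expected = (probe.astype(np.float32) ** 2).sum(axis=1)
    got = np.einsum("ij,ij->i", probe, probe).astype(np.float32)
    # Loose tolerance: the reported failure was 0 instead of roughly 175.
    return bool(np.allclose(got, expected, rtol=5e-2))


def prepare_for_normalize(X):
    """Upcast float16 input to float32 if the local einsum fails the probe."""
    X = np.asarray(X)
    if X.dtype == np.float16 and not einsum_float16_looks_sane():
        return X.astype(np.float32)
    return X


rng = np.random.RandomState(1)
test = rng.random_sample((1, 512)).astype(np.float16)
prepared = prepare_for_normalize(test)
print(prepared.dtype)

# Intended use (requires scikit-learn, not executed here):
# test_norm = sklearn.preprocessing.normalize(prepared)
```

Note that the output dtype becomes float32 whenever the fallback is taken, so callers who need float16 would have to cast back afterwards.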

@thomasjpfan
Member

Closing because this issue was fixed in NumPy.
