float16 normalization problem with numpy >= 1.21 #21559

jmhessel · 2021-11-04T23:23:51Z

Describe the bug

Hi!

I think sklearn.preprocessing.normalize doesn't work properly on float16 for the latest versions of numpy. I am not sure if this is a bug with numpy or sklearn, but figured folks might want to know. I think for dense matrices, np.einsum is used to compute norms, so the instability might be there. Regardless...

Steps/Code to Reproduce

import numpy as np
import sklearn
print(sklearn.__version__)

## if numpy is 1.20.3 or below, this works
## if numpy is 1.21.X or above, this breaks
print(np.__version__)
import sklearn.preprocessing

np.random.seed(1)
test = np.random.random((1,512)).astype(np.float16)
test_norm = sklearn.preprocessing.normalize(test)
print(np.linalg.norm(test_norm))

#output for sklearn 1.0.1 and numpy 1.21.2 (anything 1.21.+)
#1.0.1
#1.21.2
#13.25

#output for sklearn 1.0.1 and numpy 1.20.3
#1.0.1
#1.20.3
#1.0

Expected Results

Regardless of the numpy version, np.linalg.norm of the normalized array should be close to 1.

Actual Results

The norm is 13.25 for any numpy 1.21.+

Versions

See above. I'm using the latest sklearn, but if I also use the latest numpy, float16 normalization no longer seems to work.


ogrisel · 2021-11-05T08:45:36Z

It's bad that we did not have a test to catch this regression earlier. I confirm I can reproduce the exact same results on macos/arm64, so this is not platform specific.

It seems that with numpy 1.21.2 sklearn.preprocessing.normalize(test) returns test without any modification...
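One plausible explanation for the unchanged output, offered as an assumption about sklearn's internals rather than something confirmed in this thread: normalize guards against division by zero by treating a zero row norm as 1, so a norm that is wrongly computed as 0 leaves the row untouched. The sketch below imitates that behavior with plain numpy; `normalize_rows_sketch` is a hypothetical helper, not sklearn code.

```python
import numpy as np


def normalize_rows_sketch(X, sq_norms):
    """Divide each row of X by its L2 norm, given the squared row norms.

    Zero norms are replaced by 1 to avoid dividing by zero. This mirrors
    what sklearn is assumed to do; it is not sklearn's actual code.
    """
    norms = np.sqrt(sq_norms.astype(np.float32))
    norms[norms == 0.0] = 1.0
    return (X / norms[:, np.newaxis]).astype(X.dtype)


rng = np.random.RandomState(1)
X = rng.random_sample((1, 512)).astype(np.float16)

# Squared norms as a broken reduction would report them: all zeros.
broken = normalize_rows_sketch(X, np.zeros(1, dtype=np.float16))
# Squared norms computed with an explicit float32 reduction.
correct = normalize_rows_sketch(X, (X.astype(np.float32) ** 2).sum(axis=1))

print(np.array_equal(broken, X))  # True: the input comes back unchanged
print(np.linalg.norm(correct.astype(np.float32)))
```

If this is indeed what happens, the reported norm of 13.25 is simply the norm of the raw input: the sum of squares shown later in the thread is 175.6, and its square root is about 13.25.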

ogrisel · 2021-11-05T08:48:26Z

On numpy 1.21.2:

>>> np.einsum("ij,ij->i", test, test)
array([0.], dtype=float16)

while on previous numpy:

>>> np.einsum("ij,ij->i", test, test)
array([175.6], dtype=float16)

and on both numpy versions:

>>> (test ** 2).sum()
175.6

so there is a regression in np.einsum.
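For anyone stuck on an affected numpy, one possible workaround is to upcast to float32 before normalizing, so that einsum never runs on float16 data. This is a sketch under the assumption that the regression is limited to float16 inputs, as the outputs above suggest; `normalize_float16_rows` is a hypothetical helper name, not part of sklearn or numpy.

```python
import numpy as np


def normalize_float16_rows(X):
    """L2-normalize the rows of a float16 array using float32 arithmetic."""
    X32 = X.astype(np.float32)
    # Same einsum expression as above, but applied to float32 data.
    sq_norms = np.einsum("ij,ij->i", X32, X32)
    norms = np.sqrt(sq_norms)
    norms[norms == 0.0] = 1.0  # leave all-zero rows untouched
    return X32 / norms[:, np.newaxis]


np.random.seed(1)
test = np.random.random((1, 512)).astype(np.float16)
test_norm = normalize_float16_rows(test)
print(np.linalg.norm(test_norm))  # expected to be close to 1
```

The result is float32. It can be cast back with `.astype(np.float16)` if the smaller dtype is required, at the cost of some rounding.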

ogrisel · 2021-11-05T08:59:52Z

I found a minimal reproducer and will report it upstream.

ogrisel · 2021-11-05T09:00:59Z

This seems to have already been reported as numpy/numpy#20305.

jmhessel · 2021-11-05T16:51:42Z

Thanks @ogrisel! Should I close this issue since it's a numpy bug, or keep it open?

glemaitre · 2021-11-07T11:38:17Z

> Thanks @ogrisel! Should I close this issue since it's a numpy bug, or keep it open?

We can keep it open just to track the regression.

thomasjpfan · 2021-11-26T20:35:20Z

Looks like the bug was fixed in numpy and will be backported to numpy 1.21.5.

XREF: numpy/numpy#20462
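Because the exact set of affected releases is not spelled out in this thread, a behavioral check may be more reliable than comparing version strings. The sketch below compares the float16 einsum result with a float64 reference; `einsum_float16_looks_broken` is a hypothetical helper, and the 5% tolerance is an arbitrary choice meant to absorb float16 rounding.

```python
import numpy as np


def einsum_float16_looks_broken(n_features=512, seed=0):
    """Return True if float16 einsum row norms disagree with a float64 reference."""
    rng = np.random.RandomState(seed)
    X = rng.random_sample((1, n_features)).astype(np.float16)
    via_einsum = np.einsum("ij,ij->i", X, X).astype(np.float64)
    reference = (X.astype(np.float64) ** 2).sum(axis=1)
    # float16 carries about 3 significant decimal digits, so allow 5% slack.
    return not np.allclose(via_einsum, reference, rtol=5e-2)


print(np.__version__, einsum_float16_looks_broken())
```

A result of `True` would indicate the installed numpy shows the symptom described in this issue, where the einsum sum of squares came back as 0.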

thomasjpfan · 2022-03-29T02:45:45Z

Closing because this issue was fixed in NumPy.

jmhessel added the Bug: triage label Nov 4, 2021

ogrisel added Bug and removed Bug: triage labels Nov 5, 2021

jmhessel mentioned this issue Nov 5, 2021

Difference in CPU/GPU results larger than expected jmhessel/clipscore#2

Closed

thomasjpfan closed this as completed Mar 29, 2022
