Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: NaT handling change 1.21.2 -> 1.21.3 triggering issue in np.median #20376

Closed
stuartarchibald opened this issue Nov 15, 2021 · 10 comments · Fixed by #23774
Closed

BUG: NaT handling change 1.21.2 -> 1.21.3 triggering issue in np.median #20376

stuartarchibald opened this issue Nov 15, 2021 · 10 comments · Fixed by #23774
Assignees

Comments

@stuartarchibald
Copy link
Contributor

stuartarchibald commented Nov 15, 2021

Describe the issue:

It looks like something changed between 1.21.2 and 1.21.3 that has lead to a change in NaT handling which triggers an issue/incorrect result in np.median.

EDIT (seberg): Note that this issue also affects np.quantile and np.percentile, an included NaT should lead to a NaT result. But it does not.

This was found whilst adapting https://github.com/numba/numba to accommodate NumPy 1.21.x series.

xref: numba/numba#7483 (comment)

CC @seberg

Reproduce the code example:

import numpy as np
print(np.__version__)

arr = np.arange(10).astype(dtype='m8[s]')
print(arr)
arr[arr.size // 2] = 'NaT'
print(arr)
print(np.median(arr))

Error message:

Running the reproducer on 1.21.2:

$ python sample.py
1.21.2
[0 1 2 3 4 5 6 7 8 9]
[    0     1     2     3     4 'NaT'     6     7     8     9]
3 seconds

Running the reproducer on 1.21.3:

1.21.3
[0 1 2 3 4 5 6 7 8 9]
[    0     1     2     3     4 'NaT'     6     7     8     9]
5 seconds


### NumPy/Python version information:

1.21.2 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:23:11)

and

1.21.3 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:23:11)

@stuartarchibald
Copy link
Contributor Author

For the removal of doubt. This also appears in 1.21.4:

$ python sample.py
1.21.4
[0 1 2 3 4 5 6 7 8 9]
[    0     1     2     3     4 'NaT'     6     7     8     9]
5 seconds

@charris charris added the 09 - Backport-Candidate PRs tagged should be backported label Nov 15, 2021
@charris charris added this to the 1.21.5 release milestone Nov 15, 2021
@seberg
Copy link
Member

seberg commented Nov 15, 2021

Seems like NaT handling is broken! Most likely the sorting/partition changed subtly ending up with a different order (where both are incorrect).

(We did change the NaT sorting I think, but I don't expect in a minor release?)

@seberg seberg added the component: numpy.datetime64 (and timedelta64) label Nov 15, 2021
@seberg
Copy link
Member

seberg commented Nov 15, 2021

Ah, must be this PR: #20081 that changed it, but I think it was always broken, so doubt we can revert it.

@WarrenWeckesser
Copy link
Member

Shouldn't the result be NaT, just like np.median([1.0, 2.0, np.nan, 3.0, 4.0, 5.0]) returns nan?

stuartarchibald added a commit to stuartarchibald/numba that referenced this issue Nov 16, 2021
stuartarchibald added a commit to stuartarchibald/numba that referenced this issue Nov 16, 2021
esc pushed a commit to esc/numba that referenced this issue Nov 17, 2021
bmwiedemann pushed a commit to bmwiedemann/openSUSE that referenced this issue Jan 10, 2022
https://build.opensuse.org/request/show/945418
by user mcepl + dimstar_suse
- Numba <0.55 is not compatible with Python 3.10 or NumPy 1.22
  gh#numba/numba#7557
- Add test skip to numba-pr7483-numpy1_21.patch due to numpy update
  gh#numpy/numpy#20376
@seberg seberg modified the milestones: 1.22.4 release, 1.23.0 release Mar 9, 2022
@seberg
Copy link
Member

seberg commented May 4, 2022

@stuartarchibald is this issue important for you for the next release, or should we push this off? I am not sure right immediately that this was a worse result then before (rather than a different one)?

@stuartarchibald
Copy link
Contributor Author

@seberg many thanks for asking. Numba's test suite is patched against testing np.median with input containing NaT for NumPy 1.21 (and it will be for 1.22 too), so it won't fail in the test suite. The Numba implementation is now effectively not matching NumPy (different result!), which is against Numba's exact replication policy, so it would be nice to get this resolved, but is also not critical.

@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label May 10, 2022
@charris
Copy link
Member

charris commented May 10, 2022

I removed the backport tag as the last planned 1.22.x release will be this week. @stuartarchibald What behavior does Numba implement?

@stuartarchibald
Copy link
Contributor Author

@charris ok, thanks, also good to know that the 1.22 series ends planned releases this week.

What behavior does Numba implement?

Numba implements the pre-1.21.3 behaviour. I'm happy to patch Numba to switch implementation based on NumPy version (there's already quite a bit of this in the Numba code base). What helps with doing this is to have a defined "new" behaviour, but I gather from the above that the current state may be a side effect of #20081 opposed to a direct intention and other "answers" to the MWR in the OP may be more desirable?

@seberg
Copy link
Member

seberg commented May 10, 2022

Yeah, it seems pretty clear to me that the correct result would be NaT with np.nanmedian probably already returning a version which just ignores the NaT entries.
So if you want to get ahead of NumPy, that is what I would suggest implementing and ignoring whatever NumPy does :).

@charris charris added the triage review Issue/PR to be discussed at the next triage meeting label May 18, 2022
@mattip mattip modified the milestones: 1.23.0 release, 1.24.0 release May 18, 2022
@mattip mattip self-assigned this May 18, 2022
@seberg seberg removed the triage review Issue/PR to be discussed at the next triage meeting label Jun 15, 2022
@charris
Copy link
Member

charris commented Nov 19, 2022

Going to push this off to 1.25

seberg added a commit to seberg/numpy that referenced this issue May 17, 2023
Note that this doesn't mean that rounding is correct at least for
quantiles, so there is some dubious about it being a good idea to
use this.

But, it does fix the issue, and I the `copyto` solution seems rather
good to me, the only thing that isn't ideal is the `supports_nan`
definition itself.

Closes numpygh-20376
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment