BUG: Fix timedelta*float and median/percentile/quantile NaT handling #21726

seberg · 2022-06-10T17:58:32Z

Currently has three changesets in there:

It implements correct rounding and NaT support in `timedelta64 * double`.
Median does not currently support datetimes. This is because
`mean` does not support datetimes.
The reason `mean` does not support datetimes is that it is
(understandably) written as `(a + b)/2`, but datetimes cannot
be added, so we would have to use a different scheme, such as
`a + (b - a) / 2`.

However, median does support timedelta, and this fixes NaT support there.
Unlike median, the interpolation formula used by quantile/percentile
supports datetimes just fine, so this also adds basic tests for
that.
I now suspect the rounding is correct here with the first change applied.
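
For illustration, the difference between `(a + b)/2` and `a + (b - a) * t` can be sketched with the standard-library `datetime` types instead of NumPy. This is only a sketch under that assumption: `lerp` is a hypothetical helper name, not the function NumPy uses internally, and `t = 0.5` gives the `a + (b - a) / 2` midpoint mentioned for median.

```python
from datetime import datetime, timedelta


def lerp(a, b, t):
    """Interpolate between a and b using only subtraction and scaling.

    Works for datetimes because (b - a) is a timedelta, which can be
    scaled by a float and added back onto a datetime.
    """
    return a + (b - a) * t


start = datetime(2022, 6, 10, 12, 0)
end = datetime(2022, 6, 12, 12, 0)

# Midpoint via a + (b - a) / 2, i.e. t = 0.5.
midpoint = lerp(start, end, 0.5)

# The naive (a + b) / 2 form fails because two datetimes cannot be added.
try:
    (start + end) / 2
    naive_works = True
except TypeError:
    naive_works = False

# The same helper also works for plain timedeltas.
td_mid = lerp(timedelta(hours=1), timedelta(hours=3), 0.5)
```

The same reasoning is why an interpolation written in the subtract-then-add form needs no special casing for datetimes.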

@jbrockmendel, this is pretty much ready I suspect, but I am honestly not sure about the changes, so half the reason for opening it was to ping someone with more datetime experience.
There would be two things to note:

The timedelta64 * float loop is significantly slower with the added checks and correct rounding. (Not sure if this matters, but wanted to note it.)
The timedelta64 * float loop changes need a few tests for rounding mode, NaT creation and floating point error reporting.
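
To make the behaviour those tests would need to pin down more concrete, here is a rough pure-Python model of it. It is not the C loop from this PR: `timedelta_times_float` and the constants are hypothetical names, and the only NumPy fact assumed is that NaT is stored as the minimum int64 value. Python's `round` stands in for round-half-to-even.

```python
import math

NAT = -2**63            # NumPy stores NaT as the minimum int64 value
INT64_MAX = 2**63 - 1


def timedelta_times_float(ticks: int, factor: float) -> int:
    """Model of timedelta64 * double on the raw int64 tick count."""
    # NaT or NaN in, NaT out.
    if ticks == NAT or math.isnan(factor):
        return NAT
    product = float(ticks) * factor
    # inf, or NaN from 0 * inf, cannot be represented: use NaT.
    if not math.isfinite(product):
        return NAT
    rounded = round(product)  # round half to even, returns an int
    # Anything outside the valid int64 range (excluding the NaT value) is NaT.
    if not (-INT64_MAX <= rounded <= INT64_MAX):
        return NAT
    return rounded
```

Each branch here is an extra check per element, which is the kind of overhead the slowdown note above refers to.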

I am happy to split this up. My initial intention was to close gh-20376, gh-11627, and gh-11620. The added fix for the timedelta loops also fixes quantile/percentile a bit, but should likely be pulled out to not block the rest.

There should be better ways to do those NaN or NaT branches in median/quantile, but....

EDIT: I am not sure I will prioritize this without prodding, so if it stalls, don't hesitate to close.

Unlike Median, the interpolation formula used by quantile/percentile does support Datetimes just fine, so this also adds basic tests for that. Due to the integral nature, times suffer round-off errors that seem not ideal. This is an issue, but distinct from the NaT one?

seberg · 2022-06-11T02:00:42Z

numpy/core/src/umath/loops.c.src

-            }
-            else {
+            /* `nearbyint` avoids warnings (should not matter here, though) */
+            double result = nearbyint(in1 * in2);


I am not sure about this. In particular: is this actually enough?

The input is effectively int64, so by casting it to double we throw away precision. This means that:

    In [6]: np.timedelta64(2**58 + 7, "us") - np.timedelta64(2**58 + 7, "us") * 1.
    Out[6]: numpy.timedelta64(7,'us')

Whoops. `long double` could fix this when available (extended precision has a 64-bit mantissa), but that doesn't always exist...

So, maybe the whole expectation of this rounding correctly is too much, or maybe we should consider the above a serious bug?
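
The precision loss above can be reproduced on plain Python integers, since a double only carries 53 bits of mantissa. The sketch below also shows one way to get an exactly rounded product without `long double`, using `fractions.Fraction`. That is only an illustration of what "correct" would mean here, not something this PR does, and the variable names are made up.

```python
from fractions import Fraction

ticks = 2**58 + 7

# Going through a double drops the low bits: near 2**58 doubles are 64 apart.
lossy = int(ticks * 1.0)
lost = ticks - lossy          # 7

# Exact rational arithmetic: Fraction(float) is exact, and round() on a
# Fraction rounds half to even and returns an int.
exact = round(Fraction(ticks) * Fraction(1.0))
kept = ticks - exact          # 0

# With a non-trivial factor the exact product is rounded only once.
half = round(Fraction(ticks) * Fraction(0.5))
```

Exact rational arithmetic is far too slow for an inner loop, so this only serves as a reference against which a faster implementation could be tested.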

> So, maybe the whole expectation of this rounding correctly is too much, or maybe we should consider the above a serious bug?

I think this should be considered a bug if and only if the analogous behavior in int64 is considered a bug:

    td64 = np.timedelta64(2**58 + 7, "us")
    val = td64.view("i8")
    >>> val - int(val*1.0)
    7

Yes, it is the same thing. But `int()` always truncates too (i.e. no correct rounding). And there you call `int()` explicitly, whereas here we do the conversion internally, hidden from the user, and thus effectively break the time precision for very large values.

But yeah, maybe it just is what it is. As Chuck keeps saying, maybe we really need a high-precision floating-point time format (double-double, or whatever)...

jbrockmendel · 2022-06-12T03:18:10Z

numpy/core/src/umath/loops.c.src

-            }
-            else {
+            /* `nearbyint` avoids warnings (should not matter here, though) */
+            double result = nearbyint(in1 * in2);


Is `nearbyint` similar to Python's `round`?

Roughly speaking, yes. Of course it will round C-style. It is really the same thing as `np.rint`, except that it should not give a warning (which should not really matter. If it sets a warning it should be the one we set anyway, but...)
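
As a small illustration on the Python side only: C's `nearbyint` rounds according to the current rounding mode, which by default is round-to-nearest with ties going to even, and Python's built-in `round` (without `ndigits`) also sends ties to even. The snippet contrasts that with `int()`, which truncates toward zero. The sample values are arbitrary.

```python
values = [0.5, 1.5, 2.5, -0.5, 2.7]

# round() sends exact ties to the nearest even integer.
rounded = [round(v) for v in values]    # [0, 2, 2, 0, 3]

# int() simply drops the fractional part.
truncated = [int(v) for v in values]    # [0, 1, 2, 0, 2]
```

So a result such as 1.5 ticks becomes 2 with rounding but 1 with truncation, which is the difference the new loop is meant to get right.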

jbrockmendel · 2022-06-12T03:26:36Z

numpy/lib/function_base.py

-    if np.issubdtype(a.dtype, np.inexact):
+
+    # Check if the array contains any nan's or NaT's (unordered values)
+    supports_nans = np.issubdtype(a.dtype, np.inexact) or a.dtype.kind == 'm'


if you want to support dt64 here, something like:

    if a.dtype.kind == "M":
        td_res = _median(a.view("m8"), ...)
        return td_res.view(a.dtype)

?

Yes, we could. I won't make it a priority though (i.e. we can follow up if we want these changes and allow median). We could also replace the call to mean() with a calculation that does: a + (b-a) / 2 (or *0.5) for example.
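
A standard-library analogue of the view-based idea, for illustration only: reinterpret each datetime as an integer offset from an epoch, take the median of the integers, and convert back. `datetime_median`, `EPOCH` and `TICK` are hypothetical names, the input is assumed to be non-empty, and NaT handling is left out. For an even count the middle pair is combined as `lo + (hi - lo) // 2`, so no two datetimes are ever added.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
TICK = timedelta(microseconds=1)


def datetime_median(values):
    """Median of naive datetimes via integer microsecond offsets."""
    # "View" the datetimes as integer tick counts since the epoch.
    ticks = sorted((v - EPOCH) // TICK for v in values)
    n = len(ticks)
    lo, hi = ticks[(n - 1) // 2], ticks[n // 2]
    # Midpoint of the middle pair; lo == hi when n is odd.
    mid = lo + (hi - lo) // 2
    # "View" the integer result back as a datetime.
    return EPOCH + mid * TICK


odd = datetime_median(
    [datetime(2022, 6, 12), datetime(2022, 6, 10), datetime(2022, 6, 11)]
)
even = datetime_median([datetime(2022, 6, 10), datetime(2022, 6, 12)])
```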

seberg · 2024-02-07T18:33:42Z

Might be nice to get back to, but this has diverged a lot anyway, so closing for now.

github-actions bot added the 00 - Bug label Jun 10, 2022

seberg added 3 commits June 10, 2022 12:26

TST: Remove XFAIL from boolean percentile test

7610433

The test was fixed by gh-19857

BUG: Fix timedelta64 * float rounding and NaT propagation

30c21ca

This fixes both rounding to be correct and that NaT is propagated correctly (and used when any overflow occurs). Unfortunately, this makes the loop a whole lot slower...

seberg force-pushed the nat-percentile-median branch from 32e3562 to 30c21ca Compare June 10, 2022 19:27

seberg closed this Feb 7, 2024

seberg added the "64 - Good Idea" label (Inactive PR with a good start or idea. Consider studying it if you are working on a related issue.) Feb 7, 2024
