
ENH: longdouble(int) performs poorly due to an unnecessary base 10 transit #22098

Open
gpshead opened this issue Aug 8, 2022 · 5 comments

Comments

@gpshead
Contributor

gpshead commented Aug 8, 2022

Proposed new feature or change:

In #9968, code was added so that longdouble(int) would be more precise (a good feature from a numpy perspective!) by not transiting through a 64-bit double. But the chosen algorithm performs poorly when the int is large:

It transits through a base 10 string internally: https://github.com/numpy/numpy/blob/maintenance/1.23.x/numpy/core/src/common/npy_longdouble.c#L130 in npy_longdouble_from_PyLong. Converting an int to decimal is roughly quadratic in the number of digits, so it is slow on huge numbers. That is ironic given that the goal of this API is to care about at most the most significant 128 bits of the PyLong.

Instead of doing that, there are better options. If you still want the ease of going through a string, at the very least use hexadecimal instead: the equivalent of b"%x" % long_obj, which is an O(num_digits) conversion. But you can skip even that and use long_obj.to_bytes() to get a raw binary string, from which you can directly determine the log2 for your exponent and grab the relevant mantissa bits straight into your floating point representation.
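For illustration, a minimal Python-level sketch of those two routes (the real fix would be C code against the CPython API; the names below are just for this example):

# some huge positive int; sign handling is left out of this sketch
long_obj = 123456789 ** 40

# Option 1: hex string -- linear in the number of digits, unlike base 10
hex_digits = b"%x" % long_obj

# Option 2: raw bytes -- the exponent and mantissa fall out directly
nbytes = (long_obj.bit_length() + 7) // 8
raw = long_obj.to_bytes(nbytes, "big")
exponent = long_obj.bit_length() - 1        # floor(log2(long_obj))
top_bits = int.from_bytes(raw[:16], "big")  # leading bytes feed the mantissa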

You could probably be even more pedantic and tie yourself to the CPython PyLong internals (its 30- or 15-bit digits) to avoid any intermediate linear to_bytes or hex allocation at all, but that would make for more fragile, CPython-implementation-dependent code, even if it would be constant time rather than O(n).

A hack to get a constant-time best of both worlds might be something logically akin to:

# ... mumble do something with the sign mumble (assumes long_obj > 0 below) ...
log2 = long_obj.bit_length()
if log2 > 128:
    long128 = long_obj >> (log2 - 128)
elif log2 < 128:
    long128 = long_obj << (128 - log2)
else:
    long128 = long_obj
assert long128.bit_length() == 128
bits128 = long128.to_bytes(128 // 8, "big")
assert len(bits128) == 128 // 8
# construct your float from log2 and bits128
# (you obviously don't need all 128 bits, as you've got an exponent to store)

That avoids doing a to_bytes or hex conversion over the full O(unbounded-N) value and only does it on the few bits you need. The shifts should be effectively constant time, as Python's internal PyLong bignum is stored in binary.

gpshead added a commit to gpshead/numpy that referenced this issue Aug 8, 2022

numpy#9968 was fixed; workaround no longer appropriate. Also inefficient once numpy#22098 is implemented.
@eric-wieser
Member

eric-wieser commented Aug 8, 2022

Indeed, I noted that this was a poorly performing solution when I wrote it in #10723:

/**
* TODO: currently a hack that converts the long through a string. This is
* correct, but slow.
*
* Another approach would be to do this numerically, in a similar way to
* PyLong_AsDouble.
* However, in order to respect rounding modes correctly, this needs to know
* the size of the mantissa, which is platform-dependent.
*/

I figured it was best to have correct behavior first, and wait for an interested party before expending the effort to fix the performance. Given it took 3 years for someone to notice, that feels like it was a reasonable call :).
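(For concreteness, a rough Python-level sketch of that numerical route, assuming np.longdouble; note that this naive accumulation does not address the rounding-mode caveat quoted above:)

import numpy as np

def longdouble_from_int(n):
    # Accumulate the value 30 bits at a time, most significant chunk first,
    # similar in spirit to CPython's PyLong_AsDouble.
    if n == 0:
        return np.longdouble(0)
    sign = -1 if n < 0 else 1
    n = abs(n)
    result = np.longdouble(0)
    scale = np.longdouble(1 << 30)
    for shift in range((n.bit_length() - 1) // 30 * 30, -1, -30):
        result = result * scale + np.longdouble((n >> shift) & 0x3FFFFFFF)
    return sign * result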

@gpshead
Contributor Author

gpshead commented Aug 8, 2022

Yeah, my hand-wavy comments are where I blindly assume numpy internals have a way to know the mantissa and exponent sizes. :)
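(For what it's worth, those sizes are queryable: at the C level from LDBL_MANT_DIG / LDBL_MAX_EXP in <float.h>, and from Python something like the sketch below, assuming np.finfo reports the platform's long double:)

import numpy as np

info = np.finfo(np.longdouble)
# e.g. nmant == 63 on x86 (80-bit extended), 52 where longdouble is just double
print(info.nmant, info.maxexp)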

@gpshead
Contributor Author

gpshead commented Aug 8, 2022

FWIW I would go with whatever is simplest and easiest to maintain. O(N) vs constant isn't really important given that at least O(N) has already been paid by whatever code produced the huge int in the first place, so one more O(N) pass doesn't change anything. Nobody should ever loop doing the same conversion on the same huge int. (Sure, someone will... but that's their problem.)

@gpshead
Contributor Author

gpshead commented Sep 4, 2022

My work on testing python/cpython#95778 is the reason I noticed this issue. Some huge values will no longer be able to pass through base 10.
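(To illustrate, assuming a CPython with that change applied and its default limit of 4300 digits:)

import sys

huge = 1 << 2_000_000                      # far more than 4300 decimal digits
sys.set_int_max_str_digits(4300)           # the default limit proposed there
try:
    str(huge)                              # the base 10 path now refuses to run
except ValueError as err:
    print(err)                             # "Exceeds the limit (4300 digits) ..."

# power-of-two bases are unaffected, so the hex / to_bytes routes keep working:
_ = b"%x" % huge
_ = huge.to_bytes((huge.bit_length() + 7) // 8, "big")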

@rgommers
Member

Some huge values will no longer be able to pass through base 10.

That sounds like an issue that we need to care about.

I'd suggest the performance part is low priority though, since we should get rid of longdouble completely at some point. It is no longer all that useful; on several platforms it is just an alias for double. After all the dtype infrastructure work, it is now much more feasible to have a proper quad dtype, and there is some work going on in that direction.
