
ENH: longdouble(int) performs poorly due to an unnecessary base 10 transit #22098

Open
gpshead opened this issue Aug 8, 2022 · 5 comments

Comments

@gpshead
Contributor

gpshead commented Aug 8, 2022

Proposed new feature or change:

In #9968, code was added so that longdouble(int) would be more precise (a good feature from a numpy perspective!) by not transiting through a 64-bit double. But the chosen algorithm performs poorly when the int is large:

It transits through a base 10 string internally: https://github.com/numpy/numpy/blob/maintenance/1.23.x/numpy/core/src/common/npy_longdouble.c#L130 in npy_longdouble_from_PyLong. Converting an int to decimal is roughly quadratic in the number of digits, so it is slow on huge numbers. That is ironic given that the goal of this API is to care about at most the most significant 128 bits of the PyLong.

Instead of doing that, there are better options. If you still want the ease of going through a string, at the very least use hexadecimal instead: the equivalent of b"%x" % long_obj, which is an O(num_digits) conversion. But you can skip even that and use long_obj.to_bytes() to get a raw binary string, from which you can directly determine the log2 for your exponent and grab the relevant mantissa bits straight into your floating point representation.
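For illustration, a minimal Python-level sketch of those two routes (the real fix would be C code against the CPython API; the names below are just for this example):

# some huge positive int; sign handling is left out of this sketch
long_obj = 123456789 ** 40

# Option 1: hex string -- linear in the number of digits, unlike base 10
hex_digits = b"%x" % long_obj

# Option 2: raw bytes -- the exponent and mantissa fall out directly
nbytes = (long_obj.bit_length() + 7) // 8
raw = long_obj.to_bytes(nbytes, "big")
exponent = long_obj.bit_length() - 1        # floor(log2(long_obj))
top_bits = int.from_bytes(raw[:16], "big")  # leading bytes feed the mantissa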

You could probably be even more pedantic and tie yourself to the CPython PyLong internals (its 30- or 15-bit digits) to avoid any intermediate linear to_bytes or hex allocation at all, but that would make for more fragile, CPython-implementation-dependent code, even if it would be constant time rather than O(n).

A hack to get a constant-time best of both worlds might be something logically akin to:

# ... mumble do something with the sign mumble (assumes long_obj > 0 below) ...
log2 = long_obj.bit_length()
if log2 > 128:
    long128 = long_obj >> (log2 - 128)
elif log2 < 128:
    long128 = long_obj << (128 - log2)
else:
    long128 = long_obj
assert long128.bit_length() == 128
bits128 = long128.to_bytes(128 // 8, "big")
assert len(bits128) == 128 // 8
# construct your float from log2 and bits128
# (you obviously don't need all 128 bits, as you've got an exponent to store)

That avoids doing a to_bytes or hex conversion over the full O(unbounded-N) value and only does it on the few bits you need. The shifts should be effectively constant time, as Python's internal PyLong bignum is stored in binary.

gpshead added a commit to gpshead/numpy that referenced this issue Aug 8, 2022

numpy#9968 was fixed; workaround no longer appropriate. Also inefficient once numpy#22098 is implemented.
@eric-wieser
Member

eric-wieser commented Aug 8, 2022

Indeed, I noted that this was a poorly performing solution when I wrote it in #10723:

/**
* TODO: currently a hack that converts the long through a string. This is
* correct, but slow.
*
* Another approach would be to do this numerically, in a similar way to
* PyLong_AsDouble.
* However, in order to respect rounding modes correctly, this needs to know
* the size of the mantissa, which is platform-dependent.
*/

I figured it was best to have correct behavior first, and wait for an interested party before expending the effort to fix the performance. Given it took 3 years for someone to notice, that feels like it was a reasonable call :).
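(For concreteness, a rough Python-level sketch of that numerical route, assuming np.longdouble; note that this naive accumulation does not address the rounding-mode caveat quoted above:)

import numpy as np

def longdouble_from_int(n):
    # Accumulate the value 30 bits at a time, most significant chunk first,
    # similar in spirit to CPython's PyLong_AsDouble.
    if n == 0:
        return np.longdouble(0)
    sign = -1 if n < 0 else 1
    n = abs(n)
    result = np.longdouble(0)
    scale = np.longdouble(1 << 30)
    for shift in range((n.bit_length() - 1) // 30 * 30, -1, -30):
        result = result * scale + np.longdouble((n >> shift) & 0x3FFFFFFF)
    return sign * result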

@gpshead
Contributor Author

gpshead commented Aug 8, 2022

Yeah, my hand-wavy comments are where I blindly assume numpy internals have a way to know the mantissa and exponent sizes. :)
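(For what it's worth, those sizes are queryable: at the C level from LDBL_MANT_DIG / LDBL_MAX_EXP in <float.h>, and from Python something like the sketch below, assuming np.finfo reports the platform's long double:)

import numpy as np

info = np.finfo(np.longdouble)
# e.g. nmant == 63 on x86 (80-bit extended), 52 where longdouble is just double
print(info.nmant, info.maxexp)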

@gpshead
Contributor Author

gpshead commented Aug 8, 2022

FWIW I would go with whatever is simplest and easiest to maintain. O(N) vs constant isn't really important given that at least O(N) has already been paid by whatever code produced the huge int in the first place, so one more O(N) pass doesn't change anything. Nobody should ever loop doing the same conversion on the same huge int. (Sure, someone will... but that's their problem.)

@gpshead
Contributor Author

gpshead commented Sep 4, 2022

My work on testing python/cpython#95778 is the reason I noticed this issue. Some huge values will no longer be able to pass through base 10.
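(To illustrate, assuming a CPython with that change applied and its default limit of 4300 digits:)

import sys

huge = 1 << 2_000_000                      # far more than 4300 decimal digits
sys.set_int_max_str_digits(4300)           # the default limit proposed there
try:
    str(huge)                              # the base 10 path now refuses to run
except ValueError as err:
    print(err)                             # "Exceeds the limit (4300 digits) ..."

# power-of-two bases are unaffected, so the hex / to_bytes routes keep working:
_ = b"%x" % huge
_ = huge.to_bytes((huge.bit_length() + 7) // 8, "big")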

@rgommers
Member

Some huge values will no longer be able to pass through base 10.

That sounds like an issue that we need to care about.

I'd suggest the performance part is low priority though, since we should get rid of longdouble completely at some point. It is no longer all that useful; on several platforms it is just an alias for double. After all the dtype infrastructure work, it is now much more feasible to have a proper quad dtype, and there is some work going on in that direction.
