BUG: ensure text padding ufuncs handle stringdtype nan-like nulls #26353
I initially marked this as a backport candidate, but that was wrong: these ufuncs won't be in numpy 2.0.
Guess we should have caught that the maximum width was really not necessary for StringDType. Anyway, solution looks all good!
LGTM!
Thanks for looking this over!
This fixes an issue similar to the one fixed by #26353. In particular, right now np.strings.replace calls the count ufunc to get the number of replacements. This is necessary for fixed-width strings, but it turns out to make it impossible to support null strings in replace. I went ahead and instead found the replacement counts inline in the ufunc loop. This lets me add support for nan-like null strings, which it turns out pandas needs.
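To make the idea of counting replacements inline concrete, here is a minimal pure-Python model of a per-element loop. It is only a sketch, not NumPy's C++ implementation: `is_nan_like`, `replace_element`, and `replace_all` are hypothetical names, and a float NaN stands in for a nan-like null string.

```python
import math


def is_nan_like(value):
    """Return True for a float NaN used as a missing-string sentinel."""
    return isinstance(value, float) and math.isnan(value)


def replace_element(s, old, new, max_count=-1):
    """Replace occurrences of old in s, finding the matches inline.

    No separate counting pass over the whole array is needed, so a
    null entry can simply be passed through unchanged.
    """
    if is_nan_like(s):
        return s  # nan-like null: pass through
    if old == "":
        # Empty pattern: defer to str.replace to keep the sketch short.
        return s.replace(old, new, max_count)
    pieces = []
    start = 0
    done = 0
    while max_count < 0 or done < max_count:
        idx = s.find(old, start)
        if idx == -1:
            break
        pieces.append(s[start:idx])
        pieces.append(new)
        start = idx + len(old)
        done += 1
    pieces.append(s[start:])
    return "".join(pieces)


def replace_all(values, old, new, max_count=-1):
    """Apply replace_element to every entry, nulls included."""
    return [replace_element(v, old, new, max_count) for v in values]
```

For example, `replace_all(["aXbX", float("nan"), "none"], "X", "--")` rewrites the first entry, leaves the NaN in place, and returns the last entry unchanged. A fixed-width output would still need the count up front to size its buffer, which is why the separate counting call remains necessary there.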
Currently these ufuncs fail for arrays containing null strings with the following error:
The use of `str_len` in `np.strings` is only necessary for fixed-width strings, so I moved the Python `width` calculations in the `np.strings` wrappers after the check for stringdtype in each function and added width checking in the C++ ufunc loops.

I also modified the tests to register the text-padding ufuncs as passing through nan-like null strings.
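The reordering described above can be modelled in plain Python. This is a rough sketch under stated assumptions, not the actual `np.strings` wrapper or the C++ loop: `fixed_width_output_size`, `ljust_element`, and `ljust_variable_width` are hypothetical names, and a float NaN stands in for a nan-like null.

```python
import math


def is_nan_like(value):
    """Return True for a float NaN used as a missing-string sentinel."""
    return isinstance(value, float) and math.isnan(value)


def fixed_width_output_size(values, width):
    """Fixed-width path: the output size needs every element's length.

    Taking a length up front is what breaks on null entries, because
    a null has no length.
    """
    return max(max(len(v), width) for v in values)


def ljust_element(s, width, fillchar=" "):
    """Variable-width path: check the width per element, inside the loop."""
    if is_nan_like(s):
        return s  # nan-like null: pass through
    if len(fillchar) != 1:
        raise TypeError("fillchar must be exactly one character")
    return s + fillchar * max(width - len(s), 0)


def ljust_variable_width(values, width, fillchar=" "):
    """Pad every entry without a length precomputation over the array."""
    return [ljust_element(v, width, fillchar) for v in values]
```

In this model, `fixed_width_output_size(["a", float("nan")], 5)` raises a `TypeError`, while `ljust_variable_width(["a", float("nan"), "abc"], 5, "*")` pads the two strings and leaves the NaN untouched.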