Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: ensure text padding ufuncs handle stringdtype nan-like nulls #26353

Merged
merged 1 commit into from Apr 29, 2024

Conversation

ngoldbaum
Copy link
Member

Currently these ufuncs fail for arrays containing null strings with the following error:

Traceback (most recent call last):
  File "/Users/goldbaum/Documents/numpy/../numpy-experiments/test.py", line 5, in <module>
    print(np.strings.center(arr, 3))
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/goldbaum/Documents/numpy/build-install/usr/local/lib/python3.11/site-packages/numpy/_core/strings.py", line 629, in center
    width = np.maximum(str_len(a), width)
                       ^^^^^^^^^^
ValueError: The length of a null string is undefined

The use of str_len in np.strings is only necessary for fixed-width strings, so I moved the python width calculations in the np.strings wrappers after the check for stringdtype in each function and added width checking in the C++ ufunc loops.

I also modified the tests to register the text-padding ufuncs as passing through nan-like null strings.

@ngoldbaum ngoldbaum added the 09 - Backport-Candidate PRs tagged should be backported label Apr 26, 2024
@ngoldbaum
Copy link
Member Author

I initially marked this as a backport candidate but that was wrong, these ufuncs won't be in numpy 2.0.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess we should have caught that the maximum width was really not necessary for StringDType. Anyway, solution looks all good!

Copy link
Contributor

@lysnikolaou lysnikolaou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ngoldbaum
Copy link
Member Author

Thanks for looking this over!

@ngoldbaum ngoldbaum merged commit 4f8c1e6 into numpy:main Apr 29, 2024
65 checks passed
seberg pushed a commit that referenced this pull request Apr 30, 2024
This fixes an issue similar to the one fixed by #26353.

In particular, right now np.strings.replace calls the count ufunc to get the number of replacements. This is necessary for fixed-width strings, but it turns out to make it impossible to support null strings in replace.

I went ahead and instead found the replacement counts inline in the ufunc loop. This lets me add support for nan-like null strings, which it turns out pandas needs.
charris pushed a commit to charris/numpy that referenced this pull request May 2, 2024
…6355)

This fixes an issue similar to the one fixed by numpy#26353.

In particular, right now np.strings.replace calls the count ufunc to get the number of replacements. This is necessary for fixed-width strings, but it turns out to make it impossible to support null strings in replace.

I went ahead and instead found the replacement counts inline in the ufunc loop. This lets me add support for nan-like null strings, which it turns out pandas needs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants