New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: support nan-like null strings in [l,r]strip #26392
Conversation
f9598cc
to
dcde9ff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, and can just put it in, since it just fixes a bug, comments are "would be nice to improve" really (not sure the OOM error path that looks wrong particularly important but happy to defer).
One other thing that struck me is: lstrip/rstrip should be pretty much identical except for calling a different core function.
That should really be consolidated either via C++ or via static data. But that is something to look in later I think.
else { | ||
npy_gil_error(PyExc_ValueError, | ||
"Cannot strip null values that are not strings"); | ||
"Can only strip null values that are strings " | ||
"or NaN-like values"); | ||
goto fail; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One day might be nice to see if this check isn't common among a few ufuncs and just writing ufunc-name() cannot support ...
with a single helper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's definitely some code duplication. I added this as a task to #25693.
@@ -1169,6 +1189,7 @@ string_lrstrip_whitespace_strided_loop( | |||
npy_static_string s = {0, NULL}; | |||
int s_isnull = NpyString_load(allocator, ps, &s); | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But doesn't matter.
if (NpyString_pack(oallocator, ops, new_buf, new_buf_size) < 0) { | ||
npy_gil_error(PyExc_MemoryError, "Failed to pack string in %s", | ||
ufunc_name); | ||
goto fail; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, missing the PyMem_RawFree(new_buf);
? I probably said that before, but it would seem nice to avoid reallocating a small buffer every time here (maybe directly allocating the result although that isn't possible if the operation is in-place).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this error path indicates there's memory corruption or a bug in the implementation so a memory leak is the least of your worries if you get here. That said, easy enough to add a PyMem_RawFree
, thanks for the catch.
assert_array_equal(sizes, np.strings | ||
.str_len(self.a)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert_array_equal(sizes, np.strings | |
.str_len(self.a)) | |
assert_array_equal(sizes, np.strings.str_len(self.a)) |
Are you talking about the separation between If it's |
Ah, nvm that part... might be nice to do something anyway, but I somehow had mindslipped into reading the second copy to be rsplit and the first lsplit. |
Thanks for giving this a look Sebastian. I'm self-merging because this fixes some tests in pandas (see https://github.com/pandas-dev/pandas/actions/runs/8971939049/job/24638804051?pr=58578 with a bunch of failures related to this) and it would be nice to have this in the RC. |
Fixes #26390.
It turns out that
strip
has been missing from the stringdtype tests for unary and binary ufuncs and so the tests for null support were never triggered. I fixed that by adding all the strip-like functions to the tests.Note that this adds another case where we have a
goto next_step
and an extra block to avoid compliation errors about jumping over variable declarations. I'd prefer to go through all of these and rewrite the loops to use for loops instead ofwhile (N--)
, this just keeps things consistent with the rest of the file.