Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WARN: Remove false positive warning for iloc inplaceness #48397

Merged
merged 22 commits into from Sep 16, 2022
Merged
Show file tree
Hide file tree
Changes from 11 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
3 changes: 2 additions & 1 deletion pandas/_testing/__init__.py
Expand Up @@ -459,7 +459,8 @@ def all_timeseries_index_generator(k: int = 10) -> Iterable[Index]:
def make_rand_series(name=None, dtype=np.float64) -> Series:
index = makeStringIndex(_N)
data = np.random.randn(_N)
data = data.astype(dtype, copy=False)
with np.errstate(invalid="ignore"):
data = data.astype(dtype, copy=False)
return Series(data, index=index, name=name)


Expand Down
5 changes: 4 additions & 1 deletion pandas/core/dtypes/cast.py
Expand Up @@ -1970,7 +1970,10 @@ def np_can_hold_element(dtype: np.dtype, element: Any) -> Any:
if tipo.kind not in ["i", "u"]:
if isinstance(element, np.ndarray) and element.dtype.kind == "f":
# If all can be losslessly cast to integers, then we can hold them
casted = element.astype(dtype)
with np.errstate(invalid="ignore"):
mroeschke marked this conversation as resolved.
Show resolved Hide resolved
# We check afterwards if cast was losslessly, so no need to show
# the warning
casted = element.astype(dtype)
comp = casted == element
if comp.all():
# Return the casted values bc they can be passed to
Expand Down
4 changes: 4 additions & 0 deletions pandas/core/indexing.py
Expand Up @@ -2016,6 +2016,10 @@ def _setitem_single_column(self, loc: int, value, plane_indexer):
and (
np.shares_memory(new_values, orig_values)
or new_values.shape != orig_values.shape
or (
not can_hold_element(orig_values, np.nan)
and isna(new_values).any()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do we get here with warn=True?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iloc._setitem_with_indexer(indexer, value, self.name)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so i think what's happening here is that we are checking can_hold_element too soon, bc reindexing occurs within self.obj._iset_item(loc, value). I think it would be better to do the reindex before the can_hold_element check

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a suggestion. Not sure what is better performance wise. Checking initially and the rechecking? Is _get_column_array expensive or cheap? If it is cheap, it might be better only checking can_hold_element for new values?

This breaks one test, not sure if you can set an all NaT Series into an underlying Series with timezone UTC?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_get_column_array is pretty cheap. its the isna(...).any() that i think may be expensive

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks one test, not sure if you can set an all NaT Series into an underlying Series with timezone UTC?

depends on the dtype of the all-NaT Series

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

original values are datetime64[ns, UTC] and all NaT Series is datetime64[ns]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that should definitely not be inplace

Copy link
Member Author

@phofl phofl Sep 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, we fixed the test instead of breaking it :)

Anything against checking only new_values then? See latest commit now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think that makes sense. caffeine is still kicking in though

)
)
):
# TODO: get something like tm.shares_memory working?
Expand Down
3 changes: 2 additions & 1 deletion pandas/tests/frame/indexing/test_indexing.py
Expand Up @@ -1326,7 +1326,8 @@ def test_loc_expand_empty_frame_keep_midx_names(self):
def test_loc_setitem_rhs_frame(self, idxr, val):
# GH#47578
df = DataFrame({"a": [1, 2]})
df.loc[:, "a"] = DataFrame({"a": [val, 11]}, index=[1, 2])
with tm.assert_produces_warning(None):
df.loc[:, idxr] = DataFrame({"a": [val, 11]}, index=[1, 2])
expected = DataFrame({"a": [np.nan, val]})
tm.assert_frame_equal(df, expected)

Expand Down