New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid set_index(..., inplace=True)
where not necessary
#9472
Conversation
@@ -97,7 +97,7 @@ def test_orc_roundtrip(tmpdir, index, columns): | |||
} | |||
) | |||
if index: | |||
data.set_index(index, inplace=True) | |||
data = data.set_index(index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that here, and in other tests, our use of inplace=True
isn't actually needed. All these pandas
DataFrames are small, so I've just removed the inplace
option altogether.
We'll be discussing this on Wendesday, Sept 14 during the pandas dev call to decide whether to revert this deprecation. |
Thanks for the heads up @mroeschke! Would you mind pinging me with the results of that decision after tomorrow's call? |
Even if we decide to not yet deprecate that, the above is certainly true, and I think it would be good to already clean up part of those. I think you can basically keep all the changes in test files in this PR. |
It was decided to revert the |
Great, thanks for up the update! I'll roll back the changes strictly related to the reverted deprecation |
set_index
pandas=1.5
compatibilityset_index(..., inplace=True)
where not necessary
@ncclementi @ian-r-rose is there a chance one of you could take a look at the changes here? Should be a relatively straightforward review with the reduced scope of this PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, had only one comment for learning purposes.
@@ -49,6 +49,7 @@ def __call__(self, parts): | |||
) | |||
if self.index: | |||
_df.set_index(self.index, inplace=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an informative question, why do we keep the inplace=True
in here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm keeping inplace=True
in non-testing code to try to minimize data copies. To be clear, I'm not 100% sure whether or not copies would still be avoided if inplace=False
(maybe @jorisvandenbossche or @mroeschke have more insight here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw this is definitely a fair question to ask. In general, I think avoiding inplace operations make things easier to reason about. Let's handle as needed in a follow-up PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at set_index
in pandas, a copy of the entire DataFrame is avoided if inplace=True
(Part of the motivation of the original deprecation is that a copy can always be avoid since the index is only being modified)
This PR handles the
inplace=
keyword being deprecated inpandas
set_index(...)
calls.Note, there is still discussion happening over in the
pandas
repo about whether or not to revert this deprecation (xref pandas-dev/pandas#48417). We should wait to merge this PR until a decision has been made there. cc @jorisvandenbossche for visibility