
PERF: Fix performance regression for isin with mismatching dtypes #49162

Merged: phofl merged 2 commits into pandas-dev:main on Oct 18, 2022


phofl (Member) commented Oct 17, 2022

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Found in the asv run you did. This restores the 1.4.0 performance: the cast to object bypassed the fast path we hit before.
cc @jbrockmendel

Not a blocker for 1.5.1; I will move the whatsnew entry if 1.5.1 is released before this is merged.
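To make "mismatching dtypes" concrete, here is a minimal sketch of the kind of call affected, assuming an int64 `Series` checked against float64 values as an illustrative mismatch (the PR does not list the exact dtype pairs, and `ser`, `values`, and `same` are hypothetical names). It checks the result and times the call against a same-dtype lookup; timings depend on the machine and pandas version, so no numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Illustrative mismatch (assumption): int64 data vs. float64 lookup values.
ser = pd.Series(np.arange(100_000), dtype="int64")
values = np.array([1.0, 2.5, 99_999.0], dtype="float64")

# 1.0 and 99_999.0 compare equal to existing integers; 2.5 matches nothing.
mask = ser.isin(values)
print(int(mask.sum()))

# Same lookup with matching dtypes, for a rough timing comparison.
same = np.array([1, 99_999], dtype="int64")
t_mismatch = timeit.timeit(lambda: ser.isin(values), number=20)
t_same = timeit.timeit(lambda: ser.isin(same), number=20)
print(f"mismatching dtypes: {t_mismatch:.4f}s, same dtype: {t_same:.4f}s")
```

On a pandas version with the regression, the mismatching-dtype call would be expected to be noticeably slower than the same-dtype one; the asv suite is the authoritative place to measure this.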

phofl added the Regression and Algos labels Oct 17, 2022
phofl added this to the 1.5.2 milestone Oct 17, 2022
jbrockmendel (Member) commented:

LGTM pending green

phofl added the Performance label Oct 17, 2022
phofl changed the milestone from 1.5.2 to 1.5.1 Oct 18, 2022
phofl merged commit 4ba431f into pandas-dev:main Oct 18, 2022
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Oct 18, 2022
phofl added a commit that referenced this pull request Oct 18, 2022
…for isin with mismatching dtypes) (#49165)

Backport PR #49162: PERF: Fix performance regression for isin with mismatching dtypes

Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com>
phofl deleted the perf_isin branch October 20, 2022 08:46
rhshadrach (Member) commented Nov 21, 2022

@phofl - this may have induced a performance regression, please have a look and open an issue if necessary. This is a semi-automated message.

https://asv-runner.github.io/asv-collection/pandas/#algos.isin.IsIn.time_isin_empty?p-dtype='string%5Bpyarrow%5D'

new link

phofl (Member, Author) commented Nov 21, 2022

@rhshadrach the diagram is empty; is this expected?

rhshadrach (Member) commented:

Hmm, it seems GitHub does not like me ending a URL with a single quote. I've edited the comment above, and will be updating my script 😄

Labels: Algos, Performance, Regression
3 participants