Fix sort_values with Timestamp data #9642

Merged
merged 1 commit into from Nov 10, 2022

@jrbourbeau jrbourbeau commented Nov 9, 2022

@rjzamora rjzamora left a comment

Thanks @jrbourbeau !

It's interesting that the motivating bug is caused by the use of a single-column DataFrame instead of a Series. Perhaps we should raise an error in set_partitions_pre when s is not series-like?

@jrbourbeau (Member Author)

Thanks for the review @rjzamora. Running the test suite with a DataFrame-like check added locally, it doesn't look like there are other times where s is a DataFrame (at least as measured by our existing test coverage). FWIW I think the reason we didn't catch this earlier is that pandas behaves inconsistently when a DataFrame is passed as input to searchsorted. I've opened an upstream issue for that: pandas-dev/pandas#49620. We certainly can add a Series check here, but it might be overkill.
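For illustration only, a Series check of the kind discussed above might look roughly like the sketch below. The helper name `ensure_series` and the sample data are hypothetical and are not taken from this PR or from dask's `set_partitions_pre`; the sketch only shows squeezing a single-column DataFrame to a Series before calling `searchsorted` on Timestamp divisions.

```python
import pandas as pd


def ensure_series(s):
    """Hypothetical guard: return a Series, squeezing a single-column DataFrame."""
    if isinstance(s, pd.DataFrame):
        if s.shape[1] != 1:
            raise TypeError(
                f"expected a Series or single-column DataFrame, got {s.shape[1]} columns"
            )
        # Select the only column positionally so the result is a Series.
        return s.iloc[:, 0]
    if not isinstance(s, pd.Series):
        raise TypeError(f"expected a Series, got {type(s).__name__}")
    return s


# Made-up Timestamp divisions and a single-column DataFrame of values to place.
divisions = pd.Series(pd.to_datetime(["2022-01-01", "2022-06-01", "2022-12-31"]))
values = pd.DataFrame({"ts": pd.to_datetime(["2022-03-15", "2022-09-01"])})

# Pass a plain 1-D datetime array so searchsorted never sees a DataFrame.
positions = divisions.searchsorted(ensure_series(values).to_numpy(), side="right")
```

Whether to squeeze silently or raise on any DataFrame input is a design choice; raising is stricter and would surface unexpected callers sooner.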

@jrbourbeau jrbourbeau merged commit bfe07f6 into dask:main Nov 10, 2022
@jrbourbeau jrbourbeau deleted the sort_values_timestamp branch November 10, 2022 17:39

2 participants