Shuffle use pyarrow more broadly #8596

Open
wants to merge 3 commits into
base: main

Member

@fjetter fjetter commented Mar 20, 2024

These are a couple of very minor fixes that sped things up in my very limited, small-scale testing.

I can at least speak to the unique computation: I saw it flare up in profiles as well, and benchmarking it on toy examples shows the new version is about 20x faster than main. That depends, of course, on the kind of data, so in general the change is likely less impactful.

Contributor

github-actions bot commented Mar 20, 2024

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    29 files  ± 0      29 suites  ±0   11h 19m 23s ⏱️ + 7m 11s
 4 055 tests ± 0   3 689 ✅  - 246    109 💤 ±0  256 ❌ +245  1 🔥 +1 
54 889 runs  +19  52 142 ✅  - 230  2 410 💤  - 5  336 ❌ +253  1 🔥 +1 

For more details on these failures and errors, see this check.

Results for commit 9826d68. ± Comparison against base commit 8927bfd.

♻️ This comment has been updated with latest results.


- partitions = t.select([column]).to_pandas()[column].unique()
- partitions.sort()
+ partitions = np.array(pa.compute.unique(t[column]).sort())
Member

Can we avoid the remainder of the np conversion as well by leveraging pa.compute.(not)_equal and pa.compute.indices_nonzero?

Member

@hendrikmakait hendrikmakait left a comment

Thanks, @fjetter!

2 participants