Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Occupancy improvement for Hash table build #15700

Draft
wants to merge 12 commits into
base: branch-24.06
Choose a base branch
from

Conversation

tgujar
Copy link
Contributor

@tgujar tgujar commented May 8, 2024

Description

Prototype implementation for: #15502

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Copy link

copy-pr-bot bot commented May 8, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label May 8, 2024
@tgujar
Copy link
Contributor Author

tgujar commented May 8, 2024

I think the approach of specializing the type dispatcher is very cumbersome and will lead to a lot of code replication. Currently, I have the conditional dispatch working for device_row_hasher but I am unsure if there is a better way to implement this. We could introduce a macro here to generate the code, what do you think?

@PointKernel PointKernel added non-breaking Non-breaking change 3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function Performance Performance related issue labels May 8, 2024
@PointKernel
Copy link
Member

/ok to test

@PointKernel
Copy link
Member

/ok to test

@PointKernel
Copy link
Member

PointKernel commented May 14, 2024

@tgujar I've updated the docs to unblock CI. Have you noticed any performance regressions for other use cases? It seems that it improves the performance for mixed join but the performance drops significantly in other cases using row hasher.

Comment on lines +48 to +50
id_to_type<type_id::DECIMAL128>,
id_to_type<type_id::DECIMAL64>,
id_to_type<type_id::DECIMAL32>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think decimal types are complex type. They are just a wrapper around some integer type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Equality operator for Decimal will perform scaling which uses exponentiation.

CUDF_HOST_DEVICE inline bool operator==(fixed_point<Rep1, Rad1> const& lhs,

I see a reduction in register usage if I comment out decimal types in #15502. I think we can still decide on the types excluded in the branches later on

@PointKernel
Copy link
Member

/ok to test

@PointKernel
Copy link
Member

@tgujar Could you take a look at the failing tests?

@PointKernel
Copy link
Member

/ok to test

@PointKernel
Copy link
Member

/ok to test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Performance Performance related issue
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

None yet

3 participants