Optimization: Reduction Pushdown #235

Open
wangandi opened this issue Feb 7, 2022 · 0 comments

Elevator pitch

  1. What does the transformation do? Give a representative SQL query.

Convert a reduce on top of a join into a join on one or more reduced (pre-aggregated) inputs, so that an input is aggregated before the join instead of after it.
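A representative query shape is `SELECT a.k, SUM(b.v) FROM a JOIN b ON a.k = b.k GROUP BY a.k` (the table and column names are hypothetical). The Python sketch below evaluates it both ways over small in-memory lists: reducing after the join, and pre-aggregating `b` on the join key before the join. A final reduce is still needed after the join because `a` may hold several rows per key. Splitting the sum into partial and final sums is valid here because SUM is decomposable. This is a minimal illustration, not the optimizer's implementation.

```python
from collections import defaultdict


def join(left, right):
    """Equi-join two lists of (key, value) rows on key; yields (key, lval, rval)."""
    index = defaultdict(list)
    for k, v in right:
        index[k].append(v)
    return [(k, lv, rv) for k, lv in left for rv in index[k]]


def reduce_after_join(a, b):
    """SELECT a.k, SUM(b.v) ... GROUP BY a.k, evaluated as reduce(join(a, b))."""
    joined = join(a, b)
    sums = defaultdict(int)
    for k, _, bv in joined:
        sums[k] += bv
    return dict(sums), len(joined)  # (result, rows produced by the join)


def reduce_pushed_down(a, b):
    """Same query with b pre-aggregated per join key before the join."""
    partial = defaultdict(int)
    for k, v in b:
        partial[k] += v  # one row per key survives into the join
    joined = join(a, list(partial.items()))
    sums = defaultdict(int)
    for k, _, partial_sum in joined:
        sums[k] += partial_sum  # final reduce: a may have several rows per key
    return dict(sums), len(joined)


a = [(1, "x"), (1, "y"), (2, "z")]
b = [(1, 10), (1, 20), (1, 30), (2, 5), (3, 7)]

print(reduce_after_join(a, b))   # ({1: 120, 2: 5}, 7)
print(reduce_pushed_down(a, b))  # ({1: 120, 2: 5}, 3)
```

On this toy data both plans return the same sums, while the join emits 3 rows instead of 7.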

  2. Why should we add it?

This can:

  • reduce the number of rows that the join produces.
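  • eliminate skew, since the many rows that share a hot join key can collapse into a few reduced rows before reaching the join.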
  • reduce the number of diffs that the join operator receives.

  3. When would it be good to have?

When reducing an input of the join significantly decreases the number of rows that the input has.

  4. When would it be ineffectual?

When reducing an input of the join does not significantly decrease the number of rows.

  5. When would it be bad to have?

When the join condition would filter out most of the rows coming from an input of the join, because the reduce would then spend work on rows that the join would have discarded anyway.

  6. In the worst case, how would it degrade performance?

It would add unnecessary memory and CPU overhead for the extra reduce without meaningfully shrinking the join's input.
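The good, ineffectual, and bad cases above can be folded into a simple decision rule. The sketch below is a hypothetical heuristic, not a proposed cost model: it assumes cardinality estimates are available for the input (`InputEstimate`, `rows`, `distinct_groups`, and `join_selectivity` are illustrative names) and pushes the reduce down only when the reduced input is clearly smaller than the part of the input the join would keep anyway. `min_gain` is the kind of knob the Cost Model section asks about; its default of 2.0 is arbitrary.

```python
from dataclasses import dataclass


@dataclass
class InputEstimate:
    """Hypothetical cardinality estimates for one join input."""
    rows: int                # rows in the input before reducing
    distinct_groups: int     # rows the input would have after reducing
    join_selectivity: float  # fraction of the input's rows that survive the join


def should_push_down(est: InputEstimate, min_gain: float = 2.0) -> bool:
    """Return True if pre-reducing this input looks worthwhile.

    - Ineffectual case: distinct_groups is close to rows, so the reduce
      barely shrinks the input.
    - Bad case: join_selectivity is tiny, so most reduced rows would have
      been discarded by the join anyway.
    Both cases fail the single comparison below.
    """
    surviving_rows = est.rows * est.join_selectivity
    return est.distinct_groups * min_gain <= surviving_rows


# Good: many rows per group, and the join keeps most of them.
print(should_push_down(InputEstimate(1_000_000, 1_000, 0.9)))     # True
# Ineffectual: reducing barely decreases the row count.
print(should_push_down(InputEstimate(1_000_000, 900_000, 0.9)))   # False
# Bad: the join would filter out almost every row of this input.
print(should_push_down(InputEstimate(1_000_000, 1_000, 0.0001)))  # False
```

Using one comparison keeps the rule easy to reason about; a real cost model would also weigh the memory needed to maintain the reduced input.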

List real life instances where this transformation would help.

No response

Cost Model

  1. What is the benefit of the transformation?
  2. What is the overhead?
  3. When would the transformation be worthwhile? Intuitively, this should
    be when benefit > overhead, but sometimes a benefit with regard to X
    comes at a cost with regard to Y, and we should discuss when it is
    acceptable to sacrifice Y to gain a benefit in X.

List any knobs that we may need to tune or expose to the user.

No response

Proposed implementation

  1. Describe the implementation.
  2. Which queries will do better with the given implementation?
  3. Which queries will do worse?
  4. Break the implementation down into stages.
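The implementation questions above are still open. As one possible starting point, the rewrite itself can be sketched as a rule on a toy plan tree. `Get`, `Join`, `Reduce`, `columns`, and `push_down_reduce` are hypothetical names for illustration, not the optimizer's actual IR, and the rule only covers a single SUM grouped by the equi-join key.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Get:
    """Hypothetical leaf: a named relation with a known set of columns."""
    name: str
    cols: frozenset


@dataclass(frozen=True)
class Join:
    """Hypothetical equi-join on a column named `on` present in both inputs."""
    left: Plan
    right: Plan
    on: str


@dataclass(frozen=True)
class Reduce:
    """Hypothetical reduce: GROUP BY `group_by`, SUM(`sum_of`)."""
    source: Plan
    group_by: str
    sum_of: str


Plan = Union[Get, Join, Reduce]


def columns(plan: Plan) -> frozenset:
    """Columns produced by a plan node."""
    if isinstance(plan, Get):
        return plan.cols
    if isinstance(plan, Join):
        return columns(plan.left) | columns(plan.right)
    return frozenset({plan.group_by, plan.sum_of})


def push_down_reduce(plan: Plan) -> Plan:
    """Rewrite Reduce(Join(l, r)) into Reduce(Join(l, Reduce(r))), or the
    mirrored form, when grouping by the join key and the summed column comes
    from exactly one input. The outer Reduce stays: it sums the partial sums."""
    if not (isinstance(plan, Reduce) and isinstance(plan.source, Join)):
        return plan
    join = plan.source
    if plan.group_by != join.on:
        return plan
    left_cols, right_cols = columns(join.left), columns(join.right)
    # Skip inputs that are already reduced so the rule is idempotent.
    if (plan.sum_of in right_cols and plan.sum_of not in left_cols
            and not isinstance(join.right, Reduce)):
        pre = Reduce(join.right, join.on, plan.sum_of)
        return Reduce(Join(join.left, pre, join.on), plan.group_by, plan.sum_of)
    if (plan.sum_of in left_cols and plan.sum_of not in right_cols
            and not isinstance(join.left, Reduce)):
        pre = Reduce(join.left, join.on, plan.sum_of)
        return Reduce(Join(pre, join.right, join.on), plan.group_by, plan.sum_of)
    return plan


a = Get("a", frozenset({"k", "x"}))
b = Get("b", frozenset({"k", "v"}))
plan = Reduce(Join(a, b, "k"), group_by="k", sum_of="v")
rewritten = push_down_reduce(plan)
print(rewritten == Reduce(Join(a, Reduce(b, "k", "v"), "k"), "k", "v"))  # True
print(push_down_reduce(rewritten) == rewritten)                          # True
```

A fuller rule would also need to handle other aggregates, grouping keys that extend beyond the join key, and the good/ineffectual/bad criteria from the elevator pitch before firing.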