What does the transformation do? Give a representative SQL query.
Convert a reduce on a join to a join on one or more reduced inputs.
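The issue gives no representative query, so the following is only an illustrative sketch. The `customers` and `orders` tables, their columns, and the sample rows are hypothetical, and the rewrite shown relies on the assumption that `customers.id` is unique, so joining never duplicates an aggregated row. SQLite is used purely to check that both query shapes return the same result; it says nothing about how any particular optimizer would plan them.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'ada'), (2, 'bob');
    INSERT INTO orders VALUES (1, 10), (1, 20), (1, 30), (2, 5), (3, 99);
""")

# Before: a reduce (GROUP BY) sitting on top of a join.
REDUCE_ON_JOIN = """
    SELECT c.id, c.name, SUM(o.amount) AS total
    FROM orders o JOIN customers c ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
"""

# After: the orders input is reduced first, then joined.
# Valid here only because customers.id is unique (assumption).
JOIN_ON_REDUCED = """
    SELECT c.id, c.name, r.total
    FROM (SELECT customer_id, SUM(amount) AS total
          FROM orders GROUP BY customer_id) r
    JOIN customers c ON r.customer_id = c.id
    ORDER BY c.id
"""

before = conn.execute(REDUCE_ON_JOIN).fetchall()
after = conn.execute(JOIN_ON_REDUCED).fetchall()
print(before == after)
```

In this sketch the join receives three pre-aggregated rows from `orders` instead of five raw ones, which is the kind of reduction the transformation is after.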
Why should we add it?
This can:
reduce the number of rows that the join produces.
eliminate skew.
reduce the number of diffs that the join operator receives.
When would it be good to have?
When reducing an input of the join significantly decreases the number of rows that the input has.
When would it be ineffectual?
When reducing an input of the join does not significantly decrease the number of rows.
When would it be bad to have?
When the join condition would filter out most of the rows coming from an input of the join.
In the worst case, how would it degrade performance?
There would be unnecessary memory and CPU overhead.
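The bad case can be made concrete with a small sketch. It assumes a hypothetical input of `(customer_id, amount)` pairs and a set of ids that survive the join condition; the function name and the data are made up for illustration, and the counts are a rough proxy for wasted work, not a cost model.

```python
from collections import defaultdict

def reduce_then_join(orders, customer_ids):
    """Reduce orders by key first, then apply the join condition.

    Returns (groups_built, groups_kept): how many aggregates the
    pushed-down reduce maintains, and how many survive the join.
    """
    totals = defaultdict(int)
    for cid, amount in orders:
        totals[cid] += amount          # state kept for every key
    kept = {cid: t for cid, t in totals.items() if cid in customer_ids}
    return len(totals), len(kept)

# Hypothetical worst case: every order has a distinct key, and the
# join condition keeps only three of them.
orders = [(cid, 1) for cid in range(1000)]
customer_ids = {0, 1, 2}

groups_built, groups_kept = reduce_then_join(orders, customer_ids)
print(groups_built, groups_kept)
```

Here the pushed-down reduce maintains 1000 aggregates and the join discards 997 of them, while joining first would have left only three rows to reduce.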
List real life instances where this transformation would help.
No response
Cost Model
What is the benefit of the transformation?
What is the overhead?
When would the transformation be worthwhile? Intuitively, this should
be when benefit > overhead, but sometimes a benefit with regard to X
comes at a cost with regard to Y, and it is worth discussing
when sacrificing Y to gain a benefit in X is justified.
List any knobs that we may need to tune or expose to the user.
No response
Proposed implementation
Describe the implementation.
Which queries will do better with the given implementation?
Which queries will do worse?
Break the implementation down into stages.