-
Notifications
You must be signed in to change notification settings - Fork 454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid explicit consolidation in topk rendering #27068
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understand this. Going forward I submit that more comments would help. This is not new to this PR, and is inherited from before, but it was hard to understand what has happened when no comments exist to either state what needs to be true or to change as we change the implementation.
src/compute/src/render/top_k.rs
Outdated
let (input, oks, errs) = if validating { | ||
let from = |v: &Result<Row, Row>| v.into_owned(); | ||
let (input, stage) = | ||
build_topk_negated_stage::<S, _, _, RowValSpine<Result<Row, Row>, _, _>>( | ||
&input, from, order_key, offset, limit, arity, | ||
); | ||
let stage = stage.as_collection(|k, v| (SharedRow::pack(k), v.clone())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it right that the from
movement is tidying, and the only thing going on here is returning the other result returned from build_topk_negated_stage
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is to avoid the horrors of cargo fmt
, which would spread it over approximately 150 lines otherwise.
4258939
to
d7e5342
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Only thought was around the added consolidation and that perhaps it is optional, but also perhaps we should shake that out at a later date.
// Consolidate the output of `build_topk_stage` because it's not guaranteed to be. | ||
let result = result.consolidate_named::<KeyBatcher<_, _, _>>( | ||
"Monotonic TopK final consolidate", | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fwiw, I think we should revisit "consolidation" and come up with a consistent pattern for introducing it. For example, I'm a supporter of "before re-using a collection" to avoid consolidation that may then feed into an arrangement. I'm not sure we need to perform it before emitting results here, though. Seems harmless, as the net reduction in consolidations is already good in the PR, but .. would love to revisit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I added the defensive consolidate is that I don't know what expectations downstream operators have about the form of the data. Ideally, all operators should function with non-consolidated data, but specifically monotonic implementations do not handle non-consolidated data well. (Should we have a different Diff
for monotonic dataflows?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Monotonic operators have a must_consolidate
flag, which informs them whether the input is consolidated.
This is tuned by RelaxMustConsolidate
. This does abstract interpretation on the LIR trees, keeping track of whether there was an operation that changed the consolidatedness. You can control its behavior for TopK here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, so the implementation is correct wrt to the physically monotonic interpreter. This is somewhat fragile because reasoning about whether a certain operator is monotonic or not is not simple...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this interpreter has to be kept up-to-date whenever any operator implementation changes. This might indeed be a little error-prone.
// Consolidate the output of `build_topk_stage` because it's not guaranteed to be. | ||
let oks = oks.consolidate_named::<KeyBatcher<_, _, _>>("TopK final consolidate"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
We can avoid the explicit consolidation in topk rendering by reusing the arrangement created in front of the reduction. Signed-off-by: Moritz Hoffmann <mh@materialize.com>
Signed-off-by: Moritz Hoffmann <mh@materialize.com>
Signed-off-by: Moritz Hoffmann <mh@materialize.com>
d7e5342
to
d9375a9
Compare
A nightly run does not indicate any regressions: https://buildkite.com/materialize/nightly/builds/7769 |
Motivation
This PR adds a feature that has not yet been specified.
Change the rendering of topk plans to avoid an intermediate consolidate. At the moment, we render plans by forking the inputs, arranging and reducing once side, then concatenating the inputs with negated reduction output, and consolidating the result. This makes sure that we consolidate eagerly, but at the same time does duplicate work: The next operator forms an arrangement, so we could just reuse that instead.
Ths PR implements this pattern, removing one consolidate from each topk stage, and adding it back after the final stage to ensure the topk output itself is consolidated. Note that we now apply the hash modulus on uncompacted data, whereas it previously was guaranteed to be consolidated. This might increase the cost of the operator by a factor of 2.
Tops to the reviewer
Best viewed with whitespace changes hidden!
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.