Avoid explicit consolidation in topk rendering #27068

antiguru · 2024-05-13T20:54:22Z

Motivation

This PR adds a feature that has not yet been specified.

Change the rendering of topk plans to avoid an intermediate consolidate. At the moment, we render plans by forking the input, arranging and reducing one side, then concatenating the input with the negated reduction output, and consolidating the result. This ensures that we consolidate eagerly, but it also duplicates work: the next operator forms an arrangement, so we could just reuse that instead.

This PR implements this pattern, removing one consolidate from each topk stage and adding one back after the final stage to ensure the topk output itself is consolidated. Note that we now apply the hash modulus to unconsolidated data, whereas the data was previously guaranteed to be consolidated. This might increase the cost of the operator by a factor of 2.
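To make the pattern concrete, here is a minimal, self-contained sketch in plain Rust. It does not use differential dataflow or any Materialize code: `Update`, `consolidate`, `negated_stage`, and `topk_stage` are hypothetical names, timestamps are omitted, `value % modulus` stands in for the real hash, and all input multiplicities are assumed to be positive. Each stage emits its input plus the negated records that fall outside the limit, without consolidating; the grouping inside the next stage plays the role of the arrangement that consolidates anyway.

```rust
use std::collections::BTreeMap;

/// A record `(group key, value)` with a signed multiplicity (timestamps omitted).
type Update = ((u32, i64), i64);

/// Sum the multiplicities of identical records and drop those that cancel to zero.
fn consolidate(updates: &[Update]) -> Vec<Update> {
    let mut acc: BTreeMap<(u32, i64), i64> = BTreeMap::new();
    for (data, diff) in updates {
        *acc.entry(*data).or_insert(0) += diff;
    }
    acc.into_iter().filter(|(_, diff)| *diff != 0).collect()
}

/// Stand-in for "arrange and reduce": per `(key, bucket)`, keep the `limit`
/// smallest values and emit everything else with a negated multiplicity.
fn negated_stage(input: &[Update], modulus: i64, limit: i64) -> Vec<Update> {
    // The bucket is computed per update, so on possibly unconsolidated data.
    // Building the map models the arrangement, which consolidates as it goes.
    let mut arranged: BTreeMap<(u32, i64, i64), i64> = BTreeMap::new();
    for ((key, value), diff) in input {
        let bucket = value.rem_euclid(modulus);
        *arranged.entry((*key, bucket, *value)).or_insert(0) += diff;
    }
    let mut output = Vec::new();
    let mut current = None;
    let mut remaining = 0;
    for ((key, bucket, value), diff) in arranged {
        if diff <= 0 {
            continue; // record was cancelled by an earlier stage's negation
        }
        if current != Some((key, bucket)) {
            current = Some((key, bucket));
            remaining = limit;
        }
        let kept = diff.min(remaining);
        remaining -= kept;
        if diff > kept {
            output.push(((key, value), kept - diff));
        }
    }
    output
}

/// One stage: the input concatenated with the negated rejects, NOT consolidated.
fn topk_stage(input: &[Update], modulus: i64, limit: i64) -> Vec<Update> {
    let mut output = input.to_vec();
    output.extend(negated_stage(input, modulus, limit));
    output
}

fn main() {
    let input: Vec<Update> = vec![
        ((1, 10), 1), ((1, 20), 1), ((1, 30), 1), ((1, 41), 1),
        ((2, 5), 1), ((2, 7), 1),
    ];
    let stage1 = topk_stage(&input, 2, 1); // two buckets per group
    let stage2 = topk_stage(&stage1, 1, 1); // final stage, one bucket per group
    // Only the output of the final stage is consolidated explicitly.
    let result = consolidate(&stage2);
    println!("unconsolidated updates after stage 1: {}", stage1.len());
    println!("top 1 per group: {:?}", result);
}
```

In this toy model, the unconsolidated output of a stage is never more than twice the size of its input, because each input record contributes at most one negation.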

Tips for the reviewer

Best viewed with whitespace changes hidden!

Checklist

This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
This PR includes the following user-facing behavior changes:

frankmcsherry

I think I understand this. Going forward I submit that more comments would help. This is not new to this PR, and is inherited from before, but it was hard to understand what has happened when no comments exist to either state what needs to be true or to change as we change the implementation.

frankmcsherry · 2024-05-14T13:00:35Z

src/compute/src/render/top_k.rs

+        let (input, oks, errs) = if validating {
+            let from = |v: &Result<Row, Row>| v.into_owned();
+            let (input, stage) =
+                build_topk_negated_stage::<S, _, _, RowValSpine<Result<Row, Row>, _, _>>(
+                    &input, from, order_key, offset, limit, arity,
+                );
+            let stage = stage.as_collection(|k, v| (SharedRow::pack(k), v.clone()));


Is it right that the from movement is tidying, and the only thing going on here is returning the other result returned from build_topk_negated_stage?

Yes, this is to avoid the horrors of cargo fmt, which would spread it over approximately 150 lines otherwise.

frankmcsherry

Looks good. Only thought was around the added consolidation and that perhaps it is optional, but also perhaps we should shake that out at a later date.

frankmcsherry · 2024-05-14T17:47:07Z

src/compute/src/render/top_k.rs

+                    // Consolidate the output of `build_topk_stage` because it's not guaranteed to be.
+                    let result = result.consolidate_named::<KeyBatcher<_, _, _>>(
+                        "Monotonic TopK final consolidate",
+                    );


Fwiw, I think we should revisit "consolidation" and come up with a consistent pattern for introducing it. For example, I'm a supporter of "before re-using a collection" to avoid consolidation that may then feed into an arrangement. I'm not sure we need to perform it before emitting results here, though. Seems harmless, as the net reduction in consolidations is already good in the PR, but .. would love to revisit.

The reason I added the defensive consolidate is that I don't know what expectations downstream operators have about the form of the data. Ideally, all operators should function with non-consolidated data, but specifically monotonic implementations do not handle non-consolidated data well. (Should we have a different Diff for monotonic dataflows?)

Monotonic operators have a must_consolidate flag, which informs them whether the input is consolidated.

This is tuned by RelaxMustConsolidate. This does abstract interpretation on the LIR trees, keeping track of whether there was an operation that changed the consolidatedness. You can control its behavior for TopK here.

Thank you, so the implementation is correct with respect to the physically monotonic interpreter. This is somewhat fragile because reasoning about whether a certain operator is monotonic or not is not simple...

Yeah, this interpreter has to be kept up-to-date whenever any operator implementation changes. This might indeed be a little error-prone.
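The `must_consolidate` discussion above can be illustrated with a toy bottom-up pass over a plan tree. This is only a sketch of the idea of tracking consolidatedness: the `Plan` enum, its variants, and the `relax` function are hypothetical and are not Materialize's LIR types or the actual `RelaxMustConsolidate` implementation. It assumes that a projection may merge distinct records (so its output is not consolidated), and that a monotonic topk emits consolidated output.

```rust
/// Hypothetical, heavily simplified plan tree (not Materialize's LIR).
#[derive(Debug)]
enum Plan {
    /// A source whose output is known to be consolidated or not.
    Source { consolidated: bool },
    /// Dropping columns can make distinct records equal, so the output
    /// is assumed to be unconsolidated.
    Project(Box<Plan>),
    /// An explicit consolidation.
    Consolidate(Box<Plan>),
    /// A monotonic operator that must consolidate its input itself
    /// unless the input is known to be consolidated already.
    MonotonicTopK { input: Box<Plan>, must_consolidate: bool },
}

/// Walk the tree bottom-up, set `must_consolidate` on monotonic operators,
/// and return whether this node's output is consolidated.
fn relax(plan: &mut Plan) -> bool {
    match plan {
        Plan::Source { consolidated } => *consolidated,
        Plan::Project(input) => {
            relax(input);
            false
        }
        Plan::Consolidate(input) => {
            relax(input);
            true
        }
        Plan::MonotonicTopK { input, must_consolidate } => {
            let input_consolidated = relax(input);
            *must_consolidate = !input_consolidated;
            // Assumption: the operator consolidates its own output.
            true
        }
    }
}

fn main() {
    let mut plan = Plan::MonotonicTopK {
        input: Box::new(Plan::Consolidate(Box::new(Plan::Project(Box::new(
            Plan::Source { consolidated: true },
        ))))),
        must_consolidate: true,
    };
    relax(&mut plan);
    // The explicit consolidate below the topk lets the flag be relaxed.
    println!("{:?}", plan);
}
```

The fragility mentioned in the thread shows up here as well: every new variant, or any change to what an existing variant does to its output, needs a matching update to `relax`.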

frankmcsherry · 2024-05-14T17:47:35Z

src/compute/src/render/top_k.rs

+        // Consolidate the output of `build_topk_stage` because it's not guaranteed to be.
+        let oks = oks.consolidate_named::<KeyBatcher<_, _, _>>("TopK final consolidate");


Same as above.

antiguru · 2024-05-14T18:46:32Z

A nightly run does not indicate any regressions: https://buildkite.com/materialize/nightly/builds/7769

frankmcsherry approved these changes May 14, 2024

antiguru force-pushed the topk_no_consolidate branch from 4258939 to d7e5342 Compare May 14, 2024 15:24

antiguru marked this pull request as ready for review May 14, 2024 15:30

antiguru requested a review from a team as a code owner May 14, 2024 15:30

antiguru mentioned this pull request May 14, 2024

Avoid consolidate in minsmaxes hierarchical #27085

Merged

5 tasks

frankmcsherry approved these changes May 14, 2024

antiguru added 3 commits May 14, 2024 14:44

Avoid explicit consolidation in topk rendering

df8304b

We can avoid the explicit consolidation in topk rendering by reusing the arrangement created in front of the reduction. Signed-off-by: Moritz Hoffmann <mh@materialize.com>

Update documentation, add consolidation after last stage

6be274a

Signed-off-by: Moritz Hoffmann <mh@materialize.com>

Correct spelling

d9375a9

Signed-off-by: Moritz Hoffmann <mh@materialize.com>

antiguru force-pushed the topk_no_consolidate branch from d7e5342 to d9375a9 Compare May 14, 2024 18:44

antiguru enabled auto-merge May 14, 2024 18:46

antiguru merged commit ff0d8eb into MaterializeInc:main May 14, 2024
72 of 73 checks passed

antiguru deleted the topk_no_consolidate branch May 14, 2024 20:02

materialize-bot mentioned this pull request May 16, 2024

release: v0.100.0 required reviews #27141

Closed

8 tasks
