Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test(scale): introduce deterministic scaling tests #5657

Merged
merged 25 commits into from Oct 8, 2022
Merged

Conversation

BugenZhao
Copy link
Member

@BugenZhao BugenZhao commented Sep 30, 2022

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

As explained in #5655.

The nexmark_q4.rs shows how to manually write scaling cases, and tests issue #5523. I'm planning to test more queries with random plans after resolving some blockers:

  • scale: cache invalidation when scaling #5567
  • Some columns are missing in nexmark source, so not all queries can run.
  • The results of some queries are generated slowly or updated infrequently, which might not be suitable for tests. Need to control the timing carefully.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Refer to a related PR or issue link (optional)

Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
@github-actions github-actions bot added the component/test Test related issue. label Sep 30, 2022
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
@codecov
Copy link

codecov bot commented Sep 30, 2022

Codecov Report

Merging #5657 (1870972) into main (bd3bd59) will decrease coverage by 0.01%.
The diff coverage is 54.22%.

@@            Coverage Diff             @@
##             main    #5657      +/-   ##
==========================================
- Coverage   74.30%   74.29%   -0.02%     
==========================================
  Files         924      924              
  Lines      144308   144263      -45     
==========================================
- Hits       107225   107175      -50     
- Misses      37083    37088       +5     
Flag Coverage Δ
rust 74.29% <54.22%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/common/src/catalog/column.rs 77.17% <0.00%> (-11.03%) ⬇️
src/common/src/error.rs 72.30% <ø> (ø)
src/common/src/hash/key.rs 84.54% <ø> (ø)
...c/compute/src/compute_observer/observer_manager.rs 64.70% <ø> (ø)
src/compute/src/server.rs 0.00% <0.00%> (ø)
src/connector/src/lib.rs 100.00% <ø> (ø)
src/connector/src/source/dummy_connector.rs 0.00% <0.00%> (ø)
.../src/source/filesystem/s3/source/s3_file_reader.rs 0.00% <0.00%> (ø)
...rc/connector/src/source/kafka/enumerator/client.rs 0.00% <0.00%> (ø)
src/connector/src/source/kafka/source/reader.rs 0.00% <0.00%> (ø)
... and 132 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Copy link
Member Author

@BugenZhao BugenZhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--

Copy link
Contributor

@wangrunji0408 wangrunji0408 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@@ -30,6 +30,7 @@ members = [
"src/test_runner",
"src/tests/regress",
"src/tests/simulation",
"src/tests/simulation_scale",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer merging it into the existing simulation crate, maybe in a future PR.

Copy link
Member Author

@BugenZhao BugenZhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/meta/src/storage/mem_meta_store.rs Show resolved Hide resolved
category = "RiseDev - Archive simulation scaling tests"
description = "Archive integration scaling tests in deterministic simulation mode"
dependencies = ["warn-on-missing-tools"]
env = { RUSTFLAGS = "-Ctarget-cpu=native --cfg tokio_unstable --cfg madsim", RUSTDOCFLAGS = "--cfg madsim", CARGO_TARGET_DIR = "target/sim" }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't decide how to enable SIMD this way through environments. Thus, the results of JSON parsing might be different due to precision errors like #5487. 🥵

Comment on lines 48 to 49
if chunk.is_empty() {
yield pending().await;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be CPU intensive if there are no remaining records to generate, which is problematic with madsim. 🤣

Copy link
Contributor

@yezizp2012 yezizp2012 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LSTM!!!

src/tests/simulation_scale/src/cluster.rs Outdated Show resolved Hide resolved
src/tests/simulation_scale/src/cluster.rs Outdated Show resolved Hide resolved
src/tests/simulation_scale/src/cluster.rs Outdated Show resolved Hide resolved
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
// The connector assumes that the stream will never end, so if the `event_num` is hit, we
// pend the stream forever.
// TODO: should we allow the stream to finish?
let () = pending().await;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the stream generator themselves should always end the stream gracefully, as long as they are not infinite streams. Even for those down-streams, they should end themselves as well when the upstream is closed. This way the errors can be propagated to the sink. The system won't be blocking implicitly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've discussed this issue and decided not to use the Stream terminate or TryStream error to represent the stream state in RisingWave as we don't want to propagate the error through the network. 🥵 The error yielded in actor internally should be collected by the Actor instance and find a way to report it to meta service if possible. cc @fuyufjh

For the stream reader, I find that the refactor just merged has removed the assumption (as we're all-in async stream), so maybe the workaround can be removed. Note that there's a select of source reader and barrier receiver, the source executor will work correctly after the source is gracefully terminated. On error, it will propagate to the actor of this source executor.

Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
@BugenZhao
Copy link
Member Author

BugenZhao commented Oct 8, 2022

After allowing the source reader part to terminate, the select_with_strategy will terminate unexpectedly (which causes CI to fail). I believe this is a bug of futures. 🤯
rust-lang/futures-rs#2635

Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
@BugenZhao BugenZhao added the mergify/can-merge Indicates that the PR can be added to the merge queue label Oct 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/test Test related issue. mergify/can-merge Indicates that the PR can be added to the merge queue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Introduce a framework with utilities to allow writing scaling test cases manually.
3 participants