Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(udf): move UDF implementations to expr_impl #16759

Merged
merged 13 commits into from
May 20, 2024

Conversation

wangrunji0408
Copy link
Contributor

@wangrunji0408 wangrunji0408 commented May 15, 2024

Signed-off-by: Runji Wang wangrunji0408@163.comI hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR moves UDF implementations from expr_core to expr_impl and introduces a distributed_slice in expr_core to collect them at link time. It makes it easier for developers to add new UDF languages. It also moves a lot dependencies out of expr_core, which can reduce build time and improve DX.

This PR makes each UDF a feature and they are not enabled by default to avoid building a lot of dependencies. For those who need to develop UDF, please enable them from ./risedev configure.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

Signed-off-by: Runji Wang <wangrunji0408@163.com>
Signed-off-by: Runji Wang <wangrunji0408@163.com>
Signed-off-by: Runji Wang <wangrunji0408@163.com>
Signed-off-by: Runji Wang <wangrunji0408@163.com>
Signed-off-by: Runji Wang <wangrunji0408@163.com>
Signed-off-by: Runji Wang <wangrunji0408@163.com>
Signed-off-by: Runji Wang <wangrunji0408@163.com>
Signed-off-by: Runji Wang <wangrunji0408@163.com>
Signed-off-by: Runji Wang <wangrunji0408@163.com>
Signed-off-by: Runji Wang <wangrunji0408@163.com>
Signed-off-by: Runji Wang <wangrunji0408@163.com>
Copy link
Member

@xxchan xxchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Others generally LGTM. This also makes expr_udf cleaner, and maybe makes it easier to add new UDF implementations. Cool!

src/tests/simulation/Cargo.toml Show resolved Hide resolved
src/frontend/src/handler/create_function.rs Outdated Show resolved Hide resolved
ci/scripts/release.sh Outdated Show resolved Hide resolved
Comment on lines +11 to +19
default = ["rw-static-link"]
rw-static-link = ["workspace-config/rw-static-link"]
rw-dynamic-link = ["workspace-config/rw-dynamic-link"]
embedded-deno-udf = ["risingwave_expr/embedded-deno-udf"]
embedded-python-udf = ["risingwave_expr/embedded-python-udf"]
default = ["rw-static-link"]
all-udf = ["external-udf", "wasm-udf", "js-udf", "deno-udf", "python-udf"]
external-udf = ["risingwave_expr_impl/external-udf"]
wasm-udf = ["risingwave_expr_impl/wasm-udf"]
js-udf = ["risingwave_expr_impl/js-udf"]
deno-udf = ["risingwave_expr_impl/deno-udf"]
python-udf = ["risingwave_expr_impl/python-udf"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBO I don't very like feature flags, but acceptable since there are already existing ones. 1 flag is as bad as N flags...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My consideration is that each UDF backend brings many dependencies, and most of our developers don't touch UDF at all. So it would be better to skip them in local development. The same thing could be applied to connectors. For example, I almost never use any connector in my development. Those dependencies could be skipped as well to save my disk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I think it's time to split connector implementations off and design a clear interface for it, as there are now community developers willing to develop new connectors. 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I think it's time to split connector implementations off

BTW, #12981 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I noticed this PR. It's a little pity it was not merged, but I think it's in the right direction.

src/expr/core/src/expr/expr_udf.rs Outdated Show resolved Hide resolved
src/expr/core/src/sig/mod.rs Outdated Show resolved Hide resolved
#[derive(Debug, Clone)]
struct MetricsVec {
/// Number of successful UDF calls.
success_count: IntCounterVec,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to use LabelGuardedIntCounterVec, etc. #14838

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's do it in the following PR.

src/expr/core/src/sig/mod.rs Outdated Show resolved Hide resolved
Signed-off-by: Runji Wang <wangrunji0408@163.com>
@wangrunji0408 wangrunji0408 requested a review from xxchan May 20, 2024 04:24
Copy link
Member

@BugenZhao BugenZhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest looks amazing 🤩

@@ -71,8 +71,8 @@ if [ "${ARCH}" == "aarch64" ]; then
# see https://github.com/tikv/jemallocator/blob/802969384ae0c581255f3375ee2ba774c8d2a754/jemalloc-sys/build.rs#L218
export JEMALLOC_SYS_WITH_LG_PAGE=16
fi
cargo build -p risingwave_cmd_all --features "rw-static-link" --profile release
cargo build -p risingwave_cmd --bin risectl --features "rw-static-link" --profile release
cargo build -p risingwave_cmd_all --features "rw-static-link" --features all-udf --profile release
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Does it mean that we used to release versions without support for embedded
UDF?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm considering whether we should make it a default feature but always use --no-default-features for RiseDev builds, so that we can avoid such mistakes in the future...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, the released version did not contain python UDF and deno UDF.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But making these features as default will cause them to be built by rust-analyzer...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm considering whether we should make it a default feature but always use --no-default-features for RiseDev builds, so that we can avoid such mistakes in the future...

I don't think it's worth it:

  • When release failed, we can just fix it. And it's ok if the script is dirty. We just need to get it done right once.
  • But default features is to make developers happy, which can be hard to fix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But making these features as default will cause them to be built by rust-analyzer...

Can also suggest developers to set rust-analyzer.cargo.noDefaultFeatures in the editor. 😕

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • But default features is to make developers happy, which can be hard to fix.

The UDF module is touched by few people, so it makes sense to require manual enabling on the features.

However, imagine we're going to add conditional compiles for the connector module, which is actively developed and maintained by a considerable amount of developers. We still need the effort to inform them how to enable these features, especially the IDE part, which is not better than the approach above.

The point is the most of developers do not pay much attention to the compile time. If something comes naturally, it's good. If it requires extra effort, they might give up. Thus in my opinion, it's fine to treat it as a hidden treasure needed to be discovered by some manual efforts. 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point is the most of developers do not pay much attention to the compile time. If something comes naturally, it's good. If it requires extra effort, they might give up.

I 100% agree. However, my opinion from this fact is that it's better to offer "free" fast compile time for developers. If they are not good at it, we'd better make the default faster. Saving every one's time does good to the whole world!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how to enable these features, especially the IDE part

I'm thinking whether it's possible to offer IDE experience without config for feature flags. e.g., cfg_or_panic(madsim).

One idea is that: we put each connector impl in separated crate, which is an optional dependency. If so, maybe the IDE can work without config.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go to #16841

src/expr/core/src/sig/udf.rs Outdated Show resolved Hide resolved
Signed-off-by: Runji Wang <wangrunji0408@163.com>
@wangrunji0408 wangrunji0408 added this pull request to the merge queue May 20, 2024
Copy link
Collaborator

@TennyZhuang TennyZhuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Merged via the queue into main with commit 8b66c91 May 20, 2024
27 of 28 checks passed
@wangrunji0408 wangrunji0408 deleted the wrj/dyn-udf-runtime branch May 20, 2024 07:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants