Tracking issue for dataflow framework #3187

Closed
15 of 19 tasks
discord9 opened this issue Jan 18, 2024 · 4 comments
Labels: tracking-issue (A tracking issue for a feature.)
Milestone: v0.8


discord9 (Contributor) commented Jan 18, 2024

What problem does the new feature solve?

It enables simple continuous aggregation.

What does the feature do?

  • It is not a complete stream-processing system; only a minimal, essential subset of functionality is provided.
  • It handles most aggregate operators within a single table (e.g. sum, avg, min, max, and comparison operators). Other operations (joins, triggers, transactions, etc.) are out of scope.
[Figure: Framework]
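To illustrate why these aggregates suit a simple continuous-aggregation framework, here is a minimal Python sketch (not the project's code; the `Accumulator` class and its methods are hypothetical). Sum, count, min, and max can each be updated in constant time per inserted row, and avg is derived from sum and count. It assumes an insert-only stream; supporting deletes or updates for min/max would need extra state, such as a count per distinct value.

```python
import math
from dataclasses import dataclass


@dataclass
class Accumulator:
    """Hypothetical per-group state for sum/count/avg/min/max (insert-only stream)."""

    total: float = 0.0
    count: int = 0
    minimum: float = math.inf
    maximum: float = -math.inf

    def update(self, value: float) -> None:
        # Each new row touches only this small state, never the base table.
        self.total += value
        self.count += 1
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def avg(self) -> float:
        # avg is derived from (sum, count) instead of being stored directly.
        return self.total / self.count if self.count else math.nan

    def merge(self, other: "Accumulator") -> "Accumulator":
        # Partial states from different batches combine without rereading rows.
        return Accumulator(
            total=self.total + other.total,
            count=self.count + other.count,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )


acc = Accumulator()
for v in (3.0, 1.0, 5.0):
    acc.update(v)
print(acc.total, acc.avg(), acc.minimum, acc.maximum)  # 9.0 3.0 1.0 5.0
```

Because partial states can be merged, the same accumulator could also combine results computed over separate batches of rows.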

Implementation challenges

  • Persisting intermediate state for operators
  • Writing more operators suited to stream computation
  • Limiting the system's scope to simple aggregation
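A minimal Python sketch of the first challenge, assuming a hypothetical `AvgOperator` that keys its state by window start and uses JSON as a stand-in for whatever storage format the real framework would use. The operator persists its running (sum, count) per group, so after a restart it can keep absorbing new rows instead of recomputing from the base table.

```python
import json
from typing import Dict, List


class AvgOperator:
    """Hypothetical streaming avg operator whose intermediate state can be persisted."""

    def __init__(self) -> None:
        # group key (e.g. window start in seconds) -> [running sum, row count]
        self.state: Dict[int, List[float]] = {}

    def on_row(self, key: int, value: float) -> None:
        entry = self.state.setdefault(key, [0.0, 0])
        entry[0] += value
        entry[1] += 1

    def result(self, key: int) -> float:
        total, count = self.state[key]
        return total / count

    def snapshot(self) -> bytes:
        # Persist (sum, count), not the average: an average alone cannot absorb new rows.
        # JSON object keys must be strings, so the int keys are converted here.
        return json.dumps({str(k): v for k, v in self.state.items()}).encode()

    @classmethod
    def restore(cls, blob: bytes) -> "AvgOperator":
        op = cls()
        op.state = {int(k): v for k, v in json.loads(blob).items()}
        return op


op = AvgOperator()
op.on_row(0, 2.0)
op.on_row(0, 4.0)
blob = op.snapshot()  # e.g. written to storage before a restart

resumed = AvgOperator.restore(blob)
resumed.on_row(0, 6.0)
print(resumed.result(0))  # 4.0
```

Persisting only the final average would not be enough: an average cannot be updated with a new row without knowing how many rows produced it.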

Implementation Progress

discord9 added the tracking-issue label on Feb 26, 2024
fengjiachun added this to the v0.8 milestone on Feb 28, 2024
killme2008 pinned this issue on Mar 5, 2024
tisonkun (Contributor) commented Mar 6, 2024

@discord9 I've now read the RFC and wonder what a complete example of this feature looks like.

I can see how to create a task (continuous query/materialized view):

CREATE TASK avg_over_5m WINDOW_SIZE = "5m" AS SELECT avg(value) FROM table WHERE time > now() - 5m GROUP BY time(1m)

Can we then use avg_over_5m as a normal table reference in queries?

discord9 (Contributor, Author) commented Apr 8, 2024


Yes. Sorry for the late reply; GitHub's layout for issues is really terrible. This task also creates a result table, avg_over_5m, and writes to it with negligible delay, so one can naturally use avg_over_5m in normal queries.
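As a rough illustration of that behaviour, the Python sketch below (a hypothetical `ContinuousAvg` class, not the actual implementation) maintains a result table keyed by 1-minute bucket and upserts only the affected row on each insert, so a reader querying the result table sees up-to-date averages. It assumes that `now()` in the query is a caller-supplied timestamp and that buckets lying entirely before `now() - 5m` are dropped; how `WINDOW_SIZE` is actually interpreted is not specified here.

```python
from typing import Dict, Tuple

BUCKET_SECS = 60         # GROUP BY time(1m)
RETENTION_SECS = 5 * 60  # WHERE time > now() - 5m (assumed: now supplied by caller)


class ContinuousAvg:
    """Hypothetical continuous query keeping a result 'table' in sync with inserts."""

    def __init__(self) -> None:
        self.partial: Dict[int, Tuple[float, int]] = {}  # bucket start -> (sum, count)
        self.result_table: Dict[int, float] = {}          # bucket start -> avg(value)

    def insert(self, ts: int, value: float) -> None:
        bucket = ts - ts % BUCKET_SECS
        total, count = self.partial.get(bucket, (0.0, 0))
        total, count = total + value, count + 1
        self.partial[bucket] = (total, count)
        # Upsert only the affected row, so readers see the new average right away.
        self.result_table[bucket] = total / count

    def expire(self, now: int) -> None:
        # Drop buckets whose timestamps all fail the filter, i.e. bucket end <= now - 5m.
        cutoff = now - RETENTION_SECS
        for bucket in [b for b in self.partial if b + BUCKET_SECS <= cutoff]:
            del self.partial[bucket]
            del self.result_table[bucket]


q = ContinuousAvg()
q.insert(0, 10.0)
q.insert(30, 20.0)
q.insert(70, 40.0)
print(q.result_table)  # {0: 15.0, 60: 40.0}
q.expire(now=400)
print(q.result_table)  # {60: 40.0}
```

In this sketch, rows at t = 0, 30, and 70 s land in buckets 0 and 60; at `now = 400` the filter cutoff is 100 s, so bucket 0 (ending at 60 s) is dropped while bucket 60 may still hold qualifying rows and is kept.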

killme2008 (Contributor) commented

@discord9 I think we can close this issue right now. The next iteration could start with a new issue. What do you think?

discord9 (Contributor, Author) commented Jun 3, 2024

Closing this issue since we now have a basic dataflow framework; a new issue can be opened to track its next iteration.

discord9 closed this as completed on Jun 3, 2024
discord9 unpinned this issue on Jun 3, 2024

4 participants