[Bug]: PaneInfo not populated in Go SDK #31153
Comments
I messed up the bug report (submitted too early) and it got classified as P3, whereas I think it should be P2.
I've tossed up a draft PR that essentially tries to pipe pane info through all the
I think it was actually coming in correctly, but I was misapplying triggering, so things were not happening as expected: I was attempting to use PaneInfo before bundles were being committed to the backend.
First, thanks for finding and reporting this! Agreed that P2 is more appropriate for this issue generally; I've updated the labels. But probably not higher than that.

In principle, using State and Timers should enable the same semantics, as they are lower-level primitives. But that won't work very well for executions on Batch Dataflow, since timers behave differently when all data is available a priori.

This would be a blocker for getting triggers working properly on the Go SDK's local runner, Prism, as it's not doing anything with Panes or Triggers at present, though that work is coming up (see #29650 for the Prism implementation list). Proper pane propagation would also allow sophisticated streaming-enabled file sinks to be implemented natively in the Go SDK; these rely on correct pane information to output and update files written in an unbounded pipeline.

The example code demonstrates that the default pane isn't being set to the NoFiringPane. That is a bug, probably caused by a lack of propagation, and it should be fixed.

What it doesn't demonstrate is that the pane should be different due to a trigger. IIRC, triggers only resolve at the downstream GBK/aggregation, so that's where there would be multiple firings and different panes. Panes are only updated after a trigger is enacted ("fired") from a runner source, like after a GBK. More precisely, the default "No Firing Pane" is the expected default until a trigger actually resolves; it means the given pane was not due to a trigger firing.

So, having the following pipeline should show different firings:
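The sketch below is not a Beam pipeline and does not use the Go SDK. It is a dependency-free Go model of the pane semantics described in this comment, written to make the expected values concrete. All names in it (`Pane`, `noFiringPane`, `firings`, `consistent`) are hypothetical, and it assumes a single key and window, a fixed number of early firings followed by one on-time firing, and no late data. The "no firing" values (first, last, unknown timing, index 0) reflect my understanding of Beam's default pane rather than anything confirmed in this thread.

```go
package main

import "fmt"

// Timing is a hypothetical stand-in for a pane's timing, not the SDK type.
type Timing string

const (
	Unknown Timing = "UNKNOWN"
	Early   Timing = "EARLY"
	OnTime  Timing = "ON_TIME"
)

// Pane is a simplified, hypothetical model of pane info for illustration.
type Pane struct {
	Timing  Timing
	IsFirst bool
	IsLast  bool
	Index   int64
}

// noFiringPane models the default pane carried by elements that were not
// produced by a trigger firing (assumed: first, last, unknown timing).
func noFiringPane() Pane {
	return Pane{Timing: Unknown, IsFirst: true, IsLast: true, Index: 0}
}

// firings models the panes seen downstream of an aggregation for one key and
// window: `early` early firings followed by one final on-time firing.
func firings(early int) []Pane {
	panes := make([]Pane, 0, early+1)
	for i := 0; i < early; i++ {
		panes = append(panes, Pane{Timing: Early, IsFirst: i == 0, Index: int64(i)})
	}
	panes = append(panes, Pane{
		Timing:  OnTime,
		IsFirst: early == 0,
		IsLast:  true,
		Index:   int64(early),
	})
	return panes
}

// consistent reports whether a pane sequence is internally consistent:
// indexes run 0..n-1, only the first pane is IsFirst, only the last is IsLast.
func consistent(panes []Pane) bool {
	if len(panes) == 0 {
		return false
	}
	for i, p := range panes {
		if p.Index != int64(i) || p.IsFirst != (i == 0) || p.IsLast != (i == len(panes)-1) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Printf("default: %+v\n", noFiringPane())
	for _, p := range firings(2) {
		fmt.Printf("fired:   %+v\n", p)
	}
}
```

Under this model, a log in which every pane reports `IsFirst:false` and `IsLast:false` fails `consistent`, which matches the reporter's expectation that at least one first and one last pane should appear.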
What happened?
I'm attempting to use early triggering and `PaneInfo` to limit bundle sizes to avoid running into the Dataflow limit of 80MB, and have found that `PaneInfo` does not appear to be populated correctly.

Runner: Dataflow
Beam Version: 2.55.1
Here's a test that I believe demonstrates the problem:
The logs are all:
Even if I don't have the indexes correct in the test (the test is failing on the `EqualsList`), I would expect these to be internally consistent. That is, I would expect there to be at least one `IsFirst:true` and `IsLast:true` each.

Issue Priority
Priority: 2 (default)
Issue Components