
Engine abstraction - rewrite Writers class to use ResultProcessorBuilder provided by platform #9853

Conversation


@pihme pihme commented Jul 21, 2022

Description

This is the last PR in the "shape legacy code into new interfaces" series.

Previously, the new interfaces ProcessingResultBuilder and ProcessingResult were introduced; they are how the engine is supposed to return the processing result to the stream processor. These classes were also already passed between the stream processor and the engine at the interface level, and the division of labor between stream processor and engine was established.

What had not yet been achieved was that the engine actually uses these new classes. Instead, the engine used existing legacy code to circumvent the new interfaces.

This PR transforms the engine to use the new classes provided by the stream processor:

  • The central point of the transformation is the Writers class, which at the outset depended on stream writers, state writers, response writers, and so on
  • In the end, the Writers class depends only on ProcessingResultBuilder, which is swapped out for each record processed, or each error to be handled
  • To make this transition possible, we a) need to refine the Typed(Response/Rejection/Command)Writer classes and remove methods like reset and flush, which will no longer be offered by the new interfaces, and b) need to implement the refined interfaces with an implementation that forwards calls to ProcessingResultBuilder
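The forwarding idea in the last bullet can be sketched roughly like this. This is a minimal sketch; the real Zeebe interfaces have different signatures, and `RecordingResultBuilder`, `swapResultBuilder`, and the string-based record are illustrative stand-ins only:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the platform-provided result builder interface.
interface ProcessingResultBuilder {
  void appendRecord(long key, String intent, Object value);
}

// Minimal in-memory implementation so the sketch is self-contained.
class RecordingResultBuilder implements ProcessingResultBuilder {
  final List<String> records = new ArrayList<>();

  @Override
  public void appendRecord(final long key, final String intent, final Object value) {
    records.add(key + ":" + intent);
  }
}

// Engine-facing writer: it exposes no reset() or flush(); every call
// is forwarded to whichever builder the platform has swapped in.
class ResultBuilderBackedCommandWriter {
  private ProcessingResultBuilder resultBuilder;

  // The platform calls this before each record is processed.
  void swapResultBuilder(final ProcessingResultBuilder builder) {
    this.resultBuilder = builder;
  }

  void appendFollowUpCommand(final long key, final String intent, final Object value) {
    resultBuilder.appendRecord(key, intent, value);
  }
}
```

The point of the design is that the engine can only accumulate output through the builder; committing or discarding the result stays entirely with the platform.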

Review hints

  • @npepinpe 95% of the changes are in the engine; @remcowesterhoud will look into those
  • These changes were done under a "better done than perfect" regime. Some naming might not be exactly right, and some cleanup may happen after the fact. The focus is on establishing enough of the engine abstraction concept that both teams can afterwards work without interfering with each other.

Related issues

related to #9725

Definition of Done

Code changes:

  • The changes are backwards compatible with previous versions
  • If it fixes a bug then PRs are created to backport the fix to the last two minor versions. You can trigger a backport by assigning labels (e.g. backport stable/1.3) to the PR, in case that fails you need to create backports manually.

Testing:

  • There are unit/integration tests that verify all acceptance criteria of the issue
  • New tests are written to ensure backwards compatibility with future versions
  • The behavior is tested manually
  • The change has been verified by a QA run
  • The impact of the changes is verified by a benchmark

Documentation:

  • The documentation is updated (e.g. BPMN reference, configuration, examples, get-started guides, etc.)
  • New content is added to the release announcement
  • If the PR changes how BPMN processes are validated (e.g. support new BPMN element) then the Camunda modeling team should be informed to adjust the BPMN linting.

Please refer to our review guidelines.

pihme added 13 commits July 21, 2022 11:27
This commit is a bit larger than I would like. It could only have been split into one small commit and one still-large commit, so I just left it as one large commit.

It started because I wanted to implement a TypedCommandWriter that uses ProcessingResultBuilder as backend
instead of accessing the writers directly.

Turns out, TypedCommandWriter contains reset() and flush() methods which will no longer be allowed.
(There will be a reset() method for the full result, but not for individual parts of the result).

So I created a smaller interface, RestrictedTypedCommandWriter, which does not contain these two methods (it will be renamed in the next commit).

Then I went through the current uses of TypedCommandWriter and replaced all with RestrictedTypedCommandWriter.
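The split can be sketched like this. The method names and signatures below are illustrative assumptions, not the actual Zeebe interfaces; `StubCommandWriter` exists only to make the sketch self-contained:

```java
// The narrowed interface the engine sees: it can only append.
interface RestrictedTypedCommandWriter {
  void appendFollowUpCommand(long key, String intent, Object value);
}

// The legacy interface keeps reset() and flush() on top of the
// restricted one, for code that is still controlled by the platform.
interface TypedCommandWriter extends RestrictedTypedCommandWriter {
  void reset();  // discards anything staged so far
  long flush();  // writes staged records; negative return signals failure
}

// Tiny stub showing that any TypedCommandWriter is still usable
// wherever only the restricted interface is required.
class StubCommandWriter implements TypedCommandWriter {
  int appended;

  public void appendFollowUpCommand(final long key, final String intent, final Object value) {
    appended++;
  }

  public void reset() {
    appended = 0;
  }

  public long flush() {
    return appended;
  }
}
```

Because the legacy interface extends the restricted one, replacing a `TypedCommandWriter` parameter with `RestrictedTypedCommandWriter` is a source-compatible narrowing for all call sites that never touched reset or flush.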

In one place this was not possible: DeploymentDistributionBehavior, which schedules a background task that
uses reset and flush.

For this use case, a new class LegacyTask was introduced. We need to migrate all background tasks anyway.
The idea of LegacyTask is to be a bag to capture all unwanted dependencies.

Then there was a need to schedule such a legacy task and provide the dependencies when the task is run.

The effects of both of these are:
- Only the background task depends on a writer with flush and reset, and this task is fully under the control of the platform
- All other places in the engine only write, but don't use reset or flush
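The LegacyTask idea can be sketched like this. All names and signatures here are assumptions for illustration; the real Zeebe scheduling API differs:

```java
import java.util.ArrayList;
import java.util.List;

// The "bag of unwanted dependencies": a writer that still has reset()
// and flush(), which the engine itself is no longer allowed to hold.
class LegacyCommandWriter {
  final List<String> staged = new ArrayList<>();

  void reset() {
    staged.clear();
  }

  void append(final String record) {
    staged.add(record);
  }

  long flush() {
    return staged.size();
  }
}

// A legacy task receives the full writer only when it is run,
// so the dependency never leaks into the engine's own code.
interface LegacyTask {
  void run(LegacyCommandWriter commandWriter);
}

class TaskScheduler {
  // The platform provides the dependencies at run time, not at scheduling time.
  void runLegacyTask(final LegacyTask task) {
    task.run(new LegacyCommandWriter());
  }
}
```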
…ResultBuilder

Refactor classes to reuse similar code
Make sure that engine uses only the limited (no reset, flush) interface.

Change logic in ProcessingStateMachine regarding how the writers are reset.
…tBuilder

Also add a TODO to BufferedProcessingResultBuilder because there is certainly more work to do.
…ngResultBuilder

This also required extending the interface. The philosophy here is that the result builder interface offers a low level of abstraction (e.g. all the fields you can set).

In the Typed...Writer interfaces you have a higher level of abstraction and more comfort. So TypedResponseWriter has methods like writeRejectionOnCommand(...) and writeEventOnCommand(...), but these are all mapped onto the generic withResponse(...) method in ProcessingResultBuilder.
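A sketch of that mapping, with deliberately simplified signatures (the real methods carry typed records and intents, not plain strings, and the class names below are hypothetical):

```java
// Low level of abstraction: one generic method, all fields explicit.
class ResponseResultBuilder {
  String lastResponse;

  void withResponse(final long key, final String type, final String reason) {
    lastResponse = key + ":" + type + ":" + reason;
  }
}

// Higher level of abstraction: convenience methods that are thin
// adapters over the single generic withResponse(...) call.
class TypedResponseWriterSketch {
  private final ResponseResultBuilder builder;

  TypedResponseWriterSketch(final ResponseResultBuilder builder) {
    this.builder = builder;
  }

  void writeRejectionOnCommand(final long commandKey, final String reason) {
    builder.withResponse(commandKey, "REJECTION", reason);
  }

  void writeEventOnCommand(final long commandKey, final String eventType) {
    builder.withResponse(commandKey, "EVENT", eventType);
  }
}
```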
For some unknown reason we need to set the hasResponse flag back to true. This should not be necessary, but it is: setting it to true by default eliminates ~168 failed tests. Since the direct access will be replaced with a buffered one anyway, I think there is no point in investigating this further.

Finally, the error induced in StreamProcessorHealthTest must be moved to
a different method to reflect the way the code flow works now.
@pihme pihme requested a review from npepinpe July 21, 2022 10:18
deploymentDistributionRecord.setPartition(partitionId);
commandWriter.reset();
commandWriter.appendFollowUpCommand(
    key, DeploymentDistributionIntent.COMPLETE, deploymentDistributionRecord);

final long pos = commandWriter.flush();
if (pos < 0) {
  scheduleService.runDelayed(Duration.ofMillis(100), this);

Check notice (Code scanning / CodeQL): Deprecated method or constructor invocation

Invoking ProcessingScheduleService.runDelayed should be avoided because it has been deprecated.
@github-actions

github-actions bot commented Jul 21, 2022

Unit Test Results

793 files (+1), 793 suites (+1), 1h 47m 50s ⏱️ (+3m 42s)
6 321 tests (+120): 6 312 ✔️ (+120), 9 💤 (±0), 0 failed (±0)
6 490 runs (+120): 6 481 ✔️ (+120), 9 💤 (±0), 0 failed (±0)

Results for commit 5d26a75. ± Comparison against base commit 358dc73.



@remcowesterhoud remcowesterhoud left a comment


Thanks for the small commits, that made this a lot easier to review 🙇

I'll be honest: I didn't understand everything that's been changed, probably because I'm not that deep into the abstraction topic. I have added some questions and some minor suggestions.

Overall I didn't see any strange stuff and I like the look of the abstraction so far, so great job 🚀


@npepinpe npepinpe left a comment


👍 It's great to see everything coming together/into shape!

There was only one issue, which is very minor, about a potentially unused field. But nothing that requires a second review I think.

  • 💭 As I wrote, not sure about the whole mutex and if it does anything useful, but happy to see how it helps us in practice.
  • 💭 It's a bit out of scope, but now that we have an expandable buffer in the result builder, could a large synchronous multi-instance collection cause us to run out of memory by writing tons of events? I guess it's not really all that different from before, as we were staging them in memory anyway, right? So just for my peace of mind, the situation is more or less the same as before, correct?


private final ErrorRecord errorRecord = new ErrorRecord();

private final ProcessingResultBuilderMutex resultBuilderMutex =


❓ What is the goal of the mutex? To prevent callers from caching and using a result builder, or to ensure that callers always get the latest request-scoped builder? If it's the latter, then all good. If it's the former, I'm not sure it's the best mechanism, as it doesn't really prevent any caching from happening, and I'm not sure what would happen if a processor were to cache it. I don't have much better ideas to be honest, but it feels like unnecessary complexity, as it's not really preventing anything. I guess what's missing for me is that we aren't preventing the builder from being used out of scope, and I'm not sure it'll be obvious when we use it out of scope, or what the behavior will be (retry endlessly? blacklist?).

Contributor Author


The mutex is mainly to ensure that callers get the latest request-scoped builder. You are right that it doesn't prevent the result from being cached.

If you want to make sure the builder is not used out of scope, the platform could pass a proxy to the engine and invalidate the proxy after it goes out of scope. Then, even if the engine were to cache the result, you could detect it.
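Such an invalidating proxy could look roughly like this. All names here are hypothetical and not part of the actual codebase; the sketch only shows the mechanism of failing loudly on out-of-scope use:

```java
// Simplified stand-in for the result builder interface.
interface ResultBuilder {
  void appendRecord(String record);
}

// The platform hands the engine this proxy and invalidates it once the
// record's scope ends; any later use throws instead of silently writing
// into the wrong result.
class InvalidatingResultBuilderProxy implements ResultBuilder {
  private ResultBuilder delegate;

  InvalidatingResultBuilderProxy(final ResultBuilder delegate) {
    this.delegate = delegate;
  }

  // Called by the platform when the current record's processing ends.
  void invalidate() {
    delegate = null;
  }

  @Override
  public void appendRecord(final String record) {
    if (delegate == null) {
      throw new IllegalStateException("result builder used out of scope");
    }
    delegate.appendRecord(record);
  }
}
```

Even if the engine caches the proxy, the first out-of-scope call surfaces immediately as an exception, which makes the misuse detectable.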

@pihme

pihme commented Jul 22, 2022

  • It's a bit out of scope, but now that we have an expandable buffer in the result builder, could a large synchronous multi-instance collection cause us to run out of memory by writing tons of events? I guess it's not really all that different from before, as we were staging them in memory anyway, right? So just for my peace of mind, the situation is more or less the same as before, correct?

The expandable buffer is not in use yet; we are still using the direct access, so it is exactly as before.

@pihme

pihme commented Jul 22, 2022

bors merge

@zeebe-bors-camunda

Build succeeded:

@zeebe-bors-camunda zeebe-bors-camunda bot merged commit 6efb06b into main Jul 22, 2022
@zeebe-bors-camunda zeebe-bors-camunda bot deleted the 9725-shape-legacy-code-into-new-interfaces-good-bits-part4 branch July 22, 2022 09:18
zeebe-bors-camunda bot added a commit that referenced this pull request Jul 22, 2022
9852: Move stream processor classes into the stream processor package r=pihme a=pihme

## Description

- Move classes that are only used by stream processor into the corresponding package
- There were no other changes apart from moving the classes

## Related issues


related to #9725 

**Merge only after** #9853  (this PR can easily be regenerated; the other one cannot)



Co-authored-by: pihme <pihme@users.noreply.github.com>