Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove LogStream writers from Engine #9724

Closed
20 tasks done
Zelldon opened this issue Jul 7, 2022 · 2 comments · Fixed by #10203
Closed
20 tasks done

Remove LogStream writers from Engine #9724

Zelldon opened this issue Jul 7, 2022 · 2 comments · Fixed by #10203
Assignees
Labels
area/maintainability Marks an issue as improving the maintainability of the project kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. version:8.1.0-alpha5 Marks an issue as being completely or in parts released in 8.1.0-alpha5 version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0

Comments

@Zelldon
Copy link
Member

Zelldon commented Jul 7, 2022

Description

Part of #9600

In the current state the TypedProcessors (which might become the Engine later) and other entities, like JobTimeoutTrigger, DeploymentDistributor etc. have knowledge about how to write Records to the LogStream abstraction. Ideally, they shouldn't care about that detail. In the end, it would be great if the Engine can get something in and produce something out. For that we have to do some pre-work like removing the actual LogStreamBatchWriters usage and reduce related interfaces.

In the POC #9602 we split up the implementation of the TypedStreamWriterImpl so that it just writes into a wrapped buffer. This allowed to pre-claim that buffer in the start, and initialize the Engine with that Writer. This means we can reduce the dependency to the LogStream (helps when we split the Engine and the StreamPlatform). Later we renamed the Writers to something like Builder, but this is discussable.

See Writers in #9602 (comment)

Todo:

  • Implement TypedStreamWriter that writes to a ByteBuffer #9780
    • Copy the content from TypedStreamWriterImpl to a new class
    • Remove the LogStreamBatch usage from that class, write directly into a pre-claimed buffer
  • Add a new method to the LogStreamBatch
  • Write tests for LogStreamBatch
  • made the canWriteEvent method public on the LogStreamBatchWriter in order to use it in the RecordBatch
  • Iterate over BufferedProcessingResultBuilder #10001
    • Replace with RecordBatch
    • Use RecordBatch everywhere
  • Write tests for `BufferedStreamWriter #9838
  • Decouple ProcessingResult from Log and Command writer #9998
  • Find a good new name for the Writers Interface, including Command-, Rejection-, StateWriter. In the POC [POC]: Engine Abstraction #9602, we called it Builders, because wanted in the end build a list of Records. But maybe we can also use a different name here.
  • Create a new Result Interface / Class which can be returned by the Engine
    • The Result can for simplicity return a BufferWriter like we did in the POC [POC]: Engine Abstraction #9602 (we can improve that later), but we can also just return a list of Records if you find a good way for it
    • Result should be used by Processing and ScheduledTasks to return Results
  • Engine should return the result - ⚠️ here we need the Stream Processing changes first Refactor StreamProcessor / Engine #9725
  • Use result in platform, write the records to the LogStreamBatchWriter
  • Bonus: The result contains only the list of the records (instead of the general Buffer or BufferWriter)
  • Bonus2 The serialization is done in the StreamPlatform on writing to the LogStreamBatchWriter
  • We might need to adjust some tests
@Zelldon Zelldon added kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. team/distributed area/maintainability Marks an issue as improving the maintainability of the project labels Jul 7, 2022
@Zelldon
Copy link
Member Author

Zelldon commented Jul 8, 2022

There was a comment regarding the naming by @saig0 maybe we consider this here #9602 (comment)

@Zelldon
Copy link
Member Author

Zelldon commented Jul 8, 2022

There was a comment by @npepinpe regarding to the serialization and the Result class:

#9602 (comment)
Should the engine implementation actually serialize the outputs?
Pros: frictionless, no need to interact with the distributed, easier to extend later if we add different processors.
Cons: more complicated for testing, embedding, possibly need to implement own integrity checks, schema evolution, not reusing the expertise of the distributed team regarding persistence.
I personally lean with Peter on having "record in -> record(s) out" instead of dealing with (de)serialization.

@pihme pihme self-assigned this Jul 11, 2022
@Zelldon Zelldon self-assigned this Aug 4, 2022
zeebe-bors-camunda bot added a commit that referenced this issue Aug 15, 2022
10057: refactor(engine): merge two legacy interfaces together r=pihme a=pihme

## Description

* `LegacyTypedStreamWriter` already extended `LegacyTypedCommandWriter`
* This PR replaces all references to `LegacyTypedCommandWriter` with `LegacyTypedStreamWriter` 
* It then merges the two interfaces and deleted `LegacyTypedCommandWriter`
* The net effect of this, is that we only need to replace one interface going forward

## Related issues

relates to #9724



Co-authored-by: pihme <pihme@users.noreply.github.com>
zeebe-bors-camunda bot added a commit that referenced this issue Aug 16, 2022
10076: Engine abstraction remove legacy writer r=pihme a=pihme

## Description

- Introduces a new task that can be scheduled by the engine and that can return records to be written
- Replaces all references to `LegacyTypedStreamWriter`outside of stream processor with scheduling of the new tasks
- The overall result is that the engine no longer depends on any writer that writes directly to the stream. Stream processor is now in full control of when and how records shall be written (Heureka!)

## Related issues

related to #9724, #9730, #9725



Co-authored-by: pihme <pihme@users.noreply.github.com>
zeebe-bors-camunda bot added a commit that referenced this issue Aug 16, 2022
10076: Engine abstraction remove legacy writer r=pihme a=pihme

## Description

- Introduces a new task that can be scheduled by the engine and that can return records to be written
- Replaces all references to `LegacyTypedStreamWriter`outside of stream processor with scheduling of the new tasks
- The overall result is that the engine no longer depends on any writer that writes directly to the stream. Stream processor is now in full control of when and how records shall be written (Heureka!)

## Related issues

related to #9724, #9730, #9725



Co-authored-by: pihme <pihme@users.noreply.github.com>
zeebe-bors-camunda bot added a commit that referenced this issue Aug 17, 2022
10076: Engine abstraction remove legacy writer r=Zelldon a=pihme

## Description

- Introduces a new task that can be scheduled by the engine and that can return records to be written
- Replaces all references to `LegacyTypedStreamWriter`outside of stream processor with scheduling of the new tasks
- The overall result is that the engine no longer depends on any writer that writes directly to the stream. Stream processor is now in full control of when and how records shall be written (Heureka!)

## Related issues

related to #9724, #9730, #9725



Co-authored-by: pihme <pihme@users.noreply.github.com>
Co-authored-by: Christopher Zell <zelldon91@googlemail.com>
zeebe-bors-camunda bot added a commit that referenced this issue Aug 23, 2022
10118: Add new RecordBatch classes and interfaces r=Zelldon a=Zelldon

## Description

With the #9600 and especially with #9724 we wanted to create a way to return results (records) on processing and on scheduled Tasks, without the need to use the LogStreamWriters. This should allow us to reduce the dependency on the log stream and in general, make the test setup more easier. 

We did the first attempt with the [BufferedProcessingResultBuilder](https://github.com/camunda/zeebe/blob/main/engine/src/main/java/io/camunda/zeebe/streamprocessor/BufferedProcessingResultBuilder.java), which only contains a buffer to write into the needed record details. This was also described as a solution in the issue #9724. 

During the discussion of the proposal, there was already the idea raised that it would be nicer to have a List of records returned by the Processing- and TaskResult. This was added as a Bonus in #9724.

Today I thought longer about the `BufferedProcessingResultBuilder` and how we could test it and in general make it nicer to return the processing result. Checking the code of the result builder and [LogStreamBatchWriterImpl.java](https://github.com/camunda/zeebe/blob/main/logstreams/src/main/java/io/camunda/zeebe/logstreams/impl/log/LogStreamBatchWriterImpl.java), I realized that we do not really need to copy the record details into a big buffer (as we do it [here](https://github.com/camunda/zeebe/blob/main/logstreams/src/main/java/io/camunda/zeebe/logstreams/impl/log/LogStreamBatchWriterImpl.java#L163-L195)). We could create a "struct" (some would call it a java record but to avoid confusion I call it struct now) of the record details which we want to write. Most parts of the struct are primitives (like enums intent, valueType etc.) and the RecordValue would be only thing we would need to copy into a buffer. This would allow better debuggability (e.g. we can see during processing what is already part of the result) and no need to mess around with offsets in the implementation. We can keep references to these "structs" in a list and iterate over it in the StreamProcessor to write the record details, similar to what we [do in the LogStreamBatchWriter here](https://github.com/camunda/zeebe/blob/main/logstreams/src/main/java/io/camunda/zeebe/logstreams/impl/log/LogStreamBatchWriterImpl.java#L240). 

One interesting part after writing the classes and tests, I also realized that the batch entry interface could return the `UnifiedRecordValue` in order to not have to mess again with buffers (BufferWriters or BufferReaders) and to make the engine tests easier to implement. For example, if we would like to split the modules we could write tests without the stream processor and take the records from the processing result and give it directly again to the engine 🤯

So this PR adds new interfaces and first implementation for so called RecordBatch and RecordBatchEntry, which should be later used in the Processing- and TaskResult. Since the CommandWriter needs similar information as the LogStreamWriter, we can reuse the RecordBatch for both the records to write and for the response, which is part of the Processing Result. No other code was touched/modified.

Right now the interfaces are kept to a minimum and documentation as well. I guess we might need to iterate over them later again, but I think it is a good step forward. 

Feel free to propose another names for the classes and interfaces, nothing is set into stone. #namingishard

<!-- Please explain the changes you made here. -->

## Related issues

<!-- Which issues are closed by this PR or are related -->

related to #10001



Co-authored-by: Christopher Zell <zelldon91@googlemail.com>
zeebe-bors-camunda bot added a commit that referenced this issue Aug 23, 2022
10118: Add new RecordBatch classes and interfaces r=Zelldon a=Zelldon

## Description

With the #9600 and especially with #9724 we wanted to create a way to return results (records) on processing and on scheduled Tasks, without the need to use the LogStreamWriters. This should allow us to reduce the dependency on the log stream and in general, make the test setup more easier. 

We did the first attempt with the [BufferedProcessingResultBuilder](https://github.com/camunda/zeebe/blob/main/engine/src/main/java/io/camunda/zeebe/streamprocessor/BufferedProcessingResultBuilder.java), which only contains a buffer to write into the needed record details. This was also described as a solution in the issue #9724. 

During the discussion of the proposal, there was already the idea raised that it would be nicer to have a List of records returned by the Processing- and TaskResult. This was added as a Bonus in #9724.

Today I thought longer about the `BufferedProcessingResultBuilder` and how we could test it and in general make it nicer to return the processing result. Checking the code of the result builder and [LogStreamBatchWriterImpl.java](https://github.com/camunda/zeebe/blob/main/logstreams/src/main/java/io/camunda/zeebe/logstreams/impl/log/LogStreamBatchWriterImpl.java), I realized that we do not really need to copy the record details into a big buffer (as we do it [here](https://github.com/camunda/zeebe/blob/main/logstreams/src/main/java/io/camunda/zeebe/logstreams/impl/log/LogStreamBatchWriterImpl.java#L163-L195)). We could create a "struct" (some would call it a java record but to avoid confusion I call it struct now) of the record details which we want to write. Most parts of the struct are primitives (like enums intent, valueType etc.) and the RecordValue would be only thing we would need to copy into a buffer. This would allow better debuggability (e.g. we can see during processing what is already part of the result) and no need to mess around with offsets in the implementation. We can keep references to these "structs" in a list and iterate over it in the StreamProcessor to write the record details, similar to what we [do in the LogStreamBatchWriter here](https://github.com/camunda/zeebe/blob/main/logstreams/src/main/java/io/camunda/zeebe/logstreams/impl/log/LogStreamBatchWriterImpl.java#L240). 

One interesting part after writing the classes and tests, I also realized that the batch entry interface could return the `UnifiedRecordValue` in order to not have to mess again with buffers (BufferWriters or BufferReaders) and to make the engine tests easier to implement. For example, if we would like to split the modules we could write tests without the stream processor and take the records from the processing result and give it directly again to the engine 🤯

So this PR adds new interfaces and first implementation for so called RecordBatch and RecordBatchEntry, which should be later used in the Processing- and TaskResult. Since the CommandWriter needs similar information as the LogStreamWriter, we can reuse the RecordBatch for both the records to write and for the response, which is part of the Processing Result. No other code was touched/modified.

Right now the interfaces are kept to a minimum and documentation as well. I guess we might need to iterate over them later again, but I think it is a good step forward. 

Feel free to propose another names for the classes and interfaces, nothing is set into stone. #namingishard

<!-- Please explain the changes you made here. -->

## Related issues

<!-- Which issues are closed by this PR or are related -->

related to #10001



Co-authored-by: Christopher Zell <zelldon91@googlemail.com>
zeebe-bors-camunda bot added a commit that referenced this issue Aug 25, 2022
10163: Use RecordBatch to write follow up records r=Zelldon a=Zelldon

## Description

In order to determine the maximum record batch size, we use a new method on the batch writer. This method allows us to check whether we would be able to write a specific event count and batch size to the writer. 

During processing a RecordBatch is built up in the ProcessingResultBuilder and given as ImmutableRecordBatch inside the ProcessingResult to the PrcoessingStateMachine. Here we tried to not change the interfaces, which causes us to cast on certain places, this is similar to how it currently is done in the LegacyWriter.

The RecordBatch is consumed by the ProcessingStateMachine in order to write the records. A follow-up PR will clean up other parts, like the unused writer in the ProcessingResult, etc.

Some StreamProcessor tests relied on the writer usage, which has been disabled/removed for now. We will soon rewrite them.


<!-- Please explain the changes you made here. -->

## Related issues

<!-- Which issues are closed by this PR or are related -->

related #9724
related #10001



Co-authored-by: Christopher Zell <zelldon91@googlemail.com>
@Zelldon Zelldon mentioned this issue Aug 26, 2022
10 tasks
@Zelldon Zelldon added the version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0 label Oct 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/maintainability Marks an issue as improving the maintainability of the project kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. version:8.1.0-alpha5 Marks an issue as being completely or in parts released in 8.1.0-alpha5 version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants