Export all records to ES by default #8338

Closed
saig0 opened this issue Dec 8, 2021 · 4 comments · Fixed by #9474
Labels
  • area/observability: marks an issue as observability related
  • kind/feature: categorizes an issue or PR as a feature, i.e. new behavior
  • scope/broker: marks an issue or PR to appear in the broker section of the changelog
  • version:8.1.0-alpha2
  • version:8.1.0: marks an issue as being completely or in parts released in 8.1.0

Comments

@saig0
Member

saig0 commented Dec 8, 2021

Is your feature request related to a problem? Please describe.
Currently, we use the Zeebe Elasticsearch exporter in Camunda Cloud to export the records to ES that Operate, Tasklist, and Optimize consume. However, we don't export all records by default.

Beyond these applications, we could also use the records in ES to debug Zeebe's behavior. Sometimes the records can no longer be read from the log stream because the log has already been compacted (by default, every 5 minutes).

The most relevant of the records that are not exported by default are probably:

  • message
  • message_subscription
  • process_message_subscription
  • message_start_event_subscription ¹
  • timer ¹
  • process_event ¹

¹ This record is not available in the exporter yet (#8337).

Describe the solution you'd like
Export all records to ES by default, or at least the most relevant ones. For example, we could exclude job_batch records: there are usually many of them, but they are probably not very valuable for debugging.
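The proposed default can be sketched as a simple predicate: every value type is exported unless it is on an exclusion list, which here contains only job_batch, as suggested above. This is a minimal illustration under assumptions; `EXCLUDED_BY_DEFAULT` and `export_by_default` are hypothetical names and not part of the Zeebe exporter, which is configured through its own settings.

```python
# Hypothetical sketch: value types use the lowercase names from this issue.
# job_batch is excluded because it is high volume and of low debugging value.
EXCLUDED_BY_DEFAULT = frozenset({"job_batch"})


def export_by_default(value_type: str) -> bool:
    """Return True if records of this value type would be exported by default."""
    return value_type not in EXCLUDED_BY_DEFAULT


for vt in ["message", "message_subscription", "process_message_subscription", "job_batch"]:
    print(vt, export_by_default(vt))
```

An exclusion list (rather than an allow list) matches the intent of "export everything": any value type added later is exported automatically unless someone opts it out.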

Before changing the configuration, we should check the impact on disk usage and performance, and contact the Operate/Tasklist/Optimize teams (especially @sdorokhova) about the performance impact on ES and the applications.

Additionally, we should write a short guide on how to read the records from Camunda Cloud (for the Zeebe medic or a Zeebe dev working on an incident or support case). Maybe we could use a local Kibana. @deepthidevaki could help with this part.

Describe alternatives you've considered
Keep the records unexported by default and change the configuration manually when we need them. The disadvantage is that by the time the configuration is changed, the behavior you want to debug has already happened. If it doesn't recur regularly or isn't reproducible, we can't capture the relevant records.

Additional context
This request was raised as an action item of the following incident: https://docs.google.com/document/d/1xQPZbGUG57VOi-u4MI8CZF_I6FbBh0nc_ukSFiZwsNk

@saig0 saig0 added the kind/feature Categorizes an issue or PR as a feature, i.e. new behavior label Dec 8, 2021
@saig0 saig0 changed the title [WIP] Export all records to ES by default Export all records to ES by default Dec 9, 2021
@saig0 saig0 added Impact: Integration scope/broker Marks an issue or PR to appear in the broker section of the changelog labels Dec 9, 2021
@npepinpe npepinpe added this to Planned in Zeebe Dec 9, 2021
@npepinpe
Member

npepinpe commented Dec 9, 2021

It makes sense to me that the default configuration exports everything, i.e. the exporter makes no assumptions about its consumers' needs, rather than shipping an Operate-specific configuration. Operate, Tasklist, etc. would then document the configuration they require in their own docs.

This is a breaking change, since updating will change the behavior for users who were running with the defaults, but if we plan to have more consumers, I still believe it makes sense. @menski wdyt?

One thing to keep in mind with this issue: afterwards, we should double-check the controller/console for an existing custom configuration that would override the defaults, since we want to export everything there too. Additionally, we will have to benchmark the impact of this change on the Zeebe brokers (as they will export more records) and on Elasticsearch resources (CPU, memory, and disk).
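The concern about custom configuration comes down to precedence: an explicitly configured value wins over the built-in default, so changing a default has no effect wherever an explicit value already exists. A minimal sketch, assuming hypothetical flag names and using a plain dictionary merge to stand in for however the real configuration is resolved:

```python
# Hypothetical flag names; True means records of that value type are exported.
NEW_DEFAULTS = {"message": True, "message_subscription": True, "job_batch": False}


def effective_config(defaults: dict[str, bool], overrides: dict[str, bool]) -> dict[str, bool]:
    """Explicitly configured values take precedence over the defaults."""
    return {**defaults, **overrides}


def shadowed_defaults(defaults: dict[str, bool], overrides: dict[str, bool]) -> list[str]:
    """Return the flags whose (new) default is hidden by a differing explicit value."""
    return sorted(k for k, v in overrides.items() if k in defaults and defaults[k] != v)


# A deployment that pinned the old value for `message` keeps not exporting it.
custom = {"message": False}
print(effective_config(NEW_DEFAULTS, custom)["message"])  # False
print(shadowed_defaults(NEW_DEFAULTS, custom))  # ['message']
```

Listing the shadowed flags is one way to make the double-check against an existing custom configuration explicit rather than relying on a manual comparison.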

Anyway, I've prioritized this as planned for now, since we should do #8337 before starting on it.

@korthout
Member

@remcowesterhoud As discussed, this issue only concerns the event records. Commands and command rejections should be investigated separately in #9369.
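With that scoping, the default depends on the record type as well as the value type. A minimal sketch of the record-type rule, assuming only event records are exported by default while commands and rejections stay off until #9369 is resolved; `RecordType` here is a hypothetical enum modeled on the three record types the comment names, not an import from Zeebe:

```python
from enum import Enum


class RecordType(Enum):
    EVENT = "event"
    COMMAND = "command"
    COMMAND_REJECTION = "command_rejection"


# Hypothetical default: event records are exported; commands and command
# rejections are left out until they are investigated separately.
EXPORTED_BY_DEFAULT = frozenset({RecordType.EVENT})


def export_by_default(record_type: RecordType) -> bool:
    """Return True if records of this record type would be exported by default."""
    return record_type in EXPORTED_BY_DEFAULT


print([rt.value for rt in RecordType if export_by_default(rt)])  # ['event']
```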

@remcowesterhoud
Contributor

Since Operate is waiting for this issue, I will work on it now rather than waiting until #8337 is completed. This means the message_start_event_subscription, timer, and process_event records will not be exported by default yet. I will create a follow-up issue for these records.

@remcowesterhoud
Contributor

It's easier to wait for #9379 to be merged before we start on this one, so it will be on hold a little longer.

@Zelldon Zelldon added the version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0 label Oct 4, 2022
5 participants