A huge rejection reason causes an overflow in the record metadata #6442

Open
saig0 opened this issue Feb 25, 2021 · 3 comments
Labels
  • area/reliability: Marks an issue as related to improving the reliability of our software (i.e. it behaves as expected)
  • component/engine
  • component/zeebe: Related to the Zeebe component/team
  • kind/bug: Categorizes an issue or PR as a bug
  • scope/broker: Marks an issue or PR to appear in the broker section of the changelog
  • severity/mid: Marks a bug as having a noticeable impact but with a known workaround
  • version:8.1.0: Marks an issue as being completely or in parts released in 8.1.0

Comments

@saig0
Member

saig0 commented Feb 25, 2021

Describe the bug
When I deploy a big workflow that contains many violations (e.g. no job type, invalid expression) then I see the following failure in the log:

io.zeebe.util.retry.EndlessRetryStrategy - Catched exception class java.lang.IllegalArgumentException 
  with message invalid offset: -17568, will retry... 

As a result, neither this record nor any subsequent record is exported. After a restart of the broker, the same failure occurs. The failure is logged repeatedly until the broker is stopped.
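
To illustrate why one unreadable record blocks everything behind it, here is a minimal sketch. It assumes an exporter that reads records strictly in order and retries a failed read instead of skipping it. The class, the `read` helper, and the `maxAttempts` cap are hypothetical and are not Zeebe code; the cap exists only so the sketch terminates.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockedExporterSketch {

  // Hypothetical stand-in for reading a record; fails deterministically for the corrupt one.
  static String read(String record) {
    if (record.equals("corrupt")) {
      throw new IllegalArgumentException("invalid offset: -17568");
    }
    return record;
  }

  // Exports records in order and retries a failed read instead of skipping it.
  // maxAttempts is an assumption added so the sketch terminates; an endless strategy has no cap.
  public static List<String> export(List<String> log, int maxAttempts) {
    List<String> exported = new ArrayList<>();
    int position = 0;
    int attempts = 0;
    while (position < log.size() && attempts < maxAttempts) {
      try {
        exported.add(read(log.get(position)));
        position++;
        attempts = 0;
      } catch (IllegalArgumentException e) {
        attempts++; // same record, same failure: the position never advances
      }
    }
    return exported;
  }

  public static void main(String[] args) {
    System.out.println(export(List.of("a", "corrupt", "b"), 5)); // prints [a]
  }
}
```

Record "b" is never reached, because the failure is deterministic and a retry (or a restart) hits the same record again.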

This bug was reported in the forum: https://forum.zeebe.io/t/error-io-zeebe-util-retry-endlessretrystrategy-invalid-offset-17568/2023/

Analysis

On the stream, we see that the deployment is rejected. It has a huge rejection reason that lists all the violations.

The underlying problem is that the rejection reason is too long. The rejection reason is part of the RecordMetadata, and the length of the metadata is encoded in the SBE record as a short. If the metadata is longer than 32767 bytes (the maximum value of a signed 16-bit short), the length overflows and the deserialization of the record fails, which is consistent with the negative offset in the log message.
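
The overflow can be sketched with plain `java.nio.ByteBuffer`. This is an illustration only: the 2-byte field is a stand-in for a 16-bit length prefix and is not Zeebe's actual record layout. The length 47968 is a hypothetical value, chosen because it wraps to the -17568 seen in the log; the real offset may include other header fields, so the actual metadata length may differ.

```java
import java.nio.ByteBuffer;

public class MetadataLengthOverflow {

  // Writes the length into a 2-byte field and reads it back as a signed short.
  // The field is a stand-in for a 16-bit length prefix, not Zeebe's record format.
  public static int roundTripSigned(int length) {
    ByteBuffer buffer = ByteBuffer.allocate(Short.BYTES);
    buffer.putShort(0, (short) length); // the cast silently truncates to 16 bits
    return buffer.getShort(0);
  }

  // Same field, but interpreted as unsigned on read (valid up to 65535).
  public static int roundTripUnsigned(int length) {
    ByteBuffer buffer = ByteBuffer.allocate(Short.BYTES);
    buffer.putShort(0, (short) length);
    return Short.toUnsignedInt(buffer.getShort(0));
  }

  public static void main(String[] args) {
    System.out.println(roundTripSigned(32767));   // 32767, the largest value that fits
    System.out.println(roundTripSigned(47968));   // -17568, the length wraps around
    System.out.println(roundTripUnsigned(47968)); // 47968
  }
}
```

Reading the field as unsigned would only raise the limit to 65535 bytes. Whether that or a different approach (for example, truncating the rejection reason) fits the protocol is not settled in this issue.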

To Reproduce

  1. Deploy a big workflow that contains many violations (for example: solaris_firmware_update.bpmn.txt)

Or run the following test case:

Unit Test

    // given
    final var builder = Bpmn.createExecutableProcess("p").startEvent();
    IntStream.range(0, 400).forEach(i -> builder.serviceTask("task" + i));
    final var workflow = builder.done();

    // when
    final var deployment = engine.deployment().withXmlResource(workflow).expectRejection().deploy();

    // then
    assertThat(deployment.getRecordType()).isEqualTo(RecordType.COMMAND_REJECTION);

Expected behavior
The deployment is rejected and the rejection is exported successfully.

Log/Stacktrace

[Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] ERROR io.zeebe.util.retry.EndlessRetryStrategy - Catched exception class java.lang.IllegalArgumentException with message invalid offset: -17568, will retry...
Full Stacktrace

2021-02-12 11:24:47.521 [Broker-0-Exporter-1] [Broker-0-zb-fs-workers-1] ERROR io.zeebe.util.retry.EndlessRetryStrategy - Catched exception class java.lang.IllegalArgumentException with message invalid offset: -17568, will retry...
java.lang.IllegalArgumentException: invalid offset: -17568
        at org.agrona.concurrent.UnsafeBuffer.boundsCheckWrap(UnsafeBuffer.java:1702) ~[agrona-1.8.0.jar:1.8.0]
        at org.agrona.concurrent.UnsafeBuffer.wrap(UnsafeBuffer.java:256) ~[agrona-1.8.0.jar:1.8.0]
        at io.zeebe.msgpack.spec.MsgPackReader.wrap(MsgPackReader.java:49) ~[zeebe-msgpack-core-0.25.3.jar:0.25.3]
        at io.zeebe.msgpack.UnpackedObject.wrap(UnpackedObject.java:29) ~[zeebe-msgpack-value-0.25.3.jar:0.25.3]
        at io.zeebe.logstreams.impl.log.LoggedEventImpl.readValue(LoggedEventImpl.java:135) ~[zeebe-logstreams-0.25.3.jar:0.25.3]
        at io.zeebe.engine.processing.streamprocessor.RecordValues.readRecordValue(RecordValues.java:35) ~[zeebe-workflow-engine-0.25.3.jar:0.25.3]
        at io.zeebe.broker.exporter.stream.ExporterDirector$RecordExporter.wrap(ExporterDirector.java:328) ~[zeebe-broker-0.25.3.jar:0.25.3]
        at io.zeebe.broker.exporter.stream.ExporterDirector.lambda$exportEvent$6(ExporterDirector.java:253) ~[zeebe-broker-0.25.3.jar:0.25.3]
        at io.zeebe.util.retry.ActorRetryMechanism.run(ActorRetryMechanism.java:36) ~[zeebe-util-0.25.3.jar:0.25.3]
        at io.zeebe.util.retry.EndlessRetryStrategy.run(EndlessRetryStrategy.java:50) ~[zeebe-util-0.25.3.jar:0.25.3]
        at io.zeebe.util.sched.ActorJob.invoke(ActorJob.java:73) [zeebe-util-0.25.3.jar:0.25.3]
        at io.zeebe.util.sched.ActorJob.execute(ActorJob.java:39) [zeebe-util-0.25.3.jar:0.25.3]
        at io.zeebe.util.sched.ActorTask.execute(ActorTask.java:122) [zeebe-util-0.25.3.jar:0.25.3]
        at io.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:94) [zeebe-util-0.25.3.jar:0.25.3]
        at io.zeebe.util.sched.ActorThread.doWork(ActorThread.java:78) [zeebe-util-0.25.3.jar:0.25.3]
        at io.zeebe.util.sched.ActorThread.run(ActorThread.java:191) [zeebe-util-0.25.3.jar:0.25.3]

Environment:

  • OS: Ubuntu
  • Zeebe Version: 0.25.3, 0.26.1
  • Configuration: Hazelcast exporter
@saig0 saig0 added the kind/bug Categorizes an issue or PR as a bug label Feb 25, 2021
@saig0 saig0 added scope/broker Marks an issue or PR to appear in the broker section of the changelog severity/mid Marks a bug as having a noticeable impact but with a known workaround Status: Needs Priority and removed Status: Needs Triage labels Feb 25, 2021
@npepinpe
Member

Let's do it before 1.0 as it may require protocol changes which are always harder to keep backwards compatible.

@npepinpe npepinpe added this to Ready in Zeebe Apr 6, 2021
@npepinpe npepinpe moved this from Ready to Planned in Zeebe Apr 6, 2021
@npepinpe
Member

Oops, looks like we didn't get there. But I imagine this should be do-able in a backwards compatible way? We should check that 🤔

@npepinpe npepinpe moved this from Planned to Ready in Zeebe Jul 14, 2021
@npepinpe npepinpe moved this from Ready to Planned in Zeebe Jul 22, 2021
@KerstinHebel KerstinHebel removed this from Planned in Zeebe Mar 23, 2022
@npepinpe npepinpe added area/reliability Marks an issue as related to improving the reliability of our software (i.e. it behaves as expected) and removed Impact: Availability labels Apr 11, 2022
korthout added a commit that referenced this issue May 13, 2022
The metrics exporter shouldn't accept all records, because then the
exporter director unnecessarily reads record values that this
exporter won't export anyway. In the past, this has led to the exporter
getting stuck when it ran into reading problems like #6442.

The metrics exporter is only interested in 3 events (job, job_batch and
process instance).
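
A minimal sketch of the filtering idea described in this commit message. The `ValueType` enum and `RecordFilter` interface below are hypothetical stand-ins defined locally so the example is self-contained; the real exporter API and the actual change in #9371 may look different.

```java
import java.util.EnumSet;
import java.util.Set;

public class RecordFilterSketch {

  // Hypothetical subset of value types, for illustration only.
  public enum ValueType { JOB, JOB_BATCH, PROCESS_INSTANCE, DEPLOYMENT }

  // Hypothetical filter interface; the real exporter API may differ.
  public interface RecordFilter {
    boolean acceptValue(ValueType valueType);
  }

  // The three kinds of records the commit message says the metrics exporter needs.
  private static final Set<ValueType> ACCEPTED =
      EnumSet.of(ValueType.JOB, ValueType.JOB_BATCH, ValueType.PROCESS_INSTANCE);

  // Returns a filter that accepts only the value types listed above.
  public static RecordFilter metricsFilter() {
    return ACCEPTED::contains;
  }

  public static void main(String[] args) {
    RecordFilter filter = metricsFilter();
    System.out.println(filter.acceptValue(ValueType.JOB));        // true
    System.out.println(filter.acceptValue(ValueType.DEPLOYMENT)); // false
  }
}
```

With such a filter, a rejected deployment record would not need to be read on behalf of this exporter. This sidesteps the symptom for that exporter only; it does not fix the oversized metadata itself.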
zeebe-bors-camunda bot added a commit that referenced this issue May 13, 2022
9371: Configure record filter for metrics exporter r=korthout a=korthout

## Description

<!-- Please explain the changes you made here. -->

Configures a record filter for the Metrics exporter.

## Related issues

<!-- Which issues are closed by this PR or are related -->

closes #9240 
relates to #6442 



Co-authored-by: Nico Korthout <nico.korthout@camunda.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
github-actions bot pushed a commit that referenced this issue May 13, 2022
The metrics exporter shouldn't accept all records, because then the
exporter director unnecessarily reads record values that this
exporter won't export anyway. In the past, this has led to the exporter
getting stuck when it ran into reading problems like #6442.

The metrics exporter is only interested in 3 events (job, job_batch and
process instance).

(cherry picked from commit db64a95)
zeebe-bors-camunda bot added a commit that referenced this issue May 25, 2022
9425: [Backport stable/1.3] Configure record filter for metrics exporter r=npepinpe a=korthout

# Description
Backport of #9371 to `stable/1.3`.

relates to #9240 #6442

Co-authored-by: Nico Korthout <nico.korthout@camunda.com>
Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
zeebe-bors-camunda bot added a commit that referenced this issue May 25, 2022
9378: [Backport stable/8.0] Configure record filter for metrics exporter r=npepinpe a=github-actions[bot]

# Description
Backport of #9371 to `stable/8.0`.

relates to #9240 #6442

Co-authored-by: Nico Korthout <nico.korthout@camunda.com>
Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
@saig0 saig0 changed the title from "java.lang.IllegalArgumentException: invalid offset: -17568" to "A huge rejection reason causes an overflow in the record metadata" Jun 8, 2022
@Zelldon Zelldon added the version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0 label Oct 4, 2022
@Zelldon
Member

Zelldon commented Dec 27, 2022

We observed a similar symptom with the web-modeler in #11284. I think both can be solved with the same approach; see the linked issue for details.

@romansmirnov romansmirnov added the component/zeebe Related to the Zeebe component/team label Mar 5, 2024