Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Pause exporting endpoint in gateway #9630

Closed
Tracked by #9606
deepthidevaki opened this issue Jun 28, 2022 · 17 comments · Fixed by #10410
Closed
Tracked by #9606

Add Pause exporting endpoint in gateway #9630

deepthidevaki opened this issue Jun 28, 2022 · 17 comments · Fixed by #10410
Assignees
Labels
version:8.1.0-alpha4 version:8.1.0-alpha5 Marks an issue as being completely or in parts released in 8.1.0-alpha5 version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0

Comments

@deepthidevaki
Copy link
Contributor

deepthidevaki commented Jun 28, 2022

We must pause exporting while taking backup. See doc .

Currently pause exporting is exposed in broker . But we would like to expose this in the gateway so that there is a single point of entry for all cluster management operations. See RebalancingService as reference.

Goal:

  • An endpoint to pause exporting on all partitions. User sends one request to this endpoint to pause exporting on all partitions. Gateway must distributed request to all brokers and coordinate them.
  • If the operation succeeds, then it must be guaranteed that no records are exported until resume exporting is invoked. This must be guaranteed even after a restart or a change of leader.
  • If the operation fails, then one or more partitions may not have paused the exporting. In this case, the user must retry the operation.
  • If the operation fails, a few partitions may have paused exporting. To clean it up, users should call resume exporting explicitly.

Decide the request and response format before working on this issue.

@lenaschoenburg
Copy link
Member

@deepthidevaki One concern I have with this is that there are edge cases where new brokers join the cluster or brokers lose data & restart. In those cases the pause might have succeeded but a new broker will not know about this and continue exporting. I'm wondering if we need a new way to persist the pause state that survives data loss.

@deepthidevaki
Copy link
Contributor Author

Those are valid points. But let's keep it simple now. New brokers joining is currently not a supported feature. We could also treat data loss as a special case and ignore for it now. Users can always add additional steps to cancel backup process while any of the above cases occurs.

Also I'm hoping that we can get rid of the need to pause eventually and be able to take backups without pausing. So it may not be worth the effort to make it perfect. We could always revisit the approach if it is absolutely required.

@lenaschoenburg
Copy link
Member

There are two cases that are problematic:

  1. When the pause is issued, one or more brokers are unavailable. The gateway would need to recognize this and fail the pause. This is a bit unfortunate but doable, I think.
  2. When the pause is issued, all brokers are available and actually pause exporting. The gateway considers the pause as successful. Shortly after, one broker dies, loses data and is brought up again. It'll start without the exporterPaused flag and might start exporting. This is unlikely because it'll probably not immediately become leader after a data loss but it can happen.

I currently see no way to prevent the second case.

@deepthidevaki
Copy link
Contributor Author

Case 1 must be handled. We can decide what is the semantics of failure. The simplest would be to leave the cluster in a partial state - some brokers have paused the exporting, while others not and let users deal with it. If we want to make it "atomic", it would require more work or a different approach.

I would argue that we can ignore case 2 (data loss) now.

@lenaschoenburg
Copy link
Member

I would argue that we can ignore case 2 (data loss) now.

I sort of disagree here. For the user (either human or some backup service) it's probably difficult to tell whether data loss has happened.

Although I suppose we could query the exported position after pause and before resuming. If the positions have changed, the backup can be considered invalid.

@deepthidevaki
Copy link
Contributor Author

For the user (either human or some backup service) it's probably difficult to tell whether data loss has happened.

The data loss shouldn't be treated like other failures like node-restart or network partition. We can guarantee recovery only in some cases. For example:- any number of nodes can be restarted and the cluster will recover without user intervention. The worse thing that can happen is system is not available for a short time. But, we can guarantee the cluster can recover from data loss only if data loss happens in < quorum replicas. So far this must be ensure by the user and will be following a special process (eg;- as part of maintenance replacing hardware). In that sense the "user" must be aware of when a data loss happens. Even if this process is automated, it wouldn't be difficult to add necessary action to abort/retry ongoing backups.

@deepthidevaki
Copy link
Contributor Author

That said, if you have some idea how we can handle data loss scenario, we could discuss it. But as I said, I'm not sure if it is worth the effort.

Also I'm hoping that we can get rid of the need to pause eventually and be able to take backups without pausing. So it may not be worth the effort to make it perfect. We could always revisit the approach if it is absolutely required.

@lenaschoenburg
Copy link
Member

That said, if you have some idea how we can handle data loss scenario, we could discuss it.

First thing that came to mind was to move the pause flag to ExportersState.
But I understand your point that we don't consider data loss like any other failure and that it is a valid choice to leave it to the user to handle this.

@deepthidevaki
Copy link
Contributor Author

First thing that came to mind was to move the pause flag to ExportersState.

This would be lost after the data loss as well, no?

@lenaschoenburg
Copy link
Member

Yes, although eventually it'd become part of the snapshot & thus be available again after replication. But since it's not part of the log, it's not enough 😞. We had this idea floating around where exporters are also RecordProcessors, that'd make it easy to read/write pause and resume commands to the log stream, I suppose.

For now I'll follow your opinion and simply leave everything as is - data loss must be detected & handled by the user.

@lenaschoenburg
Copy link
Member

I've found one more issue that doesn't involve data loss: Not all brokers are members for all partitions. After a restart, the set of partitions that a broker is a member of can change.

That means that we need to pause processing "globally", on a per broker basis and can't re-use the existing pause flag which can only be set for partitions that the broker is a member of.

@deepthidevaki
Copy link
Contributor Author

I've found one more issue that doesn't involve data loss: Not all brokers are members for all partitions. After a restart, the set of partitions that a broker is a member of can change.

When would this happen? Isn't the configuration static?

@lenaschoenburg
Copy link
Member

lenaschoenburg commented Jul 27, 2022

When would this happen? Isn't the configuration static?

Oh, you are right! I thought it might change dynamically but RoundRobinPartitionDistributor seems to get a static list of members and partitions so the distribution should always be the same. Same for FixedPartitionDistributor of course, the only other PartitionDistributor. My bad!

@lenaschoenburg
Copy link
Member

lenaschoenburg commented Jul 27, 2022

With that sorted, I think I have a plan how to implement this:

  • Gateway needs a way to send command to a (broker, partition) pair. So far it can only send to leaders of a partition or to arbitrary members of a partition, see BrokerClient.
  • Gateway needs to know about all brokers and partitions, even unavailable ones. That information seems to be not yet available to the gateway, it usually uses the BrokerTopologyManager which only reports the current topology. Gateway knows about all brokers and partitions via BrokerTopologyManager.
  • Brokers receives the AdminRequest and pauses exporting for that partition. If pausing fails, ErrorResponseWriter is used to respond with an error, otherwise an empty AdminResponse is sent.
  • Gateway actuator
    • Add a new actuator at the gateway, let's say POST /exporting/pause.
    • Gateway sends a AdminRequest with a new type PAUSE_EXPORTING for each (broker, partition) pair.
    • If all requests finish successfully, gateway responds with a success status code, on failure or time out a failure status code is returned.

We could then extend the actuator to make retrying more efficient:

  1. Respond with a list of (broker, partition) pairs where pausing has failed or timed out.
  2. Support pausing for a given list of (broker, partition) pairs.

@deepthidevaki
Copy link
Contributor Author

Sounds good 👍

3. Add a new actuator at the gateway, let's say POST /exporting/pause.

What would be in the response to the user?

@lenaschoenburg
Copy link
Member

What would be in the response to the user?

I thought we could start with just success/failure via the HTTP status code, let's say 200 OK for success and 503 or 504 for failure. As an extension:

We could then extend the actuator to make retrying more efficient:

  1. Respond with a list of (broker, partition) pairs where pausing has failed or timed out.
  2. Support pausing for a given list of (broker, partition) pairs.

I haven't thought about a json schema for those yet.

zeebe-bors-camunda bot added a commit that referenced this issue Jul 28, 2022
9910: feat: broker requests can address specific brokers r=oleschoenburg a=oleschoenburg

## Description

Broker requests can optionally provide a broker id to indicate that this request should be sent to a specific broker, regardless of whether this broker is leader for the partition or not.

## Related issues

<!-- Which issues are closed by this PR or are related -->

relates to #9630 



9911: Await until no more marked-for-deletion segments are present r=npepinpe a=npepinpe

## Description

This PR allows for a slight delay for segment deletion to occur. The test checks that we close all readers by waiting for segments to be truly deleted; if a reader is left opened, then a "*-deleted" file will be present.

Since deletion is asynchronous, we should be a bit lenient and wait for a bit of time. I also took the opportunity to simplify a bit the verification of the test, improving the error message on failure I think.

## Related issues

closes #9685 



9913: deps(.github): bump slackapi/slack-github-action from 1.19.0 to 1.21.0 r=oleschoenburg a=dependabot[bot]

Bumps [slackapi/slack-github-action](https://github.com/slackapi/slack-github-action) from 1.19.0 to 1.21.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/slackapi/slack-github-action/releases">slackapi/slack-github-action's releases</a>.</em></p>
<blockquote>
<h2>Slack Send V1.21.0</h2>
<h2>What's Changed</h2>
<ul>
<li>updated to 1.21.0, fixed update-ts by <a href="https://github.com/stevengill"><code>`@​stevengill</code></a>` in <a href="https://github-redirect.dependabot.com/slackapi/slack-github-action/pull/110">slackapi/slack-github-action#110</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/slackapi/slack-github-action/compare/v1.20.0...v1.21.0">https://github.com/slackapi/slack-github-action/compare/v1.20.0...v1.21.0</a></p>
<h2>Slack Send v1.20.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Bump <code>`@​actions/github</code>` from 5.0.1 to 5.0.3 by <a href="https://github.com/dependabot"><code>`@​dependabot</code></a>` in <a href="https://github-redirect.dependabot.com/slackapi/slack-github-action/pull/94">slackapi/slack-github-action#94</a></li>
<li>Bump <code>`@​vercel/ncc</code>` from 0.33.4 to 0.34.0 by <a href="https://github.com/dependabot"><code>`@​dependabot</code></a>` in <a href="https://github-redirect.dependabot.com/slackapi/slack-github-action/pull/98">slackapi/slack-github-action#98</a></li>
<li>Bump eslint-plugin-jsdoc from 39.2.9 to 39.3.2 by <a href="https://github.com/dependabot"><code>`@​dependabot</code></a>` in <a href="https://github-redirect.dependabot.com/slackapi/slack-github-action/pull/96">slackapi/slack-github-action#96</a></li>
<li>Bump eslint from 8.14.0 to 8.16.0 by <a href="https://github.com/dependabot"><code>`@​dependabot</code></a>` in <a href="https://github-redirect.dependabot.com/slackapi/slack-github-action/pull/99">slackapi/slack-github-action#99</a></li>
<li>Bump <code>`@​actions/core</code>` from 1.7.0 to 1.8.2 by <a href="https://github.com/dependabot"><code>`@​dependabot</code></a>` in <a href="https://github-redirect.dependabot.com/slackapi/slack-github-action/pull/100">slackapi/slack-github-action#100</a></li>
<li>Bump mocha from 9.2.2 to 10.0.0 by <a href="https://github.com/dependabot"><code>`@​dependabot</code></a>` in <a href="https://github-redirect.dependabot.com/slackapi/slack-github-action/pull/97">slackapi/slack-github-action#97</a></li>
<li>Bump sinon from 13.0.2 to 14.0.0 by <a href="https://github.com/dependabot"><code>`@​dependabot</code></a>` in <a href="https://github-redirect.dependabot.com/slackapi/slack-github-action/pull/95">slackapi/slack-github-action#95</a></li>
<li>README.md: Fix typo by <a href="https://github.com/cclauss"><code>`@​cclauss</code></a>` in <a href="https://github-redirect.dependabot.com/slackapi/slack-github-action/pull/102">slackapi/slack-github-action#102</a></li>
<li>Bump <code>`@​actions/core</code>` from 1.8.2 to 1.9.0 by <a href="https://github.com/dependabot"><code>`@​dependabot</code></a>` in <a href="https://github-redirect.dependabot.com/slackapi/slack-github-action/pull/105">slackapi/slack-github-action#105</a></li>
<li>Bump eslint from 8.16.0 to 8.18.0 by <a href="https://github.com/dependabot"><code>`@​dependabot</code></a>` in <a href="https://github-redirect.dependabot.com/slackapi/slack-github-action/pull/106">slackapi/slack-github-action#106</a></li>
<li>Bump eslint-plugin-jsdoc from 39.3.2 to 39.3.3 by <a href="https://github.com/dependabot"><code>`@​dependabot</code></a>` in <a href="https://github-redirect.dependabot.com/slackapi/slack-github-action/pull/107">slackapi/slack-github-action#107</a></li>
<li>Bump <code>`@​slack/web-api</code>` from 6.7.1 to 6.7.2 by <a href="https://github.com/dependabot"><code>`@​dependabot</code></a>` in <a href="https://github-redirect.dependabot.com/slackapi/slack-github-action/pull/108">slackapi/slack-github-action#108</a></li>
<li>Add support to update messages <a href="https://github-redirect.dependabot.com/slackapi/slack-github-action/issues/75">#75</a> by <a href="https://github.com/kuboon"><code>`@​kuboon</code></a>` in <a href="https://github-redirect.dependabot.com/slackapi/slack-github-action/pull/109">slackapi/slack-github-action#109</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/cclauss"><code>`@​cclauss</code></a>` made their first contribution in <a href="https://github-redirect.dependabot.com/slackapi/slack-github-action/pull/102">slackapi/slack-github-action#102</a></li>
<li><a href="https://github.com/kuboon"><code>`@​kuboon</code></a>` made their first contribution in <a href="https://github-redirect.dependabot.com/slackapi/slack-github-action/pull/109">slackapi/slack-github-action#109</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/slackapi/slack-github-action/compare/v1.19.0...v1.20.0">https://github.com/slackapi/slack-github-action/compare/v1.19.0...v1.20.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="https://github.com/slackapi/slack-github-action/commit/936158bbe252e9a6062e793ea4609642c966e302"><code>936158b</code></a> Automatic compilation</li>
<li><a href="https://github.com/slackapi/slack-github-action/commit/c1472d49d7e64ec13e48b5c157571fff3f06998e"><code>c1472d4</code></a> updated to 1.21.0, fixed update-ts (<a href="https://github-redirect.dependabot.com/slackapi/slack-github-action/issues/110">#110</a>)</li>
<li><a href="https://github.com/slackapi/slack-github-action/commit/3a354251e498618617a633aa3f9ab290ec9af847"><code>3a35425</code></a> updated to version 1.20.0</li>
<li><a href="https://github.com/slackapi/slack-github-action/commit/18a00ad4b3af3ebddb1643a887416468c908eb42"><code>18a00ad</code></a> Add support to update messages <a href="https://github-redirect.dependabot.com/slackapi/slack-github-action/issues/75">#75</a> (<a href="https://github-redirect.dependabot.com/slackapi/slack-github-action/issues/109">#109</a>)</li>
<li><a href="https://github.com/slackapi/slack-github-action/commit/05390e63d05f0c6bf1df4e61025724c46e785a66"><code>05390e6</code></a> Bump <code>`@​slack/web-api</code>` from 6.7.1 to 6.7.2 (<a href="https://github-redirect.dependabot.com/slackapi/slack-github-action/issues/108">#108</a>)</li>
<li><a href="https://github.com/slackapi/slack-github-action/commit/e3dd3aedd3fd10ac3702a4a049787022b9d38d05"><code>e3dd3ae</code></a> Bump eslint-plugin-jsdoc from 39.3.2 to 39.3.3 (<a href="https://github-redirect.dependabot.com/slackapi/slack-github-action/issues/107">#107</a>)</li>
<li><a href="https://github.com/slackapi/slack-github-action/commit/8ff81266566f71b66f982dedefc9a295601e94e6"><code>8ff8126</code></a> Bump eslint from 8.16.0 to 8.18.0 (<a href="https://github-redirect.dependabot.com/slackapi/slack-github-action/issues/106">#106</a>)</li>
<li><a href="https://github.com/slackapi/slack-github-action/commit/5253b6aa4bc935036a31c7efcd03656a4f4eb731"><code>5253b6a</code></a> Bump <code>`@​actions/core</code>` from 1.8.2 to 1.9.0 (<a href="https://github-redirect.dependabot.com/slackapi/slack-github-action/issues/105">#105</a>)</li>
<li><a href="https://github.com/slackapi/slack-github-action/commit/54ab0a40d2dce881a507b3044df99291e75ff013"><code>54ab0a4</code></a> README.md: Fix typo</li>
<li><a href="https://github.com/slackapi/slack-github-action/commit/0f34a3c3a952c279307c91ae7c9c3e181a1eb3a1"><code>0f34a3c</code></a> Bump sinon from 13.0.2 to 14.0.0 (<a href="https://github-redirect.dependabot.com/slackapi/slack-github-action/issues/95">#95</a>)</li>
<li>Additional commits viewable in <a href="https://github.com/slackapi/slack-github-action/compare/v1.19.0...v1.21.0">compare view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=slackapi/slack-github-action&package-manager=github_actions&previous-version=1.19.0&new-version=1.21.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting ``@dependabot` rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)


</details>

9914: deps(go): bump google.golang.org/protobuf from 1.28.0 to 1.28.1 in /clients/go r=oleschoenburg a=dependabot[bot]

Bumps [google.golang.org/protobuf](https://github.com/protocolbuffers/protobuf-go) from 1.28.0 to 1.28.1.
<details>
<summary>Commits</summary>
<ul>
<li><a href="https://github.com/protocolbuffers/protobuf-go/commit/6875c3d7242d1a3db910ce8a504f124cb840c23a"><code>6875c3d</code></a> all: release v1.28.1</li>
<li><a href="https://github.com/protocolbuffers/protobuf-go/commit/881da6ece0696714305aee39afcac17b6abe8e00"><code>881da6e</code></a> all: Add prebuild binaries for arm64</li>
<li><a href="https://github.com/protocolbuffers/protobuf-go/commit/2a74a0e82391d1fc344067bc6bfc4647cd8ae7d9"><code>2a74a0e</code></a> A+C: delete AUTHORS and CONTRIBUTORS</li>
<li><a href="https://github.com/protocolbuffers/protobuf-go/commit/de9682ad1656cd1ee6808865f7a42f5a5d79eb73"><code>de9682a</code></a> internal/impl: improve MessageInfo.New performance</li>
<li><a href="https://github.com/protocolbuffers/protobuf-go/commit/b0a944684d78a6b7c1f2bb6864fa910385d5b97a"><code>b0a9446</code></a> all: reformat with go1.19 gofmt</li>
<li><a href="https://github.com/protocolbuffers/protobuf-go/commit/c1bbc5d45b2b8a056496c3db183b24e7f9a22ff1"><code>c1bbc5d</code></a> all: make integration test work on darwin/arm64</li>
<li><a href="https://github.com/protocolbuffers/protobuf-go/commit/5f429f7042ff6743d1cd695281802b8c9b8d9395"><code>5f429f7</code></a> proto: fix compilation failure in tests</li>
<li><a href="https://github.com/protocolbuffers/protobuf-go/commit/fc44d00d5a99973a6deb0863d6aaa7446c016066"><code>fc44d00</code></a> proto: use reflect.Ptr for backward compatibility</li>
<li><a href="https://github.com/protocolbuffers/protobuf-go/commit/380c339ec12310c04cd1c399669ab0bce57b6cd6"><code>380c339</code></a> proto: short-circuit Equal when inputs are identical</li>
<li><a href="https://github.com/protocolbuffers/protobuf-go/commit/784c4825545540dc41a1dc85715d3251903bc8ce"><code>784c482</code></a> all: remove shorthand import aliases</li>
<li>Additional commits viewable in <a href="https://github.com/protocolbuffers/protobuf-go/compare/v1.28.0...v1.28.1">compare view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=google.golang.org/protobuf&package-manager=go_modules&previous-version=1.28.0&new-version=1.28.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting ``@dependabot` rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)


</details>

Co-authored-by: Ole Schönburg <ole.schoenburg@gmail.com>
Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
zeebe-bors-camunda bot added a commit that referenced this issue Aug 1, 2022
9926: New `AdminRequest` type to pause exporting r=oleschoenburg a=oleschoenburg

## Description

Adds a new request type, `PAUSE_EXPORTING` and adds an optional broker id for all admin requests.

## Related issues

<!-- Which issues are closed by this PR or are related -->

relates to #9630 



Co-authored-by: Ole Schönburg <ole.schoenburg@gmail.com>
zeebe-bors-camunda bot added a commit that referenced this issue Aug 4, 2022
9945: Broker can handle admin request to pause exporting r=oleschoenburg a=oleschoenburg

## Description
This started as a small PR to just add support for `PAUSE_EXPORTING` in `AdminApiRequestHandler`.
However, there were a couple of changes necessary to implement this properly:
* Single partition admin access. `AdminAccess`'s scope can now be narrowed to a specific partition.
* `AdminApiService` is now started after the partition manager has started. This allows for the api service to immediately have access to `AdminAccess`.
* Api request handlers can be asynchronous.

I've also included a somewhat large refactoring of the existing admin api tests.

## Related issues

<!-- Which issues are closed by this PR or are related -->

relates to #9630



Co-authored-by: Ole Schönburg <ole.schoenburg@gmail.com>
zeebe-bors-camunda bot added a commit that referenced this issue Aug 4, 2022
9945: Broker can handle admin request to pause exporting r=oleschoenburg a=oleschoenburg

## Description
This started as a small PR to just add support for `PAUSE_EXPORTING` in `AdminApiRequestHandler`.
However, there were a couple of changes necessary to implement this properly:
* Single partition admin access. `AdminAccess`'s scope can now be narrowed to a specific partition.
* `AdminApiService` is now started after the partition manager has started. This allows for the api service to immediately have access to `AdminAccess`.
* Api request handlers can be asynchronous.

I've also included a somewhat large refactoring of the existing admin api tests.

## Related issues

<!-- Which issues are closed by this PR or are related -->

relates to #9630



Co-authored-by: Ole Schönburg <ole.schoenburg@gmail.com>
@lenaschoenburg
Copy link
Member

I'm shelving this until we triaged #9996, maybe writing the actual actuator would be easier that way 🙏

@Zelldon Zelldon mentioned this issue Aug 5, 2022
6 tasks
@deepthidevaki deepthidevaki added the version:8.1.0-alpha5 Marks an issue as being completely or in parts released in 8.1.0-alpha5 label Sep 6, 2022
zeebe-bors-camunda bot added a commit that referenced this issue Sep 20, 2022
10409: feat(backup): add metrics for checkpoints r=deepthidevaki a=deepthidevaki

## Description

Added following metrics:
- number of checkpoint records processed per partition. Can be filtered by `result = created` or `result = ignored`.
- last checkpoint id and last checkpoint position per partition. It is updated in both leader and followers.

PS:- This PR does not add panels to grafana dashboard. 

## Related issues

closes #10385 



10410: feat: gateway endpoint to pause exporting r=oleschoenburg a=oleschoenburg

Adds an actuator `/exporting` that supports a `pause` operation. On `POST /actuator/exporting/pause`, the current topology is checked for completeness (all brokers available, all partitions have the expected number of members). If this fails, exporting is not paused and the request returns status code 500. If the topology is complete, the actuator sends pause requests to all `(broker, partition)` pairs. Only if all requests succeed we return status code 204, otherwise 500.

This also fixes a bug where pausing exporting would fail if no exporters are configured.

closes #9630


10415: fix(ci): use bash shell for maven run  r=megglos a=megglos

## Description

Ensures a fail fast behavior as this includes [github setting](https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#exit-codes-and-error-action-preference) `set -eo pipefail` which makes sure failures in a pipe chain are not lost and any maven test failure causes the build to fail.

## Related issues

closes #10260



Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
Co-authored-by: Ole Schönburg <ole.schoenburg@gmail.com>
Co-authored-by: Meggle (Sebastian Bathke) <sebastian.bathke@camunda.com>
@Zelldon Zelldon added the version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0 label Oct 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
version:8.1.0-alpha4 version:8.1.0-alpha5 Marks an issue as being completely or in parts released in 8.1.0-alpha5 version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants