Keep the AWS/Azure/GCP SDK version up to date #39492

Open
1 of 3 tasks
zmoog opened this issue May 9, 2024 · 7 comments
Labels: enhancement, Team:obs-ds-hosted-services (Label for the Observability Hosted Services team)

Comments

@zmoog
Contributor

zmoog commented May 9, 2024

Situation

All major cloud service providers (CSPs) provide SDKs for accessing their APIs. Beats use the cloud provider APIs to collect logs, metrics, and metadata.

Over time, CSPs release new versions of their SDKs. Most of the time, new versions are minor or patch releases. Major releases with breaking changes usually happen every few years.

Usually, we upgrade a cloud provider SDK in two circumstances:

  • we need a new feature
  • we need a bugfix

and that feature or bugfix is only available in a newer SDK version.

Our attitude is primarily reactive.

Problem statement

The current reactive posture has a few downsides:

  • On average, our SDK modules are outdated to various degrees (missing fixes).
  • Upgrades happen infrequently and sometimes jump across many versions at once (increasing risk).
  • When we need a bugfix from one of our dependencies, we can merge the upgrade right away, but the next stack release may be weeks away, and we only backport to the previous release.

Solutions

Manage the AWS/Azure/GCP SDK versions incrementally using Dependabot.

Pros:

  • It is more manageable to upgrade 1-2 dependencies at a time instead of doing a big batch occasionally.
  • SDKs are up to date with fixes and improvements
  • We ship updates in the next release, which improves our chances of avoiding bug reports and support requests.

Cons:

  • We are making more changes; we need to mitigate this risk by improving our test suite (we'll address this in a dedicated issue).
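
As a rough illustration of the proposal above, a Dependabot configuration for incremental SDK upgrades might look like the sketch below. This is not the actual Beats configuration. The module path patterns, the weekly schedule, and the pull request limit are assumptions chosen to match the "1-2 dependencies at a time" idea, and the real list of SDK modules would need to be taken from the repository's `go.mod`.

```yaml
# Hypothetical .github/dependabot.yml sketch; values are illustrative assumptions.
version: 2
updates:
  - package-ecosystem: "gomod"   # Beats is a Go project, so SDKs are Go modules
    directory: "/"               # assumed location of go.mod
    schedule:
      interval: "weekly"         # assumed cadence; small, frequent updates
    # Only open PRs for the cloud provider SDK modules (assumed patterns).
    allow:
      - dependency-name: "github.com/aws/aws-sdk-go-v2*"
      - dependency-name: "github.com/Azure/azure-sdk-for-go/*"
      - dependency-name: "cloud.google.com/go/*"
    # Keep the number of simultaneous upgrade PRs small.
    open-pull-requests-limit: 2
```

Restricting updates with `allow` keeps the change volume limited to the SDKs discussed here, which may help contain the extra risk listed under Cons while the test suite is being improved.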

Tasks

  1. Team:obs-ds-hosted-services
    zmoog

Related

Here are a few related issues and PRs:

botelastic bot added the needs_team label (indicates that the issue/PR needs a Team:* label) on May 9, 2024
zmoog added the Team:obs-ds-hosted-services label (Label for the Observability Hosted Services team) on May 9, 2024
@elasticmachine
Collaborator

Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services)

botelastic bot removed the needs_team label on May 9, 2024
@zmoog
Contributor Author

zmoog commented May 9, 2024

Here is how APM Server manages OTel dependencies:
https://github.com/elastic/apm-server/blob/main/.github/dependabot.yml

@zmoog
Contributor Author

zmoog commented May 9, 2024

I'm creating the first PR to manage the Azure SDK to get feedback from the team responsible for Dependabot.

zmoog self-assigned this on May 9, 2024
zmoog changed the title from "Keep AWS/Azure/GCP SDK up to date" to "Keep AWS/Azure/GCP SDK version up to date" on May 9, 2024
zmoog changed the title from "Keep AWS/Azure/GCP SDK version up to date" to "Keep the AWS/Azure/GCP SDK version up to date" on May 9, 2024
@agithomas
Contributor

agithomas commented May 10, 2024

If we look at the issue mentioned here, a sequential upgrade from a working to a broken SDK (Nov '23 -> Feb '24) would have left the application in a less desirable state.

So, the scope may be extended or changed to qualifying the SDK version. Addressing dependencies, testing and verification, and closing gaps in testing (if any) may be part of the qualification process. Once a version is qualified, how soon should we consume the upgrade?

@zmoog
Contributor Author

zmoog commented May 14, 2024

> If we look at the issue mentioned here, a sequential upgrade from a working to a broken SDK (Nov '23 -> Feb '24) would have left the application in a less desirable state.
>
> So, the scope may be extended or changed to qualifying the SDK version. Addressing dependencies, testing and verification, and closing gaps in testing (if any) may be part of the qualification process. Once a version is qualified, how soon should we consume the upgrade?

@agithomas, what do you mean by "qualifying the SDK version"? Can you elaborate a little on this?

@zmoog
Contributor Author

zmoog commented May 14, 2024

To be clear, the goal of PRs like #39495 is not to avoid problems like "not found, ResolveEndpointV2" but to ensure Beats remains up to date with new features, possibly avoid support requests, or at least ship the fixes sooner.

I agree that to prevent problems like "not found, ResolveEndpointV2", we need a holistic strategy that includes:

  • improved dependency management
  • improved testing

@agithomas
Contributor

> @agithomas, what do you mean by "qualifying the SDK version"? Can you elaborate a little on this?

My comment was more towards the holistic strategy.

Thanks for adding the scope here.
