
Upgrade Elasticsearch client to 8.0.0 #1350

Closed
danielmitterdorfer opened this issue Oct 1, 2021 · 7 comments · Fixed by #1669

danielmitterdorfer (Member) commented Oct 1, 2021

Note: This is not actionable at the moment but acts as a reference of things to consider when upgrading the Elasticsearch client to 8.0.0.

Looking at the Elasticsearch client roadmap, a few items stand out, but elastic/elasticsearch-py#1680 in particular will be tricky: the client will no longer allow passing a body; instead it requires specifying the request in a more structured manner. Initially, using body will only be deprecated, but we need to think about alternatives.
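To illustrate the change, here is a pure-Python sketch of the two call shapes. The functions are stand-ins that only mimic the client's signatures (no real client or cluster is involved); the point is that in 8.x the top-level keys of the old body dict become keyword arguments:

```python
# Stand-in functions that only mimic elasticsearch-py call shapes;
# no real client or cluster is involved.
def search_v7(index, body=None):
    # 7.x style: the whole request body travels as one opaque dict
    return {"index": index, **(body or {})}

def search_v8(index, **top_level):
    # 8.x style: top-level body keys are individual keyword arguments
    return {"index": index, **top_level}

# Both spell the same request:
req_v7 = search_v7("logs", body={"query": {"match_all": {}}, "size": 5})
req_v8 = search_v8("logs", query={"match_all": {}}, size=5)
```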

@danielmitterdorfer added the enhancement (Improves the status quo), blocked (This item cannot be finished because of a dependency), and :Load Driver (Changes that affect the core of the load driver such as scheduling, the measurement approach etc.) labels on Oct 1, 2021
@danielmitterdorfer added this to the Backlog milestone on Oct 1, 2021
pquentin (Member) commented Oct 4, 2021

I think it's fine to continue using body, as it will be supported in 8.0 and possibly later. If it gets removed at some point, we can always switch to **body for the cases where we pass a body around before submitting it to the client.
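A quick sketch of that **body fallback (fake_search below is a hypothetical stand-in for a structured 8.x-style client method, not a real API):

```python
# Rally often carries a request body around as a plain dict before submitting it.
body = {"query": {"match_all": {}}, "size": 5}

def fake_search(index, **kwargs):
    # hypothetical stand-in for a structured 8.x-style client method
    return {"index": index, **kwargs}

# Unpacking the dict turns its top-level keys into keyword arguments,
# so existing body-shaped dicts keep working against a structured API.
result = fake_search("logs", **body)
```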

sethmlarson (Contributor) commented:

The first pre-release of 8.0.0 is now available on PyPI: https://github.com/elastic/elasticsearch-py/releases/tag/v8.0.0a1

@michaelbaamonde self-assigned this on May 24, 2022
@michaelbaamonde removed the blocked (This item cannot be finished because of a dependency) label on May 24, 2022
michaelbaamonde (Contributor) commented Jun 8, 2022

@pquentin

As discussed, I've closed the linked PR due to size and am instead going to break it into a sequence of smaller PRs, while trying to limit the potential for breakage as much as possible. This is a little tricky. Here's my thinking:

First, we organize the client-related code to make it more readable and separate the different concerns at play. I've issued a PR that does this by creating a new esrally.client package. This should help keep things clean as we make the bigger changes that we know we need.

Assuming that looks good and we merge it, we could do a few small PRs that add in utility code that we'll eventually need (e.g. the _ProductChecker class, utility functions (example), etc.) These should be transparent and have no effect on runtime behavior.

Now, we'd be at the point where things are more challenging, as we'd need to introduce a dependency on the new client version and cut over to using it. I see two ways that we could do this cutover gradually.

Option 1: Install two client versions side-by-side temporarily

We could phase in most of the changes required for the upgrade by installing v8.2.0 (elasticsearch8[async]==8.2.0) and elastic-transport==8.1.2 alongside 7.14.0 (via elasticsearch[async]==7.14.0). We would import 7.14.0 as elasticsearch and use it at runtime as we do today, but have elasticsearch8 available for import as well.

This would allow us to merge the core additions we need for the upgrade (e.g. our own subclasses of elasticsearch8.Elasticsearch, elastic_transport.Transport, etc.) to master in a piecemeal fashion. There would temporarily be separate code paths for the 7.14.0 client and the 8.2.0 client, meaning duplication and effectively unused code on master, but we would remove the 7.14.0-related code with the final PR that finishes the cutover to the new client version.

  • In this scenario, all commits are purely additive except for the final one that removes the 7.14.0 client dependency and related code.

  • The big downside is that master would temporarily contain code that's not actually used at runtime.

Note: We could go as far as to do this behind a feature flag. Could be overkill, but could also make testing easier.
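The dual-install approach could be pinned roughly like this requirements fragment (a sketch using the versions named above; the comments describe the intended roles, not Rally's actual setup):

```text
# requirements fragment (hypothetical pins for the transition period)
elasticsearch[async]==7.14.0   # imported as `elasticsearch`, the runtime client
elasticsearch8[async]==8.2.0   # imported as `elasticsearch8`, staged for the cutover
elastic-transport==8.1.2       # transport layer used by the 8.x client
```

Because the 8.x client is published under the separate elasticsearch8 package name, both versions can coexist in one environment without import collisions.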

Option 2: Feature branch

Another option could be to have a feature branch in Rally for the client upgrade and issue PRs against that branch gradually before doing one large merge to master once we're happy with the state of the feature branch.

  • In this scenario, we'd have more of a mix of green and red diffs during review and commits would probably have to be larger, but master would remain clean until the final merge.

  • The big downside is the inevitability of merge conflicts with master.

In either case, we'd have more atomic changes that should be easier to review and present less risk overall. What do you think?

pquentin (Member) commented Jun 9, 2022

Hmm, those options are interesting, thanks. My preference here is to merge code to master that gets used immediately, like the rally and rally-tracks PRs we merged yesterday, which were validated in our nightly benchmarks.

And yes, this means at some point we'll have a bigger PR that makes the switch, but that's OK. (And I'm now more familiar with the problem so reviewing will be easier.)

So basically I would suggest the following PRs:

pquentin (Member) commented:

By the way, thinking about this more, I don't think we should have the 7.17 step; let's not increase the scope too much. Regarding the version check: since we already disable it in the async client that is used for the actual benchmarking measurements, I think we should just check once when Rally starts and then disable it everywhere.
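A minimal sketch of that check-once idea, assuming a hypothetical helper (this is not Rally's actual code): the cluster is verified a single time at startup, and every later call is a no-op.

```python
# Hypothetical helper: run the Elasticsearch version/product check once per
# process, then skip it for every client created afterwards.
_verified = False

def ensure_cluster_verified(info_fn):
    """info_fn performs GET / against the cluster and returns its JSON body."""
    global _verified
    if not _verified:
        info = info_fn()
        version = info.get("version", {}).get("number", "")
        if not version.startswith(("6.", "7.", "8.")):
            raise RuntimeError(f"unsupported Elasticsearch version: {version!r}")
        _verified = True
    return _verified
```

Each benchmarking client would then be constructed with its own per-request check disabled, relying on this one startup check instead.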

pquentin (Member) commented Jan 27, 2023

As mentioned above, since #1510 was quite big, we issued a few pull requests to reduce the scope:

That said, #1510 was itself quite reasonable, so I merged master into it after the above work, fixing many conflicts. The result is in https://github.com/elastic/rally/tree/es-client-upgrade, see https://github.com/elastic/rally/compare/es-client-upgrade?expand=1 for the diff.

If we want to upgrade to 7.17 first (which is the officially recommended path), I backported a fix in the client (elastic/elasticsearch-py#2102) that would allow us to revert #1580, further simplifying the actual upgrade to 8.x.

@b-deam assigned b-deam and unassigned michaelbaamonde on Feb 2, 2023
b-deam (Member) commented Feb 2, 2023

I've picked this up. For now, my high-level plan is to continue working on the es-client-upgrade branch and:

  • Be able to esrally race [...] against:
    • 8.6.x
    • 7.17.x
    • 6.8.x
  • Get all unit tests passing
  • Ensure support for all existing --client-options
  • Refactor exception handling based on changes in 8.x
  • Address the various TODOs in the es-client-upgrade branch
  • Upgrade the elasticsearch[async] and elastic-transport libs to 8.6.1 and 8.4.0, respectively
  • Address any deprecation warnings
  • Execute various benchmarks to establish whether we're introducing any client-side bottlenecks with the upgrade
    • elastic/logs is a given, because of its pervasive nature in a lot of our work
    • Open to suggestions on other specific/targeted benchmarks; otherwise I'll fall back on the default track selection (geonames, nyc_taxis etc.)
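The exception-handling refactor in the list above could look roughly like the sketch below. As I understand the 8.x split, errors divide into API errors (the server answered with an error status) and transport failures (the request never completed); the classes here are stand-ins that only mimic that split, not the real client types.

```python
# Stand-in exception types mimicking the 8.x split between API errors and
# transport failures; these are illustrative, not the real client classes.
class TransportFailure(Exception):
    """Stand-in for a transport-level failure (connection refused, timeout)."""

class ApiError(Exception):
    """Stand-in for a response that carried an HTTP error status."""
    def __init__(self, status_code):
        super().__init__(status_code)
        self.status_code = status_code

def classify(exc):
    # Rally-style triage: API errors keep their status code, while transport
    # errors are reported as infrastructure problems, not benchmark results.
    if isinstance(exc, ApiError):
        return f"api error (HTTP {exc.status_code})"
    if isinstance(exc, TransportFailure):
        return "transport failure"
    return "unknown error"
```

Keeping the two branches separate makes it easy to decide per call whether an error should abort the race or merely be recorded.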

Once I'm comfortable that there are no obvious performance regressions, I'll open a PR and we can work through the feedback then.
