Skip to content

Refactor Bazel version retrieval & downloads #151

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Sep 16, 2020

Conversation

fweikert
Copy link
Member

@fweikert fweikert commented Aug 17, 2020

Summary

This change introduces dedicated interfaces to handle how Bazel versions are retrieved and downloaded.

Details

The new core package introduces interfaces that represent four kinds of repositories that store Bazel versions: ReleaseRepo, CandidateRepo, ForkRepo and CommitRepo. Implementations of these interfaces must provide a list of available Bazel versions, as well as a function to download a desired version (CommitRepo is the exception since it must only return two commit hashes, last_green and last_downstream_green).

Implementations of these interfaces must be passed to the new core.Repositories struct, which uses them to resolve given Bazel versions and to return an appropriate download function. As a result, the download logic in bazelisk.go was reduced substantially.

The old main function was moved into RunBazelisk(), which expects implementations of these interfaces. As a result, the caller of this function can control which repositories are used.

The previously existing repositories, GitHub and GCS, are still supported since they implement the new interfaces. GitHub is used for forks, whereas GCS is used for everything else.

Moreover, the existing BAZELISK_BASE_URL environment variable is still supported, and overrides the repository interfaces (if set).

Finally, this change moves some utility functions into new, dedicated packages such as httputil (for retrieving files via HTTP), platforms (for determining platform specific information) and versions (for parsing and sorting version values).

Motivation

The idea is to turn Bazelisk into a library that supports arbitrary repositories. For example, companies might want to ship their own version of Bazelisk that fetches all Bazel binaries from their internal servers.

Next steps

The next PR will move RunBazelisk() into the core package so that it can be used as a pure library. I didn’t want to move it in this PR to get better code diffs.
Moreover, we should probably split the shell test (see below).

Functional changes

Even though this is a refactoring, some functional changes have been made:

  1. The previous version used GitHub as a fallback when GCS was unavailable. This is no longer the case.
  2. Forks no longer support last_rc, last_green and last_downstream_green. Forks historically used GitHub, which never contained these binaries anyway. The new behavior actually matches the documentation.
  3. Commit hashes are now valid Bazel version values, which means Bazelisk will attempt to download a Bazel binary built at the given commit.

As a result, the Python version now deviates from the Go version since the latter uses GCS, whereas the first uses GitHub.

This forced me to rewrite the shell test, too, since the Go tests must be less strict wrt expected versions:

Since the Go version uses GCS the fake json file written in the test is only used by the Python version.
As a result, the shell test is effectively an integration test for the Go version and requires network access. Moreover, we can no longer check for specific Bazel version in the Go test since "latest" will point at different releases over time.

We should split the test into proper Go and Python versions.

@meteorcloudy
Copy link
Member

@fweikert
Nice, this PR looks good to me with my limited Go experience! But you probably want Philipp to also take a look ;)

Can you check what's happening with the test failure?

Also, how do you think we extend to LastGreenRepo to something that supports downloading Bazel binary built at any commit? For example, to allow setting USE_BAZEL_VERSION to a specific commit? This will greatly help us to reproduce failures and bisect Bazel changes.

@fweikert
Copy link
Member Author

fweikert commented Aug 18, 2020

Great idea! I'll add support for arbitrary commits in a future PR.

Re test: Arg :( I see what is happening here: The test creates a local bazelbuild-releases.json file to avoid sending requests to GitHub. However, the new version no longer uses the "try GitHub, then fall back to GCS" strategy, but calls GCS first.

IIRC we wanted to change Bazelisk to use GCS anyway since we won't get throttled, but that means the test will send actual HTTP requests (if we're fine with that the fix is simple: just set USE_BAZEL_VERSION in the failing test case).

-------
Summary
-------

This change introduces dedicated interfaces to handle how Bazel versions are retrieved and downloaded.

-------
Details
-------

The new `core` package introduces interfaces that represent four kinds of repositories that store Bazel versions: `ReleaseRepo`, `CandidateRepo`, `ForkRepo` and `LastGreenRepo`. Implementations of these interfaces must provide a list of available Bazel versions, as well as a function to download a desired version.

The old main function was moved into `RunBazelisk()`, which expects implementations of these interfaces. As a result, the caller of this function can control which repositories are used.

The previously existing repositories, GitHub and GCS, are still supported since they implement the new interfaces. GitHub is used for forks, whereas GCS is used for everything else.

Moreover, the existing `BAZELISK_BASE_URL` environment variable is still supported, and overrides the repository interfaces (if set).

Finally, this change moves some utility functions into new, dedicated packages such as `httputil` (for retrieving files via HTTP), `platforms` (for determining platform specific information) and `versions` (for sorting version numbers).

----------
Motivation
----------

The idea is to turn Bazelisk into a library that supports arbitrary repositories. For example, companies might want to ship their own version of Bazelisk that fetches all Bazel binaries from their internal servers.

----------
Next steps
----------

The next PR will move `RunBazelisk()` into the `core` package so that it can be used as a pure library. I didn’t want to move it in this PR to get better code diffs.

------------------
Functional changes
------------------

Even though this is a refactoring, a functional change has been made: The previous version used GitHub as a fallback when GCS was unavailable. This is no longer the case.

As a result, the Python version now deviates from the Go version since the latter uses GCS, whereas the first uses GitHub.

This forced me to rewrite the shell test, too, since the Go tests must be less strict wrt expected versions:

Since the Go version uses GCS the fake json file written in the test is only used by the Python version.
As a result, the shell test is effectively an integration test for the Go version and requires network access. Moreover, we can no longer check for specific Bazel version in the Go test since "latest" will point at different releases over time.

We should split the test into proper Go and Python versions.
Seriously, how can grep support *, but not + by default?
@fweikert
Copy link
Member Author

Fixed the test, PTAL.

The first version of the PR determined which repo should be used (e.g. release vs candidate repo) in two places:
Once in bazelisk.go when the version had to be resolved, and a second time in the core module when a binary had to be downloaded.

This has now changed: core.Repositories got the new ResolveVersion() function that resolves a version label and also returns the appropriate download function.
As a result, bazelisk.go is no longer in the business of caring about versions and can simply call the download function (unless BAZELISK_BASE_URL is set, of course).

Positive side-effects:
- All version parsing / analysis was moved into the versions module.
- A lot of code from bazelisk.go could be removed.
- LastGreenRepo was renamed to CommitRepo and now supports commit hashes, too.
@fweikert
Copy link
Member Author

fweikert commented Sep 2, 2020

I refactored the refactoring, and also updated the first comment.

@fweikert
Copy link
Member Author

fweikert commented Sep 2, 2020

TODO(fweikert): add documentation

EDIT: Done!

Copy link
Member

@meteorcloudy meteorcloudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still LGTM!
Thanks for adding the support for downloading Bazel at specific commit.

@fweikert fweikert merged commit 827bcfe into bazelbuild:master Sep 16, 2020
@fweikert fweikert deleted the repo3 branch September 16, 2020 16:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants