-
Notifications
You must be signed in to change notification settings - Fork 357
Refactor Bazel version retrieval & downloads #151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@fweikert Can you check what's happening with the test failure? Also, how do you think we extend to LastGreenRepo to something that supports downloading Bazel binary built at any commit? For example, to allow setting USE_BAZEL_VERSION to a specific commit? This will greatly help us to reproduce failures and bisect Bazel changes. |
Great idea! I'll add support for arbitrary commits in a future PR. Re test: Arg :( I see what is happening here: The test creates a local bazelbuild-releases.json file to avoid sending requests to GitHub. However, the new version no longer uses the "try GitHub, then fall back to GCS" strategy, but calls GCS first. IIRC we wanted to change Bazelisk to use GCS anyway since we won't get throttled, but that means the test will send actual HTTP requests (if we're fine with that the fix is simple: just set USE_BAZEL_VERSION in the failing test case). |
------- Summary ------- This change introduces dedicated interfaces to handle how Bazel versions are retrieved and downloaded. ------- Details ------- The new `core` package introduces interfaces that represent four kinds of repositories that store Bazel versions: `ReleaseRepo`, `CandidateRepo`, `ForkRepo` and `LastGreenRepo`. Implementations of these interfaces must provide a list of available Bazel versions, as well as a function to download a desired version. The old main function was moved into `RunBazelisk()`, which expects implementations of these interfaces. As a result, the caller of this function can control which repositories are used. The previously existing repositories, GitHub and GCS, are still supported since they implement the new interfaces. GitHub is used for forks, whereas GCS is used for everything else. Moreover, the existing `BAZELISK_BASE_URL` environment variable is still supported, and overrides the repository interfaces (if set). Finally, this change moves some utility functions into new, dedicated packages such as `httputil` (for retrieving files via HTTP), `platforms` (for determining platform specific information) and `versions` (for sorting version numbers). ---------- Motivation ---------- The idea is to turn Bazelisk into a library that supports arbitrary repositories. For example, companies might want to ship their own version of Bazelisk that fetches all Bazel binaries from their internal servers. ---------- Next steps ---------- The next PR will move `RunBazelisk()` into the `core` package so that it can be used as a pure library. I didn’t want to move it in this PR to get better code diffs. ------------------ Functional changes ------------------ Even though this is a refactoring, a functional change has been made: The previous version used GitHub as a fallback when GCS was unavailable. This is no longer the case. As a result, the Python version now deviates from the Go version since the latter uses GCS, whereas the first uses GitHub. This forced me to rewrite the shell test, too, since the Go tests must be less strict wrt expected versions: Since the Go version uses GCS the fake json file written in the test is only used by the Python version. As a result, the shell test is effectively an integration test for the Go version and requires network access. Moreover, we can no longer check for specific Bazel version in the Go test since "latest" will point at different releases over time. We should split the test into proper Go and Python versions.
Seriously, how can grep support *, but not + by default?
Fixed the test, PTAL. |
The first version of the PR determined which repo should be used (e.g. release vs candidate repo) in two places: Once in bazelisk.go when the version had to be resolved, and a second time in the core module when a binary had to be downloaded. This has now changed: core.Repositories got the new ResolveVersion() function that resolves a version label and also returns the appropriate download function. As a result, bazelisk.go is no longer in the business of caring about versions and can simply call the download function (unless BAZELISK_BASE_URL is set, of course). Positive side-effects: - All version parsing / analysis was moved into the versions module. - A lot of code from bazelisk.go could be removed. - LastGreenRepo was renamed to CommitRepo and now supports commit hashes, too.
I refactored the refactoring, and also updated the first comment. |
TODO(fweikert): add documentation EDIT: Done! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still LGTM!
Thanks for adding the support for downloading Bazel at specific commit.
Summary
This change introduces dedicated interfaces to handle how Bazel versions are retrieved and downloaded.
Details
The new
core
package introduces interfaces that represent four kinds of repositories that store Bazel versions:ReleaseRepo
,CandidateRepo
,ForkRepo
andCommitRepo
. Implementations of these interfaces must provide a list of available Bazel versions, as well as a function to download a desired version (CommitRepo
is the exception since it must only return two commit hashes, last_green and last_downstream_green).Implementations of these interfaces must be passed to the new
core.Repositories
struct, which uses them to resolve given Bazel versions and to return an appropriate download function. As a result, the download logic in bazelisk.go was reduced substantially.The old main function was moved into
RunBazelisk()
, which expects implementations of these interfaces. As a result, the caller of this function can control which repositories are used.The previously existing repositories, GitHub and GCS, are still supported since they implement the new interfaces. GitHub is used for forks, whereas GCS is used for everything else.
Moreover, the existing
BAZELISK_BASE_URL
environment variable is still supported, and overrides the repository interfaces (if set).Finally, this change moves some utility functions into new, dedicated packages such as
httputil
(for retrieving files via HTTP),platforms
(for determining platform specific information) andversions
(for parsing and sorting version values).Motivation
The idea is to turn Bazelisk into a library that supports arbitrary repositories. For example, companies might want to ship their own version of Bazelisk that fetches all Bazel binaries from their internal servers.
Next steps
The next PR will move
RunBazelisk()
into thecore
package so that it can be used as a pure library. I didn’t want to move it in this PR to get better code diffs.Moreover, we should probably split the shell test (see below).
Functional changes
Even though this is a refactoring, some functional changes have been made:
As a result, the Python version now deviates from the Go version since the latter uses GCS, whereas the first uses GitHub.
This forced me to rewrite the shell test, too, since the Go tests must be less strict wrt expected versions:
Since the Go version uses GCS the fake json file written in the test is only used by the Python version.
As a result, the shell test is effectively an integration test for the Go version and requires network access. Moreover, we can no longer check for specific Bazel version in the Go test since "latest" will point at different releases over time.
We should split the test into proper Go and Python versions.