
Cross builds during the release process should ideally run in parallel #1646

Closed · justaugustus opened this issue Oct 16, 2020 · 13 comments

Labels
area/release-eng Issues or PRs related to the Release Engineering subproject
kind/feature Categorizes issue or PR as related to a new feature.
priority/backlog Higher priority than priority/awaiting-more-evidence.
sig/release Categorizes an issue or PR as relevant to SIG Release.

Comments

@justaugustus (Member) commented Oct 16, 2020

What would you like to be added:

Cross builds during the release process should ideally run in parallel (instead of serially).

Why is this needed:

Pasted from @puerco's comment in the following Slack thread: https://kubernetes.slack.com/archives/CJH2GBF7Y/p1602728827335400?thread_ts=1602693775.210300&cid=CJH2GBF7Y

puerco  23 hours ago
Stephen, out of curiosity, why are the cross builds not done in parallel?

justaugustus  23 hours ago
It's a memory limitation.
GCB machine types max out at 32GB (I think) and make release "requires" (I think) at least 40GB to run in parallel.
(There's a memory check in the libraries that enforces this)
If you want to file an issue and play around with the values, I'm not opposed to it, but consider it lower priority compared to the other things you're working on.

puerco  23 hours ago
it just seems to me that we could launch several cloud build steps at the same time

justaugustus  23 hours ago
The make release part is a k/k thing

puerco  23 hours ago
oh gotcha.

justaugustus  23 hours ago
And anago is built to be reentrant/somewhat idempotent. Starting to parallelize its steps breaks that "guarantee".
If you want, file an issue for that too and we can potentially look at it, but AFTER we remove anago.

puerco  23 hours ago
yes I was thinking on that for when we finally replace it

cc: @kubernetes/release-engineering
/priority backlog
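
For context, the memory gate mentioned in the Slack paste above is enforced in k/k's build scripts. Below is a minimal Go sketch of the idea only: the 40GB threshold is taken from the message above, and all names are illustrative rather than the real implementation.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// physicalMemGB returns total physical memory in GB, read from
// /proc/meminfo (Linux-only, which matches the GCB environment).
func physicalMemGB() (uint64, error) {
	f, err := os.Open("/proc/meminfo")
	if err != nil {
		return 0, err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// The line looks like: "MemTotal:       32865000 kB".
		fields := strings.Fields(scanner.Text())
		if len(fields) >= 2 && fields[0] == "MemTotal:" {
			kb, err := strconv.ParseUint(fields[1], 10, 64)
			if err != nil {
				return 0, err
			}
			return kb / (1024 * 1024), nil
		}
	}
	return 0, fmt.Errorf("MemTotal not found in /proc/meminfo")
}

func main() {
	// 40GB comes from the Slack thread above; the real threshold is
	// defined in k/k's build scripts and may differ.
	const parallelBuildMemGB = 40

	mem, err := physicalMemGB()
	if err != nil {
		fmt.Fprintln(os.Stderr, "falling back to a serial build:", err)
		return
	}
	if mem >= parallelBuildMemGB {
		fmt.Printf("%dGB available: building platforms in parallel\n", mem)
	} else {
		fmt.Printf("%dGB available (< %dGB): building platforms serially\n",
			mem, parallelBuildMemGB)
	}
}
```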

justaugustus added the kind/feature, sig/release, and area/release-eng labels on Oct 16, 2020
k8s-ci-robot added the priority/backlog label on Oct 16, 2020
@tpepper (Member) commented Oct 16, 2020

My two cents:

I'm pretty sure that, down inside things, we can safely break the functional build piece out toward running parallel artifact builders. From the core's perspective, it should just be asking for artifacts; how those are produced can be abstracted. The fastest "build" would be one that checks, discovers all artifacts are already built correctly, and returns immediately; otherwise, it builds only the incremental artifacts that need building. Once artifacts are treated as distinct things, each can have an independent make invoked on it, even in parallel (see the sketch after this list). Getting there will be a fair chunk of refactoring, but it would be hugely valuable for:

  • performance
  • human toil versus the latencies of build steps (i.e. sitting around waiting, getting distracted, discovering breakage, reacting)
  • correctness: do we really check today that the built artifacts have some basic level of goodness? If they're split out, we'd definitely need to. We don't want to discover late (as happens occasionally, e.g. via user reports) that some subset of artifacts is missing. Instead, the automation should have a list of things we expect to build, iterate over the list to trigger a build and a check for each, and await all of them completing successfully or erroring.
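
A minimal Go sketch of the shape described above — independent builders launched in parallel, each returning early if its artifact already exists, with the orchestrator awaiting all of them. The buildArtifact helper, the _output paths, and the artifact list are all illustrative assumptions, not the actual release tooling.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"

	"golang.org/x/sync/errgroup"
)

// buildArtifact is a hypothetical builder: it returns immediately if the
// artifact is already present (the "fastest build"), otherwise it invokes
// an independent make target for just that artifact.
func buildArtifact(ctx context.Context, name string) error {
	if _, err := os.Stat("_output/" + name); err == nil {
		return nil // already built; nothing to do
	}
	cmd := exec.CommandContext(ctx, "make", name)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	// Illustrative list; the real release builds many more artifacts.
	artifacts := []string{"kubectl", "kubelet", "kubeadm"}

	g, ctx := errgroup.WithContext(context.Background())
	for _, a := range artifacts {
		a := a // capture loop variable (pre-Go 1.22 semantics)
		g.Go(func() error {
			if err := buildArtifact(ctx, a); err != nil {
				return fmt.Errorf("building %s: %w", a, err)
			}
			return nil
		})
	}
	// Await every builder; the first error cancels the rest.
	if err := g.Wait(); err != nil {
		fmt.Fprintln(os.Stderr, "release build failed:", err)
		os.Exit(1)
	}
	fmt.Println("all expected artifacts built")
}
```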

@BenTheElder (Member) commented Oct 16, 2020 via email

@fejta-bot commented Jan 14, 2021

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jan 14, 2021
@hasheddan (Contributor) commented Jan 14, 2021

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label on Jan 14, 2021
@puerco (Member) commented Jan 14, 2021

Related work: #1795

@puerco (Member) commented Mar 4, 2021

Now that #1795 has merged, I wonder how much more time we would gain by splitting the build into multiple GCB jobs.

It is a somewhat large task to write the code to split the build and to collect and verify the artifacts. Would we see a considerable improvement from splitting the build as Ben suggests, versus the parallelism Sascha enabled? If it represents only a marginal gain, I'd say we mark this as resolved by #1795.

@spiffxp (Member) commented Mar 5, 2021

Unless I'm misunderstanding the scope of this, I'd be concerned that sharding builds makes it more likely we see an incomplete set of artifacts when not all jobs complete; ref kubernetes/test-infra#18808
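
One way to guard against that failure mode is to verify a full expected manifest before anything is published. A minimal Go sketch, with a hypothetical verifyComplete helper and illustrative paths that are not the actual release tooling's layout:

```go
package main

import (
	"fmt"
	"os"
)

// verifyComplete fails closed: if any artifact from the expected manifest
// is missing (e.g. because a sharded build job never completed), nothing
// should be published.
func verifyComplete(expected []string) error {
	var missing []string
	for _, path := range expected {
		if _, err := os.Stat(path); err != nil {
			missing = append(missing, path)
		}
	}
	if len(missing) > 0 {
		return fmt.Errorf("incomplete artifact set, refusing to publish: %v", missing)
	}
	return nil
}

func main() {
	// Illustrative manifest; a real one would enumerate every artifact
	// the release is expected to produce.
	expected := []string{
		"_output/release-tars/kubernetes-client-linux-amd64.tar.gz",
		"_output/release-tars/kubernetes-server-linux-amd64.tar.gz",
	}
	if err := verifyComplete(expected); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("artifact set complete")
}
```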

@LappleApple commented Apr 21, 2021

Triaged April 21, 2021: @puerco doesn't have time in the short term to do anything here. A good time to revisit this would be after the formalisation of supported platforms work concludes. cc @hasheddan

@xmudrii (Member) commented Apr 21, 2021

We had a discussion about this on Slack: https://kubernetes.slack.com/archives/CJH2GBF7Y/p1619032489004200

We have run builds in parallel since #1795, and it turns out this speeds up the stage step by ~10 minutes. Right now, this works only for 1.21 because kubernetes/kubernetes#96882 was not cherry-picked to other release branches, but we will take care of that as well.

We might revisit this in the future, but for now, this seems to be a good enough improvement.

@xmudrii (Member) commented Apr 22, 2021

Right now, this works only for 1.21 because kubernetes/kubernetes#96882 was not cherry-picked to other release branches, but we will take care of that as well.

Following up on our discussion, I've created the following PRs to cherry-pick this change to the other supported release branches:

@fejta-bot commented Jul 21, 2021

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jul 21, 2021
@spiffxp (Member) commented Jul 21, 2021

/remove-lifecycle stale
/close
I think my concerns about kubernetes/test-infra#18808 still stand, but we can deal with them over there.

Based on the comment from @xmudrii above I think this has been resolved. Please /reopen if I'm incorrect in this assumption.

@k8s-ci-robot (Contributor) commented Jul 21, 2021

@spiffxp: Closing this issue.

In response to this:

/remove-lifecycle stale
/close
I think my concerns about kubernetes/test-infra#18808 still stand, but we can deal with them over there.

Based on the comment from @xmudrii above I think this has been resolved. Please /reopen if I'm incorrect in this assumption.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot removed the lifecycle/stale label on Jul 21, 2021