
Cross builds during the release process should ideally run in parallel #1646

Closed · justaugustus opened this issue Oct 16, 2020 · 13 comments

Labels
area/release-eng Issues or PRs related to the Release Engineering subproject
kind/feature Categorizes issue or PR as related to a new feature.
priority/backlog Higher priority than priority/awaiting-more-evidence.
sig/release Categorizes an issue or PR as relevant to SIG Release.

Comments

@justaugustus (Member) commented Oct 16, 2020

What would you like to be added:

Cross builds during the release process should ideally run in parallel (instead of serially).

Why is this needed:

Pasted from @puerco's comment in the following Slack thread: https://kubernetes.slack.com/archives/CJH2GBF7Y/p1602728827335400?thread_ts=1602693775.210300&cid=CJH2GBF7Y

puerco  23 hours ago
Stephen, out of curiosity, why are the cross builds not done in parallel?

justaugustus  23 hours ago
It's a memory limitation.
GCB machine types max out at 32GB (I think) and make release "requires" (I think) at least 40GB to run in parallel.
(There's a memory check in the libraries that enforces this)
If you want to file an issue and play around with the values, I'm not opposed to it, but consider it lower priority compared to the other things you're working on.

puerco  23 hours ago
it just seems to me that we could launch several cloud build steps at the same time

justaugustus  23 hours ago
The make release part is a k/k thing

puerco  23 hours ago
oh gotcha.

justaugustus  23 hours ago
And anago is built to be reentrant/somewhat idempotent. Starting to parallelize its steps breaks that "guarantee".
If you want, file an issue for that too and we can potentially look at it, but AFTER we remove anago.

puerco  23 hours ago
yes I was thinking on that for when we finally replace it

cc: @kubernetes/release-engineering
/priority backlog
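
For context, the memory gate mentioned in the Slack paste above is enforced in k/k's build scripts. Below is a minimal Go sketch of the idea only: the 40GB threshold is taken from the message above, and all names are illustrative rather than the real implementation.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// physicalMemGB returns total physical memory in GB, read from
// /proc/meminfo (Linux-only, which matches the GCB environment).
func physicalMemGB() (uint64, error) {
	f, err := os.Open("/proc/meminfo")
	if err != nil {
		return 0, err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// The line looks like: "MemTotal:       32865000 kB".
		fields := strings.Fields(scanner.Text())
		if len(fields) >= 2 && fields[0] == "MemTotal:" {
			kb, err := strconv.ParseUint(fields[1], 10, 64)
			if err != nil {
				return 0, err
			}
			return kb / (1024 * 1024), nil
		}
	}
	return 0, fmt.Errorf("MemTotal not found in /proc/meminfo")
}

func main() {
	// 40GB comes from the Slack thread above; the real threshold is
	// defined in k/k's build scripts and may differ.
	const parallelBuildMemGB = 40

	mem, err := physicalMemGB()
	if err != nil {
		fmt.Fprintln(os.Stderr, "falling back to a serial build:", err)
		return
	}
	if mem >= parallelBuildMemGB {
		fmt.Printf("%dGB available: building platforms in parallel\n", mem)
	} else {
		fmt.Printf("%dGB available (< %dGB): building platforms serially\n",
			mem, parallelBuildMemGB)
	}
}
```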

justaugustus added the kind/feature, sig/release, and area/release-eng labels on Oct 16, 2020
k8s-ci-robot added the priority/backlog label on Oct 16, 2020
@tpepper (Member) commented Oct 16, 2020

My two cents:

I'm pretty sure that, down inside things, we can safely break the functional build piece out toward running parallel artifact builders. From the core's perspective, it should just be asking for artifacts; how those are produced can be abstracted. The fastest "build" would be one that checks, discovers all artifacts are already built correctly, and returns immediately; otherwise, it builds only the incremental artifacts that need building. Once artifacts are treated as distinct things, each can have an independent make invoked on it, even in parallel (see the sketch after this list). Getting there will be a fair chunk of refactoring, but it would be hugely valuable for:

  • performance
  • human toil versus the latencies of build steps (i.e. sitting around waiting, getting distracted, discovering breakage, reacting)
  • correctness: do we really check today that the built artifacts have some basic level of goodness? If they're split out, we'd definitely need to. We don't want to discover late (as happens occasionally, e.g. via user reports) that some subset of artifacts is missing. Instead, the automation should have a list of things we expect to build, iterate over the list to trigger a build and a check for each, and await all of them completing successfully or erroring.
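
A minimal Go sketch of the shape described above — independent builders launched in parallel, each returning early if its artifact already exists, with the orchestrator awaiting all of them. The buildArtifact helper, the _output paths, and the artifact list are all illustrative assumptions, not the actual release tooling.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"

	"golang.org/x/sync/errgroup"
)

// buildArtifact is a hypothetical builder: it returns immediately if the
// artifact is already present (the "fastest build"), otherwise it invokes
// an independent make target for just that artifact.
func buildArtifact(ctx context.Context, name string) error {
	if _, err := os.Stat("_output/" + name); err == nil {
		return nil // already built; nothing to do
	}
	cmd := exec.CommandContext(ctx, "make", name)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	// Illustrative list; the real release builds many more artifacts.
	artifacts := []string{"kubectl", "kubelet", "kubeadm"}

	g, ctx := errgroup.WithContext(context.Background())
	for _, a := range artifacts {
		a := a // capture loop variable (pre-Go 1.22 semantics)
		g.Go(func() error {
			if err := buildArtifact(ctx, a); err != nil {
				return fmt.Errorf("building %s: %w", a, err)
			}
			return nil
		})
	}
	// Await every builder; the first error cancels the rest.
	if err := g.Wait(); err != nil {
		fmt.Fprintln(os.Stderr, "release build failed:", err)
		os.Exit(1)
	}
	fmt.Println("all expected artifacts built")
}
```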

@BenTheElder (Member) commented Oct 16, 2020 via email

@fejta-bot commented Jan 14, 2021

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jan 14, 2021
@hasheddan (Contributor) commented Jan 14, 2021

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label on Jan 14, 2021
@puerco (Member) commented Jan 14, 2021

Related work: #1795

@puerco (Member) commented Mar 4, 2021

Now that #1795 has merged, I wonder how much more time we would gain by splitting the build into multiple GCB jobs.

It is a somewhat large task to write the code to split the build and to collect and verify the artifacts. Would we see a considerable improvement from splitting the build as Ben suggests, versus the parallelism Sascha enabled? If it represents only a marginal gain, I'd say we mark this as resolved by #1795.

@spiffxp (Member) commented Mar 5, 2021

Unless I'm misunderstanding the scope of this, I'd be concerned that sharding builds makes it more likely we see an incomplete set of artifacts when not all jobs complete; ref kubernetes/test-infra#18808
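
One way to guard against that failure mode is to verify a full expected manifest before anything is published. A minimal Go sketch, with a hypothetical verifyComplete helper and illustrative paths that are not the actual release tooling's layout:

```go
package main

import (
	"fmt"
	"os"
)

// verifyComplete fails closed: if any artifact from the expected manifest
// is missing (e.g. because a sharded build job never completed), nothing
// should be published.
func verifyComplete(expected []string) error {
	var missing []string
	for _, path := range expected {
		if _, err := os.Stat(path); err != nil {
			missing = append(missing, path)
		}
	}
	if len(missing) > 0 {
		return fmt.Errorf("incomplete artifact set, refusing to publish: %v", missing)
	}
	return nil
}

func main() {
	// Illustrative manifest; a real one would enumerate every artifact
	// the release is expected to produce.
	expected := []string{
		"_output/release-tars/kubernetes-client-linux-amd64.tar.gz",
		"_output/release-tars/kubernetes-server-linux-amd64.tar.gz",
	}
	if err := verifyComplete(expected); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("artifact set complete")
}
```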

@LappleApple commented Apr 21, 2021

Triaged April 21, 2021: @puerco doesn't have time in the short term to do anything here. A good time to revisit this would be after the formalisation of supported platforms work concludes. cc @hasheddan

@xmudrii (Member) commented Apr 21, 2021

We had a discussion about this on Slack: https://kubernetes.slack.com/archives/CJH2GBF7Y/p1619032489004200

We have run builds in parallel since #1795, and it turns out this speeds up the stage step by ~10 minutes. Right now, this works only for 1.21 because kubernetes/kubernetes#96882 was not cherry-picked to other release branches, but we will take care of that as well.

We might revisit this in the future, but for now, this seems to be a good enough improvement.

@xmudrii (Member) commented Apr 22, 2021

Right now, this works only for 1.21 because kubernetes/kubernetes#96882 was not cherry-picked to other release branches, but we will take care of that as well.

Following up on our discussion, I've created the following PRs to cherry-pick this change to the other supported release branches:

@fejta-bot commented Jul 21, 2021

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jul 21, 2021
@spiffxp (Member) commented Jul 21, 2021

/remove-lifecycle stale
/close
I think my concerns about kubernetes/test-infra#18808 still stand, but we can deal with them over there.

Based on the comment from @xmudrii above I think this has been resolved. Please /reopen if I'm incorrect in this assumption.

@k8s-ci-robot (Contributor) commented Jul 21, 2021

@spiffxp: Closing this issue.

In response to this:

/remove-lifecycle stale
/close
I think my concerns about kubernetes/test-infra#18808 still stand, but we can deal with them over there.

Based on the comment from @xmudrii above I think this has been resolved. Please /reopen if I'm incorrect in this assumption.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot removed the lifecycle/stale label on Jul 21, 2021