openshift/os: increase requested resources #29031

Closed
miabbott wants to merge 1 commit

Conversation

miabbott
Member

The `ostree` operations that happen as part of the `cosa-build` image
build are incredibly memory hungry during the `ostree commit` and
`ostree container` operations. They are moving upwards of 2G of data
into memory and onto disk and vice-versa.

The original resource requests were insufficient, causing the CI jobs
to be incredibly slow and sometimes even timing out completely. It's
been observed that the `ostree container encapsulate` operation ends
up requesting nearly 6Gi of memory.

This bumps both the memory requests and the CPU requests for the image
builds. It should give the jobs some healthy head room to perform the
operations at a reasonable pace.
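As a rough local check of that claim (not part of the CI change itself), the peak memory of the encapsulate step can be measured with GNU time; the repo path and ref below are placeholders rather than the exact ones the cosa build uses, and flag placement may vary by ostree version:

```
# Illustrative only: tmp/repo and the ref are placeholders.
# GNU time's "Maximum resident set size" is the peak RSS the kubelet
# would account against the container's memory request.
$ /usr/bin/time -v ostree container encapsulate --repo=tmp/repo \
    fedora/x86_64/coreos/testing-devel oci-archive:/tmp/out.ociarchive
...
	Maximum resident set size (kbytes): ...
```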
@miabbott
Member Author

Evidence of the extra memory request -
[image: memory request metrics]

openshift-ci bot requested review from jmarrero and travier on May 31, 2022 16:50
@openshift-ci
Contributor

openshift-ci bot commented May 31, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: miabbott

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on May 31, 2022
@miabbott
Member Author

I could be convinced of bumping the CPU request back down to 2000m, but I think the extra cycles will benefit the speed of the jobs.

@miabbott
Member Author

Memory observed to actually go over 8Gi during the final container commit operation of the image build!

[image: memory usage graph]

@cgwalters
Member

> It's been observed that the `ostree container encapsulate` operation ends up requesting nearly 6Gi of memory.

Hmm...that must be a bug somewhere. I'm not immediately seeing large heap usage here, peaking at just 3.4MB.

@miabbott
Member Author

>> It's been observed that the `ostree container encapsulate` operation ends up requesting nearly 6Gi of memory.
>
> Hmm...that must be a bug somewhere. I'm not immediately seeing large heap usage here, peaking at just 3.4MB.

To be fair, I was loosely correlating that operation with what I was seeing in the metrics dashboard, so there could be a misalignment.

On this topic of resource usage, I've not been able to reproduce the drastic slowness when spinning up a cosa pod on build02 and doing `cosa build`.

I'm beginning to think there are some special conditions applied to the `cosa-build` pod; I think it is being run as an OpenShift Build rather than a normally scheduled pod, and I wonder if there are constraints there.
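If it really is running as an OpenShift Build, a few quick checks could confirm that and show what the pod is actually consuming; the namespace and build name below are placeholders:

```
# Placeholders: substitute the actual CI namespace and build name.
$ oc get builds -n <ci-namespace>                      # is the job a Build object?
$ oc describe build <build-name> -n <ci-namespace> | grep -i -A4 resources
$ oc adm top pods -n <ci-namespace>                    # live CPU/memory per pod
```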

@miabbott
Member Author

Well, giving the cosa-build job more memory didn't seem to improve things:

`INFO[2022-05-31T20:00:54Z] Ran for 3h7m6s`

Open to new suggestions

@miabbott
Member Author

miabbott commented Jun 1, 2022

/retest

cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Jun 1, 2022
In openshift/release#29031 we are
debugging very slow build times.  Of the approximately 3h build
time, 30 minutes is spent compressing all the files into the archive
repo in `tmp/repo`.

This is all essentially wasted time, because we now canonically represent
the ostree commit as an ociarchive, which is then re-compressed
differently.

Eventually, we should drop `tmp/repo` and have `cache/repo-build`
be the canonical uncompressed cache.

In the short term though, ostree makes it easy to turn down the
zlib compression level, which can have a dramatic impact here.

Locally on my desktop:

Before:

```
$ time sudo ostree --repo=tmp/repo pull-local cache/repo-build/ 988a1ffb47df4dda08df4d97d8e5f39f34c624d5c54b9c870f696203011758ef
3009 metadata, 19604 content objects imported; 1.3 GB content written

________________________________________________________
Executed in    8.33 secs    fish           external
   usr time   44.23 secs  836.00 micros   44.23 secs
   sys time    3.95 secs  108.00 micros    3.95 secs
```

After:

```
$ time sudo ostree --repo=tmp/repo pull-local cache/repo-build/ 988a1ffb47df4dda08df4d97d8e5f39f34c624d5c54b9c870f696203011758ef
3009 metadata, 19604 content objects imported; 1.3 GB content written

________________________________________________________
Executed in    6.09 secs    fish           external
   usr time   21.94 secs    0.00 micros   21.94 secs
   sys time    4.34 secs  955.00 micros    4.34 secs
```

The wall clock time isn't hugely different, but that's because
my desktop is a hyperthreaded, otherwise idle i9-9900k.  The actual
CPU time spent is notably lower.

In the Prow cluster where we're contending for CPU on slower processors,
and further we are limited by cpu shares, this should help.
cgwalters added a commit to coreos/coreos-assembler that referenced this pull request Jun 1, 2022 (same commit message as above)
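For reference, the compression knob the commit message is talking about appears to be ostree's repo-level archive setting; a minimal sketch of tuning it by hand is below. The `archive.zlib-level` key name is an assumption taken from ostree.repo-config(5) and may not be exactly how coreos-assembler applies it:

```
# Assumed key: archive.zlib-level (check ostree.repo-config(5) for your version).
# Lower levels trade compression ratio for much less CPU time.
$ ostree --repo=tmp/repo config set archive.zlib-level 2
$ ostree --repo=tmp/repo config get archive.zlib-level
2
```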
@miabbott
Member Author

miabbott commented Jun 2, 2022

coreos/coreos-assembler#2888 landed; let's see if that improves things here

/retest

@miabbott
Member Author

miabbott commented Jun 2, 2022

/retest

@openshift-ci
Contributor

openshift-ci bot commented Jun 3, 2022

@miabbott: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/rehearse/openshift/os/master/test-in-cluster | 5c4211e | link | unknown | /test pj-rehearse |
| ci/rehearse/openshift/os/master/test-qemu-kola | 5c4211e | link | unknown | /test pj-rehearse |
| ci/prow/pj-rehearse | 5c4211e | link | false | /test pj-rehearse |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@miabbott
Member Author

This isn't the fix we want; see openshift/os#839 and #29329

/close

@openshift-ci
Contributor

openshift-ci bot commented Jun 10, 2022

@miabbott: Closed this PR.

In response to this:

> This isn't the fix we want; see openshift/os#839 and #29329
>
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci bot closed this on Jun 10, 2022