Higher memory usage on Rust 1.61.0 (compared to 1.60.0) leading to SIGKILL #97549

Closed
badboy opened this issue May 30, 2022 · 22 comments · Fixed by #97911 or #97925
Labels
I-compilemem: Issue: Problems and improvements with respect to memory usage during compilation.
P-high: High priority
regression-from-stable-to-stable: Performance or correctness regression from one stable version to another.
T-cargo: Relevant to the cargo team, which will review and decide on the PR/issue.
T-libs: Relevant to the library team, which will review and decide on the PR/issue.

Comments

@badboy
Member

badboy commented May 30, 2022

It seems that memory usage during compilation has increased between 1.60.0 and 1.61.0, enough to trigger an OOM kill on CircleCI when trying to build my project.

Running on a medium+ instance (3 vCPUs, 6 GB RAM).
Command used: cargo test --all --verbose -- --nocapture

  = note: collect2: fatal error: ld terminated with signal 9 [Killed]

Example run: https://app.circleci.com/pipelines/github/mozilla/glean/9430/workflows/8c32e02e-6915-47ee-a587-a0ec098d9afd/jobs/175430
Same on nightly (rustc 1.63.0-nightly (28b891916 2022-05-29))

The same task runs just fine with Rust 1.60.0.
I don't yet have a smaller example to showcase this, nor exact memory measurements. I'm willing to explore this further if you let me know how best to record memory usage.

@Stargateur
Contributor

Try reducing the number of parallel jobs: cargo test --all --verbose -j 1 -- --nocapture

@badboy
Member Author

badboy commented Jun 1, 2022

cargo test --all --verbose -j 1 -- --nocapture

That ultimately does make it succeed, but it lengthens the build time quite a bit.

@ethowitz

ethowitz commented Jun 2, 2022

FWIW, I'm having this issue too. 1.61.0 seems to use substantially more memory during compilation (specifically, we are running into the issue with cargo clippy), and it's causing CI builds to time out. I'd be happy to provide more detailed information if there's a good way to measure memory usage. Using 1.60.0 solves the issue.

@Stargateur
Contributor

Stargateur commented Jun 2, 2022

I think this is essentially an unsolvable problem. Maybe we could at least track rustc's memory consumption when we run crater for a new release; with crater we could see whether rustc uses more or less memory than last time. But in the end, finite memory is always a hard problem: if you don't have enough memory to build your project, the only options are to add memory or to reduce the number of concurrent jobs, for example to cpus / 2. I don't think rustc has enough contributors to guarantee stable memory usage; that's a non-goal, I think.

@rustbot labels: +regression-from-stable-to-stable -regression-untriaged

(adding the label, though I'm not very used to this)

@rustbot added the regression-from-stable-to-stable, I-prioritize, and regression-untriaged labels and removed the regression-untriaged label Jun 2, 2022
@inquisitivecrystal added the T-compiler and I-compilemem labels Jun 2, 2022
@apiraino
Contributor

apiraino commented Jun 6, 2022

Assigning priority as discussed in the Zulip thread of the Prioritization Working Group.

Pinging WG-compiler-performance for possible insights/suggestions on diagnosing the source of the regression.

@rustbot label -I-prioritize +P-high +wg-compiler-performance

@rustbot added the P-high and WG-compiler-performance labels and removed the I-prioritize label Jun 6, 2022
@ehuss
Contributor

ehuss commented Jun 7, 2022

@badboy Can you confirm that you have 3 CPUs? In the error message in your log, I see 9 process errors, which implies that more than 3 processes were running concurrently. Usually, if Cargo is limited to building 3 jobs in parallel and one of them fails, it will halt immediately and not try any more. 1.61 changed to use the standard library's available_parallelism, which might behave differently in your setup. There have also been some changes to how it is computed (#92697).

I'm not sure what the best way is to check. Perhaps you can compile a small program and run it to test:

fn main() {
    println!("{}", std::thread::available_parallelism().unwrap());
}

Or if you want to get cargo to tell you directly, the only way I can think of is to create an empty project and get the timing data:

cargo new foo
cd foo
cargo b --timings
grep ncpu target/cargo-timings/cargo-timing.html

@badboy
Member Author

badboy commented Jun 7, 2022

circleci@e6625592d7bc:~$ rustc cores.rs
circleci@e6625592d7bc:~$ ./cores
36
[...]
circleci@e6625592d7bc:~/foo$ grep ncpu target/cargo-timings/cargo-timing.html
    <td>Max concurrency:</td><td>1 (jobs=36 ncpu=36)</td>
circleci@e6625592d7bc:~$ free -m
              total        used        free      shared  buff/cache   available
Mem:          70226       12491        5640         180       52095       56789

So: 36 cores visible, 70G of memory visible.
But this is a CircleCI medium+ instance, so capped at 3 vCPUs / 6G RAM, according to the docs.
I don't know the details of how they implement it. I would assume 3 vCPUs means something like 300% of CPU time, shared across the whole host machine, with the 6 GB being a cap on the process group/container/whatever-it-uses.

Now on that same machine, available_parallelism still gives me 36 with Rust 1.60.0,
but the cargo invocation with 1.60.0 sees jobs=6 ncpu=6.
So that would definitely explain why it keeps within limits on that older version.
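
For context, on cgroup v1 that CPU cap shows up as a CFS quota rather than a reduced core count, which is why a plain core count comes back as 36. A minimal sketch of reading it, assuming the common /sys/fs/cgroup/cpu,cpuacct mount point (the actual location can vary):

```
// Sketch only: derive an effective CPU count from the cgroup v1 CFS quota.
// The mount point below is an assumption (the common Docker layout); a real
// implementation would have to discover where the cpu controller is mounted.
use std::fs;

fn read_i64(path: &str) -> Option<i64> {
    fs::read_to_string(path).ok()?.trim().parse().ok()
}

fn main() {
    let base = "/sys/fs/cgroup/cpu,cpuacct";
    let quota = read_i64(&format!("{base}/cpu.cfs_quota_us"));
    let period = read_i64(&format!("{base}/cpu.cfs_period_us"));
    match (quota, period) {
        // a quota of -1 means "no limit"
        (Some(q), Some(p)) if q > 0 && p > 0 => {
            // e.g. quota=300000, period=100000 -> 3 "vCPUs" (rounded up)
            println!("cgroup v1 CPU quota: {}", ((q + p - 1) / p).max(1));
        }
        _ => println!("no cgroup v1 CPU quota found"),
    }
}
```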

@ehuss
Contributor

ehuss commented Jun 7, 2022

OK. My suspicion is that CircleCI's Docker uses cgroups v1 (as does my local system), and #92697 doesn't help.

@the8472 is it feasible to support cgroups v1 in available_parallelism?

If not, I'm inclined to revert rust-lang/cargo#10427. cc @joshtriplett @weihanglo

@the8472
Member

the8472 commented Jun 7, 2022

Should be feasible, yes. But taking a look at the num_cpus implementation, it could end up being more complex, because cgroupv1 has a more configurable directory structure that might require scanning /proc/mountinfo to find the place where the cpu controller is mounted, which is something @joshtriplett wanted to avoid for performance reasons.
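
To illustrate what such a scan would involve (a rough sketch, not what eventually landed in std): read /proc/self/mountinfo and look for a cgroup filesystem whose super options include the cpu controller:

```
// Sketch of the /proc/mountinfo scan being discussed (illustration only):
// find the mount point of the cgroup v1 hierarchy with the "cpu" controller.
use std::fs;

fn cgroup_v1_cpu_mount() -> Option<String> {
    let mountinfo = fs::read_to_string("/proc/self/mountinfo").ok()?;
    for line in mountinfo.lines() {
        // mountinfo format: <ids> <root> <mount point> <opts> ... - <fstype> <source> <super opts>
        let (pre, post) = match line.split_once(" - ") {
            Some(parts) => parts,
            None => continue,
        };
        let mount_point = pre.split(' ').nth(4);
        let mut post_fields = post.split(' ');
        let fstype = post_fields.next();
        let super_opts = post_fields.nth(1); // skip the mount source field
        if let (Some(mp), Some("cgroup"), Some(opts)) = (mount_point, fstype, super_opts) {
            // "cpu" must match exactly; "cpuset"/"cpuacct" are different controllers.
            if opts.split(',').any(|o| o == "cpu") {
                return Some(mp.to_string());
            }
        }
    }
    None
}

fn main() {
    match cgroup_v1_cpu_mount() {
        Some(path) => println!("cgroup v1 cpu controller mounted at {path}"),
        None => println!("no cgroup v1 cpu controller found"),
    }
}
```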

@nnethercote
Contributor

nnethercote commented Jun 7, 2022

The comments above make it sound like the memory problems are not the compiler's fault per se, but rather a consequence of changes in how many compiler processes Cargo is spawning.

https://perf.rust-lang.org/compare.html?start=2022-01-01&end=2022-06-06&stat=max-rss is a quick sanity check, comparing the max-rss measurement for the development compiler from Jan 1st (definitely before the 1.61 cutoff) and June 6th. Between those dates we upgraded most of the "primary" benchmarks so we don't have many results in that section, but we do have helloworld and lots of secondary benchmarks, which should be enough to give a rough idea. It shows that 201 benchmark runs saw max-rss reductions, some quite large, while only 2 runs saw increases.

So this is pretty strong evidence that the compiler's memory usage has only gone down lately, adding weight to the theory that it's all about how the compiler is being invoked by Cargo.

(BTW, I'm not dismissing the significance of this issue, just clarifying the likely cause.)

@joshtriplett
Member

joshtriplett commented Jun 7, 2022

Should be feasible, yes. But taking a look at the num_cpus implementation, it could end up being more complex, because cgroupv1 has a more configurable directory structure that might require scanning /proc/mountinfo to find the place where the cpu controller is mounted, which is something @joshtriplett wanted to avoid for performance reasons.

Supporting cgroupv1 should still be possible without scanning /proc/mountinfo, by looking in the standard location where it's generally mounted. Could you post the output of mount on CircleCI?

@inquisitivecrystal added the T-cargo and T-libs labels and removed the T-compiler and WG-compiler-performance labels Jun 8, 2022
@nnethercote
Contributor

I don't think rustc has enough contributors to guarantee stable memory usage; that's a non-goal, I think.

FWIW, people like me who work on compiler performance certainly do care about memory usage. It's tracked continuously and triaged weekly, and we're always happy to hear about regressions.

@badboy
Member Author

badboy commented Jun 8, 2022

circleci@9d155efdb9dc:~$ mount
overlay on / type overlay (rw,relatime,lowerdir=/var/lib/docker/165536.165536/overlay2/l/7BDLL6TJLU65MROAM6ORMKN2QJ:/var/lib/docker/165536.165536/overlay2/l/ODCADVC73KP74GXIYJ6QQESDWV:/var/lib/docker/165536.165536/overlay2/l/XNYNHFBRI5XEHU4QQ76CK6O6NA:/var/lib/docker/165536.165536/overlay2/l/TG2CMYBKKLFAS7ABHB7NXBCQ5U:/var/lib/docker/165536.165536/overlay2/l/V4W52AYHVNRX73J6PBUYAHRKNB:/var/lib/docker/165536.165536/overlay2/l/MKNUNELDPK4RAM6NJMS5GETW5U:/var/lib/docker/165536.165536/overlay2/l/JYQMW7U4UE7AHZDKXNYJ6KZBJB:/var/lib/docker/165536.165536/overlay2/l/F3SJSAZ7N4V4XNAIUF4KMKO55S,upperdir=/var/lib/docker/165536.165536/overlay2/d50940bc69d8f7e18d62217d215702ae784596064bde27ecfe14cca87e87aa22/diff,workdir=/var/lib/docker/165536.165536/overlay2/d50940bc69d8f7e18d62217d215702ae784596064bde27ecfe14cca87e87aa22/work)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev type tmpfs (rw,nosuid,size=65536k,mode=755,uid=165536,gid=165536,inode64)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=165541,mode=620,ptmxmode=666)
sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime)
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=755,uid=165536,gid=165536,inode64)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/misc type cgroup (rw,nosuid,nodev,noexec,relatime,misc)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
/dev/root on /usr/sbin/docker-init type ext4 (ro,relatime,discard)
/dev/nvme1n1 on /etc/resolv.conf type xfs (rw,relatime,attr2,discard,inode64,logbufs=8,logbsize=32k,prjquota)
/dev/nvme1n1 on /etc/hostname type xfs (rw,relatime,attr2,discard,inode64,logbufs=8,logbsize=32k,prjquota)
/dev/nvme1n1 on /etc/hosts type xfs (rw,relatime,attr2,discard,inode64,logbufs=8,logbsize=32k,prjquota)
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=50331648k,inode64)
tmpfs on /mnt/ramdisk type tmpfs (rw,nodev,relatime,size=50331648k,uid=165536,gid=165536,inode64)
devtmpfs on /dev/null type devtmpfs (rw,relatime,size=35949588k,nr_inodes=8987397,mode=755,inode64)
devtmpfs on /dev/random type devtmpfs (rw,relatime,size=35949588k,nr_inodes=8987397,mode=755,inode64)
devtmpfs on /dev/full type devtmpfs (rw,relatime,size=35949588k,nr_inodes=8987397,mode=755,inode64)
devtmpfs on /dev/tty type devtmpfs (rw,relatime,size=35949588k,nr_inodes=8987397,mode=755,inode64)
devtmpfs on /dev/zero type devtmpfs (rw,relatime,size=35949588k,nr_inodes=8987397,mode=755,inode64)
devtmpfs on /dev/urandom type devtmpfs (rw,relatime,size=35949588k,nr_inodes=8987397,mode=755,inode64)
devpts on /dev/console type devpts (rw,nosuid,noexec,relatime,gid=165541,mode=620,ptmxmode=666)
proc on /proc/bus type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/fs type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/irq type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/sys type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/sysrq-trigger type proc (ro,nosuid,nodev,noexec,relatime)
tmpfs on /proc/acpi type tmpfs (ro,relatime,uid=165536,gid=165536,inode64)
devtmpfs on /proc/kcore type devtmpfs (rw,relatime,size=35949588k,nr_inodes=8987397,mode=755,inode64)
devtmpfs on /proc/keys type devtmpfs (rw,relatime,size=35949588k,nr_inodes=8987397,mode=755,inode64)
devtmpfs on /proc/timer_list type devtmpfs (rw,relatime,size=35949588k,nr_inodes=8987397,mode=755,inode64)
tmpfs on /proc/scsi type tmpfs (ro,relatime,uid=165536,gid=165536,inode64)
tmpfs on /sys/firmware type tmpfs (ro,relatime,uid=165536,gid=165536,inode64)

@badboy
Member Author

badboy commented Jun 8, 2022

Explicitly limiting cargo to 6 jobs (--jobs 6) seems to give us back about the same run time, while not running into memory or cpu exhaustion, so for now I'll probably go with that.
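
For anyone else applying the same workaround: the cap can also be set once instead of per invocation, e.g. via Cargo's build.jobs config key (or the CARGO_BUILD_JOBS environment variable):

```
# .cargo/config.toml (checked into the repository or provisioned in CI)
[build]
jobs = 6
```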

badboy added a commit to mozilla/glean that referenced this issue Jun 8, 2022
Recent Rust/Cargo changed detection of available cores,
which is probably the cause of earlier CPU/memory exhaustion kills.
By manually limiting it to 6 jobs we stay within the limits, while
keeping about the same runtime.

Upstream bug: rust-lang/rust#97549
@the8472
Member

the8472 commented Jun 8, 2022

Supporting cgroupv1 should still be possible without scanning /proc/mountinfo, by looking in the standard location where it's generally mounted. Could you post the output of mount on CircleCI?

But that's what I mean: there is not one standard location. The manpage says

 Thus, one might mount the cpu controller as follows:

       mount -t cgroup -o cpu none /sys/fs/cgroup/cpu

 It is possible to comount multiple controllers against the same hierarchy.  For example, here the cpu and cpuacct controllers are comounted against a single hierarchy:

       mount -t cgroup -o cpu,cpuacct none /sys/fs/cgroup/cpu,cpuacct

Those are just examples; cgroupv1 is meant to be flexible, so one can arbitrarily combine the different controller hierarchies or give them different names.

We could try a fixed list, but then wouldn't we have to survey distros to find out what they use as their default cgroup v1 setup? A single sample doesn't seem sufficient.

@joshtriplett
Member

We talked about this in today's @rust-lang/libs meeting. This isn't a libs regression (since libs never supported cgroupv1); it's a cargo regression.

Three possibilities:

  • We could consider --jobs (or other means of limiting the CPUs cargo uses) a sufficient workaround, and close this.
  • Cargo could (temporarily or permanently) switch back to num_cpus.
  • We could add cgroupv1 support in libs.

@joshtriplett
Member

joshtriplett commented Jun 8, 2022

Regarding cgroupv1 support in libs: since this is not the common case, and on most systems processes aren't in a cgroupv1, I think we should try to minimize the amount of time this takes, even if that means it's best-effort-only. I don't think we should make all the systems where processes aren't in a cgroup go through a (potentially very expensive) mountinfo scan.

@the8472
Member

the8472 commented Jun 8, 2022

I don't think we should make all the systems where processes aren't in a cgroup go through a (potentially very expensive) mountinfo scan.

We can check /proc/self/cgroup first, and if the process is in the root cgroup or there is no cgroup support compiled into the kernel, then we don't need to scan mountinfo either. We could check cgroupv2 first, then a fixed cgroupv1 list, and then fall back to a mountinfo scan.
That way we could have an implementation that is thorough and fast for most configurations, at the price of some complexity.

But yeah, I guess I'll start with the fixed list; if someone still complains about cgroupv1 later with some non-standard configuration, that can be another PR.
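
Roughly the lookup order that amounts to, as a simplified sketch (illustration only, assuming the standard /sys/fs/cgroup locations; the real code also has to walk the nested cgroup path, handle cpusets, etc.):

```
use std::fs;
use std::path::Path;

// Sketch of the probe order only: cheap checks first, the expensive
// /proc/mountinfo scan as a last resort.
fn quota_source() -> &'static str {
    // 1. If the process is in the root cgroup (or cgroups are unavailable),
    //    there is no per-cgroup quota to read at all.
    match fs::read_to_string("/proc/self/cgroup") {
        Ok(s) if !s.lines().all(|l| l.is_empty() || l.ends_with(":/")) => {},
        _ => return "no cgroup limits (root cgroup, or cgroups unavailable)",
    }
    // 2. cgroup v2: a single well-known unified mount point.
    if Path::new("/sys/fs/cgroup/cgroup.controllers").exists() {
        return "cgroup v2 (read cpu.max along the path from /proc/self/cgroup)";
    }
    // 3. cgroup v1 at the fixed locations most distros and containers use.
    for base in ["/sys/fs/cgroup/cpu,cpuacct", "/sys/fs/cgroup/cpu"] {
        if Path::new(base).join("cpu.cfs_quota_us").exists() {
            return "cgroup v1 at a fixed mount point (cpu.cfs_quota_us / cpu.cfs_period_us)";
        }
    }
    // 4. Only now pay for the full scan to find a non-standard cpu controller mount.
    "cgroup v1 via a /proc/mountinfo scan (see the sketch earlier in the thread)"
}

fn main() {
    println!("{}", quota_source());
}
```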

@ehuss
Contributor

ehuss commented Jun 8, 2022

I'm not sure I agree that most systems are using cgroups v2. My own system does not, and support has only recently been added. I posted rust-lang/cargo#10737 with a revert on beta. If we can get (however basic) cgroups v1 support in 1.63, that would make me feel a lot more comfortable with using the std implementation.

bors added a commit to rust-lang/cargo that referenced this issue Jun 8, 2022
[beta] Revert #10427: switch from num_cpus

This temporarily reverts #10427 (Use available_parallelism instead of num_cpus) per the discussion at rust-lang/rust#97549. `available_parallelism` does not handle cgroups v1 on Linux unlike num_cpus. I am concerned that this potentially affects a significant percentage of users. For example, Docker just added cgroups v2 support last year. Various Linux distributions have only recently switched to it as the default. The following is what I can find on the web:

* Fedora (since 31)
* Arch Linux (since April 2021)
* openSUSE Tumbleweed (since c. 2021)
* Debian GNU/Linux (since 11)
* Ubuntu (since 21.10)
* RHEL and RHEL-like distributions (since 9)

This also appears to affect CircleCI.

The consequence is that Cargo ends up using too much parallelism and can run out of memory.

I'm not sure what to do about 1.63. If std adds support for cgroups v1, then I don't think there is anything to do there. Otherwise, I think we should land a similar revert there.
@dtolnay
Member

dtolnay commented Jun 9, 2022

I am seeing the same symptom (builds failing due to enormously increased memory usage in 1.61.0) but in a non-Cargo environment. I am still investigating but I notice that rustc calls available_parallelism() too, not just Cargo. Cargo's use has been reverted to num_cpus in rust-lang/cargo#10737 but it's possible that #94524 will also need to be reverted.

@the8472
Member

the8472 commented Jun 9, 2022

cgroupv1 PR: #97925

@bors bors closed this as completed in 420c970 Jun 9, 2022
badboy added a commit to mozilla/glean that referenced this issue Jun 13, 2022
Recent Rust/Cargo changed detection of available cores,
which is probably the cause of earlier CPU/memory exhaustion kills.
By manually limiting it to 6 jobs we stay within the limits, while
keeping about the same runtime.

Upstream bug: rust-lang/rust#97549
ehuss pushed a commit to ehuss/rust that referenced this issue Jun 24, 2022
Revert "remove num_cpus dependency" in rustc and update cargo

Fixes rust-lang#97549. This PR reverts rust-lang#94524 and does a Cargo update to pull in rust-lang/cargo#10737.

Rust 1.61.0 has a regression in which it misidentifies the number of available CPUs in some environments, leading to enormously increased memory usage and failing builds. In between Rust 1.60 and 1.61 both rustc and cargo replaced some uses of `num_cpus` with `available_parallelism`, which eliminated support for cgroupv1, still apparently in common use. This PR switches both rustc and cargo back to using `num_cpus` in order to support environments where the available parallelism is controlled by cgroupv1. Both can use `available_parallelism` again once it handles cgroupv1 (if ever).

I have confirmed that the rustc part of this PR fixes the memory usage regression in my non-Cargo environment, and others have confirmed in rust-lang#97549 that the Cargo regression was at fault for the memory usage regression in their environments.
bors added a commit to rust-lang-ci/rust that referenced this issue Jul 23, 2022
Add cgroupv1 support to available_parallelism

Fixes rust-lang#97549

My dev machine uses cgroup v2, so I was only able to test that code path; the v1 code path is written based only on documentation. I could use some help testing that it works on a machine with cgroups v1:

```
$ x.py build --stage 1

# quota.rs
fn main() {
    println!("{:?}", std::thread::available_parallelism());
}

# assuming stage1 is linked in rustup
$ rustc +stage1 quota.rs

# spawn a new cgroup scope for the current user
$ sudo systemd-run -p CPUQuota="300%" --uid=$(id -u) -tdS

# should print Ok(3)
$ ./quota
```

If it doesn't work as expected, an strace, the contents of `/proc/self/cgroup`, and the structure of `/sys/fs/cgroup` would help.
@the8472
Member

the8472 commented Jul 23, 2022

#97925 landed, so next nightly will have support for cgroupv1 quotas too.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 15, 2022
…=jyn514

Reland changes replacing num_cpus with available_parallelism

Since rust-lang#97925 added cgroupv1 support, the problem in rust-lang#97549 which led to the previous revert should be addressed now.

Cargo has reapplied the replacement too: rust-lang/cargo#10969

Reverts 1ae4b25 (part of rust-lang#97911)
Relands rust-lang#94524