Higher memory usage on Rust 1.61.0 (compared to 1.60.0) leading to SIGKILL #97549

Closed
badboy opened this issue May 30, 2022 · 22 comments · Fixed by #97911 or #97925
Labels
I-compilemem: Issue: Problems and improvements with respect to memory usage during compilation.
P-high: High priority
regression-from-stable-to-stable: Performance or correctness regression from one stable version to another.
T-cargo: Relevant to the cargo team, which will review and decide on the PR/issue.
T-libs: Relevant to the library team, which will review and decide on the PR/issue.

Comments

@badboy
Member

badboy commented May 30, 2022

It seems that memory usage during compilation has increased between 1.60.0 and 1.61.0, enough to trigger an OOM kill on CircleCI when trying to build my project.

Running on a medium+ instance (3 vCPUs, 6 GB RAM).
Command used: cargo test --all --verbose -- --nocapture

  = note: collect2: fatal error: ld terminated with signal 9 [Killed]

Example run: https://app.circleci.com/pipelines/github/mozilla/glean/9430/workflows/8c32e02e-6915-47ee-a587-a0ec098d9afd/jobs/175430
Same on nightly (rustc 1.63.0-nightly (28b891916 2022-05-29))

The same task runs just fine with Rust 1.60.0.
I don't yet have a smaller example to showcase this, nor exact memory measurements. I'm willing to explore this further if you let me know how best to record memory usage.

@Stargateur
Contributor

Try reducing the number of parallel jobs: cargo test --all --verbose -j 1 -- --nocapture

@badboy
Member Author

badboy commented Jun 1, 2022

cargo test --all --verbose -j 1 -- --nocapture

That ultimately does make it succeed, but it lengthens the build time quite a bit.

@ethowitz

ethowitz commented Jun 2, 2022

FWIW, I'm having this issue too. 1.61.0 seems to use substantially more memory during compilation (specifically, we are running into the issue with cargo clippy), and it's causing CI builds to time out. I'd be happy to provide more detailed information if there's a good way to measure memory usage. Using 1.60.0 solves the issue.

@Stargateur
Contributor

Stargateur commented Jun 2, 2022

I think this is essentially an unsolvable problem. Maybe we could at least track rustc's memory consumption when we run crater for a new release; with crater we could see whether rustc uses more or less memory than last time. But in the end, finite memory is always a hard problem: if you don't have enough memory to build your project, the only options are to add memory or to reduce the number of concurrent jobs, for example to cpus / 2. I don't think rustc has enough contributors to guarantee stable memory usage; that's a non-goal, I think.

@rustbot labels: +regression-from-stable-to-stable -regression-untriaged

(adding the label, though I'm not very used to this)

@rustbot added the regression-from-stable-to-stable, I-prioritize, and regression-untriaged labels and removed the regression-untriaged label Jun 2, 2022
@inquisitivecrystal added the T-compiler and I-compilemem labels Jun 2, 2022
@apiraino
Contributor

apiraino commented Jun 6, 2022

Assigning priority as discussed in the Zulip thread of the Prioritization Working Group.

Pinging WG-compiler-performance for possible insights/suggestions on diagnosing the source of the regression.

@rustbot label -I-prioritize +P-high +wg-compiler-performance

@rustbot added the P-high and WG-compiler-performance labels and removed the I-prioritize label Jun 6, 2022
@ehuss
Contributor

ehuss commented Jun 7, 2022

@badboy Can you confirm that you have 3 CPUs? In the error message in your log, I see 9 process errors, which implies that more than 3 processes were running concurrently. Usually, if Cargo is limited to building 3 jobs in parallel and one of them fails, it will halt immediately and not try any more. 1.61 changed to use the standard library's available_parallelism, which might behave differently in your setup. There have also been some changes to how it is computed (#92697).

I'm not sure what the best way is to check. Perhaps you can compile a small program and run it to test:

fn main() {
    println!("{}", std::thread::available_parallelism().unwrap());
}

Or if you want to get cargo to tell you directly, the only way I can think of is to create an empty project and get the timing data:

cargo new foo
cd foo
cargo b --timings
grep ncpu target/cargo-timings/cargo-timing.html

@badboy
Member Author

badboy commented Jun 7, 2022

circleci@e6625592d7bc:~$ rustc cores.rs
circleci@e6625592d7bc:~$ ./cores
36
[...]
circleci@e6625592d7bc:~/foo$ grep ncpu target/cargo-timings/cargo-timing.html
    <td>Max concurrency:</td><td>1 (jobs=36 ncpu=36)</td>
circleci@e6625592d7bc:~$ free -m
              total        used        free      shared  buff/cache   available
Mem:          70226       12491        5640         180       52095       56789

So: 36 cores visible, 70G of memory visible.
But this is a CircleCI medium+ instance, so capped at 3 vCPUs / 6G RAM, according to the docs.
I don't know the details of how they implement it. I would assume 3 vCPUs means something like 300% of CPU time, shared across the whole host machine, with the 6 GB being a cap on the process group/container/whatever-it-uses.

Now on that same machine, available_parallelism still gives me 36 with Rust 1.60.0,
but the cargo invocation with 1.60.0 sees jobs=6 ncpu=6.
So that would definitely explain why it keeps within limits on that older version.
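
For context, on cgroup v1 that CPU cap shows up as a CFS quota rather than a reduced core count, which is why a plain core count comes back as 36. A minimal sketch of reading it, assuming the common /sys/fs/cgroup/cpu,cpuacct mount point (the actual location can vary):

```
// Sketch only: derive an effective CPU count from the cgroup v1 CFS quota.
// The mount point below is an assumption (the common Docker layout); a real
// implementation would have to discover where the cpu controller is mounted.
use std::fs;

fn read_i64(path: &str) -> Option<i64> {
    fs::read_to_string(path).ok()?.trim().parse().ok()
}

fn main() {
    let base = "/sys/fs/cgroup/cpu,cpuacct";
    let quota = read_i64(&format!("{base}/cpu.cfs_quota_us"));
    let period = read_i64(&format!("{base}/cpu.cfs_period_us"));
    match (quota, period) {
        // a quota of -1 means "no limit"
        (Some(q), Some(p)) if q > 0 && p > 0 => {
            // e.g. quota=300000, period=100000 -> 3 "vCPUs" (rounded up)
            println!("cgroup v1 CPU quota: {}", ((q + p - 1) / p).max(1));
        }
        _ => println!("no cgroup v1 CPU quota found"),
    }
}
```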

@ehuss
Contributor

ehuss commented Jun 7, 2022

OK. My suspicion is that CircleCI's Docker uses cgroups v1 (as does my local system), and #92697 doesn't help.

@the8472 is it feasible to support cgroups v1 in available_parallelism?

If not, I'm inclined to revert rust-lang/cargo#10427. cc @joshtriplett @weihanglo

@the8472
Member

the8472 commented Jun 7, 2022

Should be feasible, yes. But taking a look at the num_cpus implementation, it could end up being more complex, because cgroupv1 has a more configurable directory structure that might require scanning /proc/mountinfo to find the place where the cpu controller is mounted, which is something @joshtriplett wanted to avoid for performance reasons.
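
To illustrate what such a scan would involve (a rough sketch, not what eventually landed in std): read /proc/self/mountinfo and look for a cgroup filesystem whose super options include the cpu controller:

```
// Sketch of the /proc/mountinfo scan being discussed (illustration only):
// find the mount point of the cgroup v1 hierarchy with the "cpu" controller.
use std::fs;

fn cgroup_v1_cpu_mount() -> Option<String> {
    let mountinfo = fs::read_to_string("/proc/self/mountinfo").ok()?;
    for line in mountinfo.lines() {
        // mountinfo format: <ids> <root> <mount point> <opts> ... - <fstype> <source> <super opts>
        let (pre, post) = match line.split_once(" - ") {
            Some(parts) => parts,
            None => continue,
        };
        let mount_point = pre.split(' ').nth(4);
        let mut post_fields = post.split(' ');
        let fstype = post_fields.next();
        let super_opts = post_fields.nth(1); // skip the mount source field
        if let (Some(mp), Some("cgroup"), Some(opts)) = (mount_point, fstype, super_opts) {
            // "cpu" must match exactly; "cpuset"/"cpuacct" are different controllers.
            if opts.split(',').any(|o| o == "cpu") {
                return Some(mp.to_string());
            }
        }
    }
    None
}

fn main() {
    match cgroup_v1_cpu_mount() {
        Some(path) => println!("cgroup v1 cpu controller mounted at {path}"),
        None => println!("no cgroup v1 cpu controller found"),
    }
}
```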

@nnethercote
Contributor

nnethercote commented Jun 7, 2022

The comments above make it sound like the memory problems are not the compiler's fault per se, but rather a consequence of changes in how many compiler processes Cargo is spawning.

https://perf.rust-lang.org/compare.html?start=2022-01-01&end=2022-06-06&stat=max-rss is a quick sanity check, comparing the max-rss measurement for the development compiler from Jan 1st (definitely before the 1.61 cutoff) and June 6th. Between those dates we upgraded most of the "primary" benchmarks so we don't have many results in that section, but we do have helloworld and lots of secondary benchmarks, which should be enough to give a rough idea. It shows that 201 benchmark runs saw max-rss reductions, some quite large, while only 2 runs saw increases.

So this is pretty strong evidence that the compiler's memory usage has only gone down lately, adding weight to the theory that it's all about how the compiler is being invoked by Cargo.

(BTW, I'm not dismissing the significance of this issue, just clarifying the likely cause.)

@joshtriplett
Member

joshtriplett commented Jun 7, 2022

Should be feasible, yes. But taking a look at the num_cpus implementation, it could end up being more complex, because cgroupv1 has a more configurable directory structure that might require scanning /proc/mountinfo to find the place where the cpu controller is mounted, which is something @joshtriplett wanted to avoid for performance reasons.

Supporting cgroupv1 should still be possible without scanning /proc/mountinfo, by looking in the standard location where it's generally mounted. Could you post the output of mount on CircleCI?

@inquisitivecrystal added the T-cargo and T-libs labels and removed the T-compiler and WG-compiler-performance labels Jun 8, 2022
@nnethercote
Contributor

I don't think rustc has enough contributors to guarantee stable memory usage; that's a non-goal, I think.

FWIW, people like me who work on compiler performance certainly do care about memory usage. It's tracked continuously and triaged weekly, and we're always happy to hear about regressions.

@badboy
Member Author

badboy commented Jun 8, 2022

circleci@9d155efdb9dc:~$ mount
overlay on / type overlay (rw,relatime,lowerdir=/var/lib/docker/165536.165536/overlay2/l/7BDLL6TJLU65MROAM6ORMKN2QJ:/var/lib/docker/165536.165536/overlay2/l/ODCADVC73KP74GXIYJ6QQESDWV:/var/lib/docker/165536.165536/overlay2/l/XNYNHFBRI5XEHU4QQ76CK6O6NA:/var/lib/docker/165536.165536/overlay2/l/TG2CMYBKKLFAS7ABHB7NXBCQ5U:/var/lib/docker/165536.165536/overlay2/l/V4W52AYHVNRX73J6PBUYAHRKNB:/var/lib/docker/165536.165536/overlay2/l/MKNUNELDPK4RAM6NJMS5GETW5U:/var/lib/docker/165536.165536/overlay2/l/JYQMW7U4UE7AHZDKXNYJ6KZBJB:/var/lib/docker/165536.165536/overlay2/l/F3SJSAZ7N4V4XNAIUF4KMKO55S,upperdir=/var/lib/docker/165536.165536/overlay2/d50940bc69d8f7e18d62217d215702ae784596064bde27ecfe14cca87e87aa22/diff,workdir=/var/lib/docker/165536.165536/overlay2/d50940bc69d8f7e18d62217d215702ae784596064bde27ecfe14cca87e87aa22/work)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev type tmpfs (rw,nosuid,size=65536k,mode=755,uid=165536,gid=165536,inode64)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=165541,mode=620,ptmxmode=666)
sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime)
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=755,uid=165536,gid=165536,inode64)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/misc type cgroup (rw,nosuid,nodev,noexec,relatime,misc)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
/dev/root on /usr/sbin/docker-init type ext4 (ro,relatime,discard)
/dev/nvme1n1 on /etc/resolv.conf type xfs (rw,relatime,attr2,discard,inode64,logbufs=8,logbsize=32k,prjquota)
/dev/nvme1n1 on /etc/hostname type xfs (rw,relatime,attr2,discard,inode64,logbufs=8,logbsize=32k,prjquota)
/dev/nvme1n1 on /etc/hosts type xfs (rw,relatime,attr2,discard,inode64,logbufs=8,logbsize=32k,prjquota)
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=50331648k,inode64)
tmpfs on /mnt/ramdisk type tmpfs (rw,nodev,relatime,size=50331648k,uid=165536,gid=165536,inode64)
devtmpfs on /dev/null type devtmpfs (rw,relatime,size=35949588k,nr_inodes=8987397,mode=755,inode64)
devtmpfs on /dev/random type devtmpfs (rw,relatime,size=35949588k,nr_inodes=8987397,mode=755,inode64)
devtmpfs on /dev/full type devtmpfs (rw,relatime,size=35949588k,nr_inodes=8987397,mode=755,inode64)
devtmpfs on /dev/tty type devtmpfs (rw,relatime,size=35949588k,nr_inodes=8987397,mode=755,inode64)
devtmpfs on /dev/zero type devtmpfs (rw,relatime,size=35949588k,nr_inodes=8987397,mode=755,inode64)
devtmpfs on /dev/urandom type devtmpfs (rw,relatime,size=35949588k,nr_inodes=8987397,mode=755,inode64)
devpts on /dev/console type devpts (rw,nosuid,noexec,relatime,gid=165541,mode=620,ptmxmode=666)
proc on /proc/bus type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/fs type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/irq type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/sys type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/sysrq-trigger type proc (ro,nosuid,nodev,noexec,relatime)
tmpfs on /proc/acpi type tmpfs (ro,relatime,uid=165536,gid=165536,inode64)
devtmpfs on /proc/kcore type devtmpfs (rw,relatime,size=35949588k,nr_inodes=8987397,mode=755,inode64)
devtmpfs on /proc/keys type devtmpfs (rw,relatime,size=35949588k,nr_inodes=8987397,mode=755,inode64)
devtmpfs on /proc/timer_list type devtmpfs (rw,relatime,size=35949588k,nr_inodes=8987397,mode=755,inode64)
tmpfs on /proc/scsi type tmpfs (ro,relatime,uid=165536,gid=165536,inode64)
tmpfs on /sys/firmware type tmpfs (ro,relatime,uid=165536,gid=165536,inode64)

@badboy
Member Author

badboy commented Jun 8, 2022

Explicitly limiting cargo to 6 jobs (--jobs 6) seems to give us back about the same run time, while not running into memory or cpu exhaustion, so for now I'll probably go with that.
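
For anyone else applying the same workaround: the cap can also be set once instead of per invocation, e.g. via Cargo's build.jobs config key (or the CARGO_BUILD_JOBS environment variable):

```
# .cargo/config.toml (checked into the repository or provisioned in CI)
[build]
jobs = 6
```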

badboy added a commit to mozilla/glean that referenced this issue Jun 8, 2022
Recent Rust/Cargo changed detection of available cores,
which is probably the cause of earlier CPU/memory exhaustion kills.
By manually limiting it to 6 jobs we stay within the limits, while
keeping about the same runtime.

Upstream bug: rust-lang/rust#97549
@the8472
Member

the8472 commented Jun 8, 2022

Supporting cgroupv1 should still be possible without scanning /proc/mountinfo, by looking in the standard location where it's generally mounted. Could you post the output of mount on CircleCI?

But that's what I mean: there is not one standard location. The manpage says

 Thus, one might mount the cpu controller as follows:

       mount -t cgroup -o cpu none /sys/fs/cgroup/cpu

 It is possible to comount multiple controllers against the same hierarchy.  For example, here the cpu and cpuacct controllers are comounted against a single hierarchy:

       mount -t cgroup -o cpu,cpuacct none /sys/fs/cgroup/cpu,cpuacct

Those are just examples; cgroupv1 is meant to be flexible, so one can arbitrarily combine the different controller hierarchies or give them different names.

We could try a fixed list, but then wouldn't we have to survey distros to find out what they use as their default cgroup v1 setup? A single sample doesn't seem sufficient.

@joshtriplett
Member

We talked about this in today's @rust-lang/libs meeting. This isn't a libs regression (since libs never supported cgroupv1); it's a cargo regression.

Three possibilities:

  • We could consider --jobs (or other means of limiting the CPUs cargo uses) a sufficient workaround, and close this.
  • Cargo could (temporarily or permanently) switch back to num_cpus.
  • We could add cgroupv1 support in libs.

@joshtriplett
Member

joshtriplett commented Jun 8, 2022

Regarding cgroupv1 support in libs: since this is not the common case, and on most systems processes aren't in a cgroupv1, I think we should try to minimize the amount of time this takes, even if that means it's best-effort-only. I don't think we should make all the systems where processes aren't in a cgroup go through a (potentially very expensive) mountinfo scan.

@the8472
Member

the8472 commented Jun 8, 2022

I don't think we should make all the systems where processes aren't in a cgroup go through a (potentially very expensive) mountinfo scan.

We can check /proc/self/cgroup first, and if the process is in the root cgroup or there is no cgroup support compiled into the kernel, then we don't need to scan mountinfo either. We could check cgroupv2 first, then a fixed cgroupv1 list, and then fall back to a mountinfo scan.
That way we could have an implementation that is thorough and fast for most configurations, at the price of some complexity.

But yeah, I guess I'll start with the fixed list; if someone still complains about cgroupv1 later with some non-standard configuration, that can be another PR.
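
Roughly the lookup order that amounts to, as a simplified sketch (illustration only, assuming the standard /sys/fs/cgroup locations; the real code also has to walk the nested cgroup path, handle cpusets, etc.):

```
use std::fs;
use std::path::Path;

// Sketch of the probe order only: cheap checks first, the expensive
// /proc/mountinfo scan as a last resort.
fn quota_source() -> &'static str {
    // 1. If the process is in the root cgroup (or cgroups are unavailable),
    //    there is no per-cgroup quota to read at all.
    match fs::read_to_string("/proc/self/cgroup") {
        Ok(s) if !s.lines().all(|l| l.is_empty() || l.ends_with(":/")) => {},
        _ => return "no cgroup limits (root cgroup, or cgroups unavailable)",
    }
    // 2. cgroup v2: a single well-known unified mount point.
    if Path::new("/sys/fs/cgroup/cgroup.controllers").exists() {
        return "cgroup v2 (read cpu.max along the path from /proc/self/cgroup)";
    }
    // 3. cgroup v1 at the fixed locations most distros and containers use.
    for base in ["/sys/fs/cgroup/cpu,cpuacct", "/sys/fs/cgroup/cpu"] {
        if Path::new(base).join("cpu.cfs_quota_us").exists() {
            return "cgroup v1 at a fixed mount point (cpu.cfs_quota_us / cpu.cfs_period_us)";
        }
    }
    // 4. Only now pay for the full scan to find a non-standard cpu controller mount.
    "cgroup v1 via a /proc/mountinfo scan (see the sketch earlier in the thread)"
}

fn main() {
    println!("{}", quota_source());
}
```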

@ehuss
Contributor

ehuss commented Jun 8, 2022

I'm not sure I agree that most systems are using cgroups v2. My own system does not, and support has only recently been added. I posted rust-lang/cargo#10737 with a revert on beta. If we can get (however basic) cgroups v1 support in 1.63, that would make me feel a lot more comfortable with using the std implementation.

bors added a commit to rust-lang/cargo that referenced this issue Jun 8, 2022
[beta] Revert #10427: switch from num_cpus

This temporarily reverts #10427 (Use available_parallelism instead of num_cpus) per the discussion at rust-lang/rust#97549. `available_parallelism` does not handle cgroups v1 on Linux unlike num_cpus. I am concerned that this potentially affects a significant percentage of users. For example, Docker just added cgroups v2 support last year. Various Linux distributions have only recently switched to it as the default. The following is what I can find on the web:

* Fedora (since 31)
* Arch Linux (since April 2021)
* openSUSE Tumbleweed (since c. 2021)
* Debian GNU/Linux (since 11)
* Ubuntu (since 21.10)
* RHEL and RHEL-like distributions (since 9)

This also appears to affect CircleCI.

The consequence is that Cargo ends up using too much parallelism and can run out of memory.

I'm not sure what to do about 1.63. If std adds support for cgroups v1, then I don't think there is anything to do there. Otherwise, I think we should land a similar revert there.
@dtolnay
Member

dtolnay commented Jun 9, 2022

I am seeing the same symptom (builds failing due to enormously increased memory usage in 1.61.0) but in a non-Cargo environment. I am still investigating but I notice that rustc calls available_parallelism() too, not just Cargo. Cargo's use has been reverted to num_cpus in rust-lang/cargo#10737 but it's possible that #94524 will also need to be reverted.

@the8472
Member

the8472 commented Jun 9, 2022

cgroupv1 PR: #97925

@bors bors closed this as completed in 420c970 Jun 9, 2022
badboy added a commit to mozilla/glean that referenced this issue Jun 13, 2022
Recent Rust/Cargo changed detection of available cores,
which is probably the cause of earlier CPU/memory exhaustion kills.
By manually limiting it to 6 jobs we stay within the limits, while
keeping about the same runtime.

Upstream bug: rust-lang/rust#97549
ehuss pushed a commit to ehuss/rust that referenced this issue Jun 24, 2022
Revert "remove num_cpus dependency" in rustc and update cargo

Fixes rust-lang#97549. This PR reverts rust-lang#94524 and does a Cargo update to pull in rust-lang/cargo#10737.

Rust 1.61.0 has a regression in which it misidentifies the number of available CPUs in some environments, leading to enormously increased memory usage and failing builds. In between Rust 1.60 and 1.61 both rustc and cargo replaced some uses of `num_cpus` with `available_parallelism`, which eliminated support for cgroupv1, still apparently in common use. This PR switches both rustc and cargo back to using `num_cpus` in order to support environments where the available parallelism is controlled by cgroupv1. Both can use `available_parallelism` again once it handles cgroupv1 (if ever).

I have confirmed that the rustc part of this PR fixes the memory usage regression in my non-Cargo environment, and others have confirmed in rust-lang#97549 that the Cargo regression was at fault for the memory usage regression in their environments.
bors added a commit to rust-lang-ci/rust that referenced this issue Jul 23, 2022
Add cgroupv1 support to available_parallelism

Fixes rust-lang#97549

My dev machine uses cgroup v2, so I was only able to test that code path; the v1 code path is written based only on documentation. I could use some help testing that it works on a machine with cgroups v1:

```
$ x.py build --stage 1

# quota.rs
fn main() {
    println!("{:?}", std::thread::available_parallelism());
}

# assuming stage1 is linked in rustup
$ rustc +stage1 quota.rs

# spawn a new cgroup scope for the current user
$ sudo systemd-run -p CPUQuota="300%" --uid=$(id -u) -tdS

# should print Ok(3)
$ ./quota
```

If it doesn't work as expected, an strace, the contents of `/proc/self/cgroup`, and the structure of `/sys/fs/cgroup` would help.
@the8472
Member

the8472 commented Jul 23, 2022

#97925 landed, so next nightly will have support for cgroupv1 quotas too.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 15, 2022
…=jyn514

Reland changes replacing num_cpus with available_parallelism

Since rust-lang#97925 added cgroupv1 support, the problem in rust-lang#97549 which led to the previous revert should be addressed now.

Cargo has reapplied the replacement too: rust-lang/cargo#10969

Reverts 1ae4b25 (part of rust-lang#97911)
Relands rust-lang#94524