Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1.1] Fix setting RLIMIT_NOFILE #4239

Closed

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Apr 2, 2024

Since Go 1.21 (https://go-review.googlesource.com/c/go/+/476097), Go runtime saves the original value of RLIMIT_NOFILE upon startup and uses the saved value in StartProcess, unless RLIMIT_NOFILE is not explicitly changed by calling syscall.Setrlimit.

Now, runc uses unix.Prlimit (rather than syscall.Setrlimit) to set RLIMIT_NOFILE, thus Golang runtime is not aware that it is changed, result in occasional reset of RLIMIT_NOFILE, reported in #4195.

Bumping x/sys/unix to v0.7.0 fixes this (via
https://go-review.googlesource.com/c/sys/+/476695).

Fixes: #4195.

PS this is similar to #3856 in main branch, except now we know we're fixing a real bug. Thus, it is only needed in release-1.1 branch.

@kolyshkin kolyshkin added this to the 1.1.13 milestone Apr 2, 2024
@kolyshkin
Copy link
Contributor Author

CI failures in cross-i386 and cirrus are expected and are being fixed in #4231.

@kolyshkin

This comment was marked as outdated.

lifubang and others added 3 commits April 2, 2024 12:00
Signed-off-by: lfbzhm <lifubang@acmcoder.com>
(cherry picked from commit a596a05)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Since Go 1.21 (https://go-review.googlesource.com/c/go/+/476097), Go
runtime saves the original value of RLIMIT_NOFILE upon startup and uses
the saved value in StartProcess, unless RLIMIT_NOFILE is not explicitly
changed by calling syscall.Setrlimit.

Now, runc uses unix.Prlimit (rather than syscall.Setrlimit) to set
RLIMIT_NOFILE, thus Golang runtime is not aware that it is changed,
result in occasional reset of RLIMIT_NOFILE, reported in opencontainers#4195.

Bumping x/sys/unix to v0.7.0 fixes this (via
https://go-review.googlesource.com/c/sys/+/476695).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin
Copy link
Contributor Author

kolyshkin commented Apr 2, 2024

CI failures in cross-i386 and cirrus are expected and are being fixed in #4231.

Cherry-picked those CI commits from there to have green CI here.

@kolyshkin kolyshkin marked this pull request as ready for review April 2, 2024 20:40
@kolyshkin
Copy link
Contributor Author

@jrivera-px @ls-ggg PTAL

@kolyshkin
Copy link
Contributor Author

I can't reproduce this on my Fedora 39 machine. Can repro on Ubuntu 20.04 (40 times out of 1 million attempts).

If you want to try binary runc bult from this PR, use a zip file from https://github.com/opencontainers/runc/actions/runs/8528034513/artifacts/1378752088

@kolyshkin
Copy link
Contributor Author

Can repro on Ubuntu 20.04 (40 times out of 1 million attempts).

...which btw took almost 4 hours. Now retrying with runc built from this PR.

@kolyshkin
Copy link
Contributor Author

OK I tested this in the same env I used to repro #4195. Not able to reproduce in 100000 execs.

@kolyshkin
Copy link
Contributor Author

I'd love to add a test case but it takes a few minutes to repro so it doesn't make sense.

Here's how the test would look like:

[kir@kir-tp1 runc]$ cat tests/integration/ulimit.bats 
#!/usr/bin/env bats

load helpers

function setup() {
	setup_busybox
}

function teardown() {
	teardown_bundle
}

@test "runc exec with RLIMIT_NOFILE" {
	update_config '	 .process.rlimits = [{"type": "RLIMIT_NOFILE", "hard": 10000, "soft": 10000}]'

	runc run -d --console-socket "$CONSOLE_SOCKET" nofile
	[ "$status" -eq 0 ]

	for ((i = 0; i < 1000; i++)); do
		runc exec nofile sh -c 'ulimit -n'
		echo "[$i] $output"
		[[ "${output}" == "10000" ]]
	done
}

@kolyshkin kolyshkin marked this pull request as draft April 3, 2024 01:32
@kolyshkin
Copy link
Contributor Author

kolyshkin commented Apr 3, 2024

Alas i can still repro this with the supposedly fixed version. Using the same setup as in #4195 except the container is run with terminal: false and the last step looks like this:

i=0; while [ $i -lt 100000 ]; do
  LIM=$(sudo /home/kir/git/runc/runc exec bionic sh -c "ulimit -n");
  if [ $LIM -ne 65536 ]; then
    echo "WHOOPSIE (iter $i) got numfile $LIM";
    break;
  fi;
  ((i%100==0)) && printf "[%s] %10d\n" "$(date)" "$i";
  let i++;
done

It takes 5-10 minutes to reproduce, here are a few examples:

[Tue 02 Apr 2024 06:07:48 PM PDT]          0
[Tue 02 Apr 2024 06:07:49 PM PDT]        100
....
[Tue 02 Apr 2024 06:15:43 PM PDT]      35100
[Tue 02 Apr 2024 06:15:44 PM PDT]      35200
WHOOPSIE (iter 35264) got numfile 1024
[Tue 02 Apr 2024 06:32:14 PM PDT]          0
[Tue 02 Apr 2024 06:32:15 PM PDT]        100
...
[Tue 02 Apr 2024 06:39:27 PM PDT]      31900
WHOOPSIE (iter 31916) got numfile 1024

@kolyshkin
Copy link
Contributor Author

This is not working :-\

@kolyshkin kolyshkin closed this Apr 3, 2024
@kolyshkin
Copy link
Contributor Author

This is not working :-\

and our main branch is buggy, too

@kolyshkin kolyshkin removed this from the 1.1.13 milestone Apr 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants