syscall: caching RLIMIT_NOFILE is wrong implementation #66797

Open
ls-ggg opened this issue Apr 12, 2024 · 24 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@ls-ggg

ls-ggg commented Apr 12, 2024

Go version

go version go1.20.13 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/root/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/root/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.20.13"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1954436029=/tmp/go-build -gno-record-gcc-switches"

What did you do?

RLIMIT_NOFILE is cached in the Go runtime, and the cached value is restored in the child process before exec.

Assume that the system's default RLIMIT_NOFILE is 1024, and that process A creates child process B through fork. B is a Go application, so the Go runtime caches RLIMIT_NOFILE as 1024. If A then sets B's RLIMIT_NOFILE to 10000 through prlimit, and B uses exec.Command to create process C, C's RLIMIT_NOFILE is restored to 1024.

This behavior is incompatible with the operating system. It also caused problems with runc:
opencontainers/runc#4195

The problem originates from f5eef58

What did you see happen?

Reference opencontainers/runc#4195

What did you expect to see?

RLIMIT_NOFILE should be inherited correctly by child processes.

@cagedmantis cagedmantis changed the title syscall:Caching RLIMIT_NOFILE is wrong implementation syscall: caching RLIMIT_NOFILE is wrong implementation Apr 12, 2024
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Apr 12, 2024
@ianlancetaylor
Contributor

Please don't show us code in images. Link to go.googlesource.com or GitHub instead. Images are difficult to read. Thanks.

@ianlancetaylor
Contributor

If I understand this correctly, the problem is that if process A uses prlimit to change the resource limit in process B, then the Go standard library isn't aware of that when process B starts process C.

We can't change the current behavior in which process B passes the original RLIMIT_NOFILE value on to process C. That is likely to break existing programs that do not expect a large RLIMIT_NOFILE, as discussed in #46279. That issue shows real failure cases from when we didn't do that, so we can't change back.

So I think the question is whether there is a way that process B could detect that the process limit was changed by some other process via prlimit. In that case we should pass down the updated value, just as we already do if process B explicitly calls syscall.Setrlimit itself.

@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 12, 2024
@cagedmantis cagedmantis added this to the Backlog milestone Apr 12, 2024
@cagedmantis
Contributor

@golang/runtime

@ls-ggg
Author

ls-ggg commented Apr 15, 2024

Please don't show us code in images. Link to go.googlesource.com or GitHub instead. Images are difficult to read. Thanks.
OK, I've replaced the image with a commit link.

@lifubang
Contributor

lifubang commented May 5, 2024

I think there is also a get/set/set race here; please see opencontainers/runc#4266 and opencontainers/runc#4265.


Go CL 393354 (in Go 1.19beta1) introduced raising the RLIMIT_NOFILE soft limit to the current value of the hard limit.

Further CLs (in Go 1.21, but backported to Go 1.20.x and 1.19.x) introduced more code in between getrlimit and setrlimit syscalls (see Go's src/syscall/rlimit.go).

In runc exec, we use prlimit(2) to set rlimits for the child (runc init) some time after we start it. This results in the following race:

runc exec (parent)                 runc init (child)
------------------                 -----------------
(see (*setnsProcess).start)        (see init in src/syscall/rlimit.go)

                                   getrlimit(RLIMIT_NOFILE, &lim)
prlimit                            ....
                                   setrlimit(RLIMIT_NOFILE, &nlim)

So, I think we need to find a way to solve two problems:

  1. A get/set/set race with prlimit;
  2. How to detect that the nofile limit was set by another process using prlimit.
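
The interleaving in the timeline above can be replayed deterministically with a toy model. This is only a simulation under assumptions: no real system calls are made, the rlimit type stands in for the kernel's per-process value, and childInit is a hypothetical function that mirrors the read, adjust, write sequence described for the child.

```go
package main

import "fmt"

// rlimit is a toy stand-in for the kernel's per-process RLIMIT_NOFILE.
type rlimit struct{ cur, max uint64 }

// childInit replays the child's startup sequence against the simulated
// kernel value. The between callback runs in the window between the read and
// the write, which is where the parent's prlimit can land.
func childInit(kernel *rlimit, between func()) {
	// getrlimit(RLIMIT_NOFILE, &lim)
	lim := *kernel

	// The parent may call prlimit here.
	between()

	if lim.cur != lim.max {
		// setrlimit(RLIMIT_NOFILE, &nlim), computed from the stale read.
		nlim := lim
		nlim.cur = nlim.max
		*kernel = nlim
	}
}

func main() {
	kernel := &rlimit{cur: 1024, max: 4096}

	childInit(kernel, func() {
		// The parent's prlimit raises both limits to 10000.
		*kernel = rlimit{cur: 10000, max: 10000}
	})

	// The child's write, based on its earlier read, has overwritten the
	// parent's update.
	fmt.Println(*kernel) // {4096 4096}
}
```

Running the same function with an empty callback leaves {4096 4096} as well, which is why the lost update is hard to spot from inside the child.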

@lifubang
Contributor

So I think the question is whether there is a way that process B could detect that the process limit was changed by some other process via prlimit. In that case we should pass down the updated value, just as we already do if process B explicitly calls syscall.Setrlimit itself.

What's the progress now?
I think there may be no quick way to fix this issue. How about providing a public function in runtime to clear the cache, and letting the user decide whether the cache should be cleared?

@ianlancetaylor
Contributor

We already have a public function to clear the cache:

    var r syscall.Rlimit
    syscall.Getrlimit(syscall.RLIMIT_NOFILE, &r)
    syscall.Setrlimit(syscall.RLIMIT_NOFILE, &r)

@lifubang
Contributor

We already have a public function to clear the cache:

    var r syscall.Rlimit
    syscall.Getrlimit(syscall.RLIMIT_NOFILE, &r)
    syscall.Setrlimit(syscall.RLIMIT_NOFILE, &r)

We have considered this method, but it introduces a new get/set race.

@ianlancetaylor
Contributor

This only matters just before starting a child, so do it then. If that is a race, then you have a race problem no matter what.

@AGWA

AGWA commented May 20, 2024

Maybe syscall.SysProcAttr should contain fields for setting the child's rlimits. runc could use that instead of the inherently racy practice of calling prlimit after starting the child.

@thepudds
Contributor

Hi @lifubang, do you agree with what Ian wrote in #66797 (comment), or if not, could you briefly expand on why?

For the people reporting this here, is this something that affects Go code primarily consumed as a library, or package main? If there is not a valid workaround or solution in the near term, I wonder if a GODEBUG controlling the behavior could be a temporary solution to help with backwards compatibility, or maybe that does not make sense?

@lifubang
Contributor

lifubang commented May 23, 2024

if not, could you briefly expand on why?


Sorry for the delay; I've been busy in another thread.
I think it's easy to write repro code; if someone is interested in doing so, it would be appreciated.

The explanation is as follows:
If we use the approach suggested by Ian (and @ls-ggg) in process B, which was created by process A, it causes a new race:

B:syscall.Getrlimit(syscall.RLIMIT_NOFILE, &r)
A:prlimit(B, a different nofile_rlimit)
B:syscall.Setrlimit(syscall.RLIMIT_NOFILE, &r)


Besides this, this approach also costs us two unneeded syscalls.

The core reason is that using Get/Set to clear the internal cache is not an atomic operation. I suggest adding a public method that clears this cache directly, for example runtime.ClearSyscallRlimitCache().

@ianlancetaylor
Contributor

I understand the race. My point is that this only matters when you are about to start a child process. So do it then. There is already a race when starting a child process: you can start the child before or after the other process calls prlimit. So the race that you mention doesn't make matters any worse.

Two additional system calls are trivial, it's not worth adding a very special purpose operation to avoid that.

@kolyshkin
Contributor

We already have a public function to clear the cache:

    var r syscall.Rlimit
    syscall.Getrlimit(syscall.RLIMIT_NOFILE, &r)
    syscall.Setrlimit(syscall.RLIMIT_NOFILE, &r)

@ianlancetaylor the problem is that this relies on the setrlimit(2) syscall returning no error, which is not always the case (for example, when we're in a user namespace, we can't change rlimits for ourselves).

Perhaps something like this will fix the issue for us:

diff --git a/src/syscall/rlimit.go b/src/syscall/rlimit.go
index d77341bde9..8184f17ab6 100644
--- a/src/syscall/rlimit.go
+++ b/src/syscall/rlimit.go
@@ -39,11 +39,10 @@ func init() {
 }
 
 func Setrlimit(resource int, rlim *Rlimit) error {
-       err := setrlimit(resource, rlim)
-       if err == nil && resource == RLIMIT_NOFILE {
+       if resource == RLIMIT_NOFILE {
                // Store nil in origRlimitNofile to tell StartProcess
                // to not adjust the rlimit in the child process.
                origRlimitNofile.Store(nil)
        }
-       return err
+       return setrlimit(resource, rlim)
 }
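
The effect of the diff can be illustrated with a small model. This is a simulation under assumptions, not the syscall package's code: the rlimit type, the cache variable, and the injected failingSetrlimit function are hypothetical stand-ins for origRlimitNofile and a setrlimit(2) call that the kernel rejects.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

type rlimit struct{ cur, max uint64 }

// cache models origRlimitNofile: a non-nil value means "restore this limit
// in child processes".
var cache atomic.Pointer[rlimit]

// failingSetrlimit models a setrlimit(2) call that the kernel rejects.
func failingSetrlimit(*rlimit) error {
	return errors.New("operation not permitted")
}

// setrlimitBefore models the lines removed by the diff: the cache is cleared
// only when the system call succeeds.
func setrlimitBefore(set func(*rlimit) error, lim *rlimit) error {
	err := set(lim)
	if err == nil {
		cache.Store(nil)
	}
	return err
}

// setrlimitAfter models the lines added by the diff: the cache is cleared
// unconditionally, and the system call's error is still returned.
func setrlimitAfter(set func(*rlimit) error, lim *rlimit) error {
	cache.Store(nil)
	return set(lim)
}

func main() {
	lim := &rlimit{cur: 1024, max: 1024}

	cache.Store(&rlimit{cur: 1024, max: 4096})
	err := setrlimitBefore(failingSetrlimit, lim)
	fmt.Println(err != nil, cache.Load() != nil) // true true

	cache.Store(&rlimit{cur: 1024, max: 4096})
	err = setrlimitAfter(failingSetrlimit, lim)
	fmt.Println(err != nil, cache.Load() != nil) // true false
}
```

In both versions the caller sees the error; the only difference is whether the cached value survives a failed call.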

@ianlancetaylor
Contributor

I'm OK with that kind of change.

@gopherbot

Change https://go.dev/cl/587918 mentions this issue: syscall: rm go:linkname from origRlimitNofile

@gopherbot

Change https://go.dev/cl/588076 mentions this issue: syscall: Setrlimit: always clean rlimitNofileCache

@lifubang
Contributor

Perhaps something like this will fix the issue for us:

I think this needs more discussion.
If setting the nofile rlimit returns an error, the limit was not set successfully. If we clear the cache at that point, the new child process will end up with the wrong nofile rlimit.

If I'm wrong, please let me know.

@kolyshkin
Contributor

I think this needs more discussion. If setting the nofile rlimit returns an error, the limit was not set successfully. If we clear the cache at that point, the new child process will end up with the wrong nofile rlimit.

If I'm wrong, please let me know.

Whoever calls Setrlimit will get an error back in this case, so they will know they were not able to set the limits.

In our (runc) use case, we just need to remove the cache. We can set the limit as well, but that's optional, since we will use unix.Prlimit as we always did to set the limit.

To me, this is the best middle ground between Go and runc needs.

PS: we can discuss it further in opencontainers/runc#4290 (I'm going to add Go tip to its CI).

@lifubang
Contributor

To me, this is the best middle ground between Go and runc needs.

Yes, I think this is enough for runc, but not enough for golang.

@lifubang
Contributor

Whoever calls Setrlimit will get an error back in this case.

Soft > Hard

@ianlancetaylor
Contributor

I don't think it matters much. If Setrlimit fails for an ordinary call, it's likely to also fail when it is called while fork/execing a child.

@lifubang
Contributor

lifubang commented May 24, 2024 via email

@ianlancetaylor
Contributor

I am reluctant to add new API for this very special case. It is very unusual for a program to care about the rlimit for child processes. It is very unusual for a program to change its rlimit and fail. The new API is only required for the intersection of those two very unusual cases.

gopherbot pushed a commit that referenced this issue May 24, 2024
Since the introduction of origRlimitNofileCache in CL 476097 the only way to
disable restoring RLIMIT_NOFILE before calling execve syscall
(os.StartProcess etc) is this:

	var r syscall.Rlimit
	syscall.Getrlimit(syscall.RLIMIT_NOFILE, &r)
	syscall.Setrlimit(syscall.RLIMIT_NOFILE, &r)

The problem is, this only works when setrlimit syscall succeeds, which
is not possible in some scenarios.

Let's assume that if a user calls syscall.Setrlimit, they
unconditionally want to disable restoring the original rlimit.

For #66797.

Change-Id: I20d0365df4bd6a5c3cc8c22b0c0db87a25b52746
Reviewed-on: https://go-review.googlesource.com/c/go/+/588076
Run-TryBot: Kirill Kolyshkin <kolyshkin@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Bypass: Ian Lance Taylor <iant@golang.org>