Flaky test: cgroup2: TestInspectOomKilledTrue fails intermittently #41929

Open
AkihiroSuda opened this issue Jan 26, 2021 · 2 comments
Labels
area/cgroup2 cgroup v2 area/testing kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed.

Comments

@AkihiroSuda
Member

AkihiroSuda commented Jan 26, 2021

Description

As of f266f13, TestInspectOomKilledTrue fails intermittently on cgroup v2 hosts:

func TestInspectOomKilledTrue(t *testing.T) {
	skip.If(t, testEnv.DaemonInfo.OSType == "windows")
	skip.If(t, testEnv.DaemonInfo.CgroupDriver == "none")
	skip.If(t, !testEnv.DaemonInfo.MemoryLimit || !testEnv.DaemonInfo.SwapLimit)
	defer setupTest(t)()
	ctx := context.Background()
	client := testEnv.APIClient()
	cID := container.Run(ctx, t, client, container.WithCmd("sh", "-c", "x=a; while true; do x=$x$x$x$x; done"), func(c *container.TestContainerConfig) {
		c.HostConfig.Resources.Memory = 32 * 1024 * 1024
	})
	poll.WaitOn(t, container.IsInState(ctx, client, cID, "exited"), poll.WithDelay(100*time.Millisecond))
	inspect, err := client.ContainerInspect(ctx, cID)
	assert.NilError(t, err)
	assert.Check(t, is.Equal(true, inspect.State.OOMKilled))
}

Steps to reproduce the issue:

$ make TEST_SKIP_INTEGRATION_CLI=1 TESTFLAGS="-test.run TestInspectOomKilledTrue -test.count 10" test-integration 

Describe the results you received:

1 PASS, 9 FAIL

=== RUN   TestInspectOomKilledTrue                                                                               
--- FAIL: TestInspectOomKilledTrue (1.15s)                                                                       
    kill_test.go:171: assertion failed: true (true bool) != false (inspect.State.OOMKilled bool)
=== RUN   TestInspectOomKilledTrue                                                                               
--- FAIL: TestInspectOomKilledTrue (0.63s)                                                                       
    kill_test.go:171: assertion failed: true (true bool) != false (inspect.State.OOMKilled bool)
=== RUN   TestInspectOomKilledTrue
--- FAIL: TestInspectOomKilledTrue (0.59s)
    kill_test.go:171: assertion failed: true (true bool) != false (inspect.State.OOMKilled bool)
=== RUN   TestInspectOomKilledTrue              
--- FAIL: TestInspectOomKilledTrue (0.64s)
    kill_test.go:171: assertion failed: true (true bool) != false (inspect.State.OOMKilled bool)
=== RUN   TestInspectOomKilledTrue                  
--- FAIL: TestInspectOomKilledTrue (0.62s)
    kill_test.go:171: assertion failed: true (true bool) != false (inspect.State.OOMKilled bool)
=== RUN   TestInspectOomKilledTrue                                                                               
--- FAIL: TestInspectOomKilledTrue (0.66s)
    kill_test.go:171: assertion failed: true (true bool) != false (inspect.State.OOMKilled bool)
=== RUN   TestInspectOomKilledTrue
--- PASS: TestInspectOomKilledTrue (0.61s)        
=== RUN   TestInspectOomKilledTrue
--- FAIL: TestInspectOomKilledTrue (0.64s)
    kill_test.go:171: assertion failed: true (true bool) != false (inspect.State.OOMKilled bool)
=== RUN   TestInspectOomKilledTrue
--- FAIL: TestInspectOomKilledTrue (0.62s)
    kill_test.go:171: assertion failed: true (true bool) != false (inspect.State.OOMKilled bool)
=== RUN   TestInspectOomKilledTrue
--- FAIL: TestInspectOomKilledTrue (0.61s)
    kill_test.go:171: assertion failed: true (true bool) != false (inspect.State.OOMKilled bool)
FAIL

Describe the results you expected:

10 PASS

Additional information you deem important (e.g. issue happens only occasionally):

  • kernel cmdline: systemd.unified_cgroup_hierarchy=1 cgroup_enable=memory swapaccount=1
  • swapon -a and swapoff -a do not change the result.
  • The test seems stable on cgroup v1 hosts.

Output of docker version:

Client:
 Version:           20.10.0-dev
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        c6bb56136
 Built:             Tue Jan 26 08:46:11 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          dev
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       f266f13965
  Built:            Tue Jan 26 08:45:56 2021
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          v1.4.0-2523-g1230bd630
  GitCommit:        1230bd63031ba4b65709103b5cb8f5be78a43b75
 runc:
  Version:          1.0.0-rc92+dev
  GitCommit:        c69ae759fbf5acf6e8ef805471b99feee8246c3c
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Output of docker info:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Build with BuildKit (Docker Inc., v0.5.1-3-g8b8725d)

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 3
 Server Version: dev
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runtime.v1.linux kata runsc-kvm sysbox-runc crun io.containerd.runc.v2 runsc runc runc-rc92
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 1230bd63031ba4b65709103b5cb8f5be78a43b75
 runc version: c69ae759fbf5acf6e8ef805471b99feee8246c3c
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.8.0-40-generic
 Operating System: Ubuntu 20.10
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 15.61GiB
 Name: suda-ws01
 ID: E2YB:EGZO:6BNW:EPHS:4WFQ:EIDV:ZZ6D:QBZK:6673:CIOR:DLZ6:SI3D
 Docker Root Dir: /var/lib/docker
 Debug Mode: true
  File Descriptors: 22
  Goroutines: 34
  System Time: 2021-01-26T17:58:54.968084742+09:00
  EventsListeners: 0
 Username: akihirosuda
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: true

WARNING: Support for cgroup v2 is experimental

Additional environment details (AWS, VirtualBox, physical, etc.):
VMware Fusion

@AkihiroSuda AkihiroSuda added kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. area/testing area/cgroup2 cgroup v2 labels Jan 26, 2021
@thaJeztah
Member

Do you think it's a race in the test, or are we missing an event from containerd? IIUC, the container exits, but either it wasn't actually "OOM killed", or it was OOM killed and we never received that information?
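
One way to separate the two possibilities is to ask the kernel directly. On cgroup v2, each cgroup's `memory.events` file contains an `oom_kill` counter. The following is a minimal sketch that parses the contents of that file. `oomKillCount` is a hypothetical helper. Locating the container's `memory.events` file (the path depends on the cgroup driver) and reading it before the cgroup is removed are left out and would need to be handled separately.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// oomKillCount extracts the oom_kill counter from the contents of a
// cgroup v2 memory.events file, which consists of "key value" lines.
// Hypothetical helper for illustration only.
func oomKillCount(memoryEvents string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(memoryEvents))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 2 && fields[0] == "oom_kill" {
			return strconv.ParseUint(fields[1], 10, 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("oom_kill key not found")
}
```

A non-zero count alongside `OOMKilled: false` would point to lost or late OOM information rather than a container that exited for another reason.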

@thaJeztah thaJeztah changed the title from "cgroup2: TestInspectOomKilledTrue fails intermittently" to "Flaky test: cgroup2: TestInspectOomKilledTrue fails intermittently" Feb 2, 2021
@tgross

tgross commented Jul 28, 2022

We've hit this same problem in a test for Nomad (hashicorp/nomad#13119). I did some digging and it looks like containerd has a patch for this behavior in containerd/containerd#6323 that was released in v1.6.3 and v1.5.12.
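
A test harness could use those release numbers to decide whether to expect reliable OOM reporting. The following is a rough sketch under stated assumptions: `hasOOMFix` is a hypothetical helper, it assumes that release lines newer than 1.6 also carry the patch, and it ignores any pre-release or git-describe suffix after the patch number.

```go
package main

import (
	"fmt"
	"strings"
)

// hasOOMFix reports whether a containerd version string is at or past
// the releases named in the comment above (v1.5.12 on the 1.5 line,
// v1.6.3 on the 1.6 line).
// ASSUMPTION: release lines newer than 1.6 also include the patch.
// Anything after the patch number (e.g. "-2523-g1230bd630") is ignored.
func hasOOMFix(version string) (bool, error) {
	var major, minor, patch int
	v := strings.TrimPrefix(version, "v")
	if _, err := fmt.Sscanf(v, "%d.%d.%d", &major, &minor, &patch); err != nil {
		return false, fmt.Errorf("parse %q: %w", version, err)
	}
	switch {
	case major > 1:
		return true, nil
	case major < 1:
		return false, nil
	case minor > 6:
		return true, nil
	case minor == 6:
		return patch >= 3, nil
	case minor == 5:
		return patch >= 12, nil
	default:
		return false, nil
	}
}
```

For the containerd version in the original report, `v1.4.0-2523-g1230bd630`, this returns false.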
