dockerd is hanging, kernel trace code: Bad RIP value #45675

Closed
rkress3 opened this issue Jun 1, 2023 · 4 comments
Labels
kind/bug, status/0-triage, version/unsupported, version/19.03

Comments

@rkress3

rkress3 commented Jun 1, 2023

Description

We are observing docker hangs on multiple servers with the same kernel trace.

The systems have multiple applications running and I am looking for help in debugging.

[764555.998817] INFO: task dockerd:20990 blocked for more than 122 seconds.
[764556.079237] Not tainted 5.4.17-2136.316.7.el7uek.x86_64 #2
[764556.152399] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[764556.247384] dockerd D 0 20990 1 0x00004080
[764556.314287] Call Trace:
[764556.344798] __schedule+0x2a1/0x58b
[764556.387763] schedule+0x4f/0xbb
[764556.426608] rwsem_down_write_slowpath+0x2bb/0x470
[764556.485203] ? queued_spin_lock_slowpath+0xb/0x13
[764556.542820] ? finish_wait+0x6a/0x7f
[764556.586869] down_write+0x46/0x48
[764556.627815] sync_inodes_sb+0xaf/0x2bc
[764556.673941] __sync_filesystem+0x1b/0x5b
[764556.722150] sync_filesystem+0x40/0x4b
[764556.768288] ovl_sync_fs+0x3f/0x60 [overlay]
[764556.820671] __sync_filesystem+0x33/0x5b
[764556.820674] sync_filesystem+0x40/0x4b
[764556.820677] generic_shutdown_super+0x27/0x124
[764556.970027] kill_anon_super+0x12/0x2d
[764557.016192] deactivate_locked_super+0x4c/0x7d
[764557.070705] deactivate_super+0x49/0x64
[764557.070709] cleanup_mnt+0xd1/0x115
[764557.161062] __cleanup_mnt+0x12/0x18
[764557.161068] task_work_run+0x71/0xa6
[764557.249431] exit_to_usermode_loop+0xc8/0x126
[764557.302929] do_syscall_64+0x1a5/0x1e4
[764557.349134] entry_SYSCALL_64_after_hwframe+0x175/0x0
[764557.410923] RIP: 0033:0x5650fe4d829b
[764557.435940] IPVS: wrr: TCP 172.31.78.174:7001 - no destination available
[764557.455029] Code: Bad RIP value.
[764557.455031] RSP: 002b:000000c000bccd10 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6
[764557.455032] RAX: 0000000000000000 RBX: 000000c000089000 RCX: 00005650fe4d829b
[764557.455032] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 000000c005e329a0
[764557.455033] RBP: 000000c000bccd68 R08: 0000000000000000 R09: 0000000000000000
[764557.455033] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[764557.455033] R13: 0000000000000001 R14: 0000000000000017 R15: ffffffffffffffff
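One way to read a trace like this is to look at ORIG_RAX, which holds the number of the system call the task was in, and at the frames not prefixed with "?". The following is a minimal sketch, not part of the original report: the syscall table is a small hand-written subset of the x86_64 numbers, and the helper names decode_orig_rax and reliable_frames are made up for illustration.

```python
import re

# Hand-picked subset of x86_64 syscall numbers, for illustration only.
# This is an assumption of the sketch, not a complete table.
X86_64_SYSCALLS = {
    0: "read",
    1: "write",
    3: "close",
    162: "sync",
    165: "mount",
    166: "umount2",
}

ORIG_RAX_RE = re.compile(r"ORIG_RAX:\s*([0-9a-fA-F]+)")
# Matches "[timestamp] symbol+0xoff/0xsize", with an optional "?" marker.
FRAME_RE = re.compile(r"\]\s+(\??)\s*([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def decode_orig_rax(trace):
    """Return (number, name) for the syscall recorded in ORIG_RAX, or None."""
    m = ORIG_RAX_RE.search(trace)
    if m is None:
        return None
    nr = int(m.group(1), 16)
    return nr, X86_64_SYSCALLS.get(nr, "unknown")


def reliable_frames(trace):
    """List call-trace symbols in order, skipping frames marked with '?'."""
    return [sym for marker, sym in FRAME_RE.findall(trace) if not marker]


# A few lines copied from the trace above.
SAMPLE = """\
[764556.344798] __schedule+0x2a1/0x58b
[764556.387763] schedule+0x4f/0xbb
[764556.426608] rwsem_down_write_slowpath+0x2bb/0x470
[764556.485203] ? queued_spin_lock_slowpath+0xb/0x13
[764556.586869] down_write+0x46/0x48
[764556.627815] sync_inodes_sb+0xaf/0x2bc
[764557.455031] RSP: 002b:000000c000bccd10 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6
"""

print(decode_orig_rax(SAMPLE))   # (166, 'umount2')
print(reliable_frames(SAMPLE))
```

Read this way, the trace suggests dockerd was unmounting a filesystem (0xa6 is 166, umount2 on x86_64) and blocked taking a write lock while the overlay mount was being synced during teardown. That is an interpretation of the log, not a confirmed root cause.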

Reproduce

No known steps to reproduce.

Expected behavior

No hangs.

docker version

Client: Docker Engine - Community
 Version:           19.03.11-ol
 API version:       1.40
 Go version:        go1.16.2
 Git commit:        9bb540d
 Built:             Fri Jul 23 01:33:55 2021
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.11-ol
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.16.2
  Git commit:       9bb540d
  Built:            Fri Jul 23 01:32:08 2021
  OS/Arch:          linux/amd64
  Experimental:     false
  Default Registry: docker.io
 containerd:
  Version:          v1.4.8
  GitCommit:        7eba5930496d9bbe375fdf71603e610ad737d2b2
 runc:
  Version:          1.1.4
  GitCommit:        5fd4c4d
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

docker info

Client:
 Debug Mode: false

Server:
 Containers: 239
  Running: 193
  Paused: 0
  Stopped: 46
 Images: 81
 Server Version: 19.03.11-ol
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7eba5930496d9bbe375fdf71603e610ad737d2b2
 runc version: 5fd4c4d
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.4.17-2136.316.7.el7uek.x86_64
 Operating System: Oracle Linux Server 7.9
 OSType: linux
 Architecture: x86_64
 CPUs: 128
 Total Memory: 1007GiB
 Name: ost-ugbu-enterprise-yyz-ugbuueap2f29pukc
 ID: KDDB:SPOX:BXEC:G4KV:FRQN:PNH3:CF2S:TJFO:UE4B:W7UR:ZFVU:JJQI
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Registries:

Additional Info

No response

rkress3 added the kind/bug and status/0-triage labels Jun 1, 2023
@thaJeztah
Member

Have you reported this to Oracle as well? It looks like you're running Oracle's fork of Docker, which has patches that are not in upstream (and Docker 19.03 reached EOL 4 years ago).

@ningmingxiao
Contributor

@rkress3 Can you share the dockerd stack dump? Running kill -SIGUSR1 $(pidof dockerd) makes the daemon save it under /var/run/docker/.
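The request above could be scripted roughly as follows. This is a sketch under assumptions: the dump file name pattern goroutine-stacks-*.log is assumed rather than taken from this thread, and request_dump and newest_dump are hypothetical helper names. Only the SIGUSR1 signal and the /var/run/docker/ directory come from the comment itself.

```python
import glob
import os
import signal
import subprocess

# Directory named in the comment above.
DUMP_DIR = "/var/run/docker"
# Assumption: dump files follow this naming pattern.
DUMP_PATTERN = "goroutine-stacks-*.log"


def request_dump():
    """Send SIGUSR1 to every running dockerd process (typically needs root)."""
    out = subprocess.run(
        ["pidof", "dockerd"], capture_output=True, text=True, check=True
    )
    for pid in out.stdout.split():
        os.kill(int(pid), signal.SIGUSR1)


def newest_dump(directory=DUMP_DIR):
    """Return the most recently modified dump file in directory, or None."""
    paths = glob.glob(os.path.join(directory, DUMP_PATTERN))
    return max(paths, key=os.path.getmtime) if paths else None
```

To use it, call request_dump() as root, wait a moment for the daemon to write the file, then call newest_dump() to find the file to attach. If the daemon is fully wedged it may not respond to the signal, in which case no new file will appear.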

@rkress3
Author

rkress3 commented Jun 2, 2023

I have not reported it there yet.

I will, thank you.

@neersighted
Member

I'm going to close this for now as there's no action to take here in the upstream, since you're using a different codebase; if you can get this to reproduce with Moby (or a distribution like Docker CE) on a currently maintained branch (20.10, 23.0, 24.0), feel free to ask for a re-open.

Likewise, feel free to post updates here if the Oracle folks solve this in their fork, since people may find this issue via Google.

neersighted closed this as not planned Jun 2, 2023