containerd-shim-runc-v1 SIGSEGV when starting container #8884

Closed
zvier opened this issue Jul 27, 2023 · 2 comments

@zvier
Contributor

zvier commented Jul 27, 2023

Description

Occasionally, a container fails to start right after it has been created: containerd-shim-runc-v1 crashes with a segmentation fault (a nil pointer dereference), and clients receive the error ttrpc: closed: unknown. The log messages are as follows:

containerd[6072]: time="2023-07-27T11:48:22.160019299+08:00" level=info msg="StartContainer for \"61b5f724d07f37feacd7903d9c5d1347375eacee041eafcd4b88d31437fe0557\""
containerd[6072]: time="2023-07-27T11:48:22.163952339+08:00" level=warning msg="\"io.containerd.runc.v1\" is deprecated since containerd v1.4, consider using \"io.containerd.runc.v2\""
containerd[6072]: time="2023-07-27T11:48:22.170476420+08:00" level=info msg="loading plugin \"io.containerd.event.v1.publisher\"..." runtime=io.containerd.runc.v1 type=io.containerd.event.v1
containerd[6072]: time="2023-07-27T11:48:22.170545449+08:00" level=info msg="loading plugin \"io.containerd.ttrpc.v1.task\"..." runtime=io.containerd.runc.v1 type=io.containerd.ttrpc.v1
containerd[6072]: time="2023-07-27T11:48:22.170561780+08:00" level=info msg="loading plugin \"io.containerd.internal.v1.shutdown\"..." runtime=io.containerd.runc.v1 type=io.containerd.internal.v1
containerd[6072]: time="2023-07-27T11:48:22.170623192+08:00" level=info msg="starting signal loop" namespace=k8s.io path=/run/containerd/io.containerd.runtime.v2.task/k8s.io/61b5f724d07f37feacd7903d9c5d1347375eacee041eafcd4b88d31437fe0557 pid=132125 runtime=io.containerd.runc.v1
containerd[6072]: panic: runtime error: invalid memory address or nil pointer dereference
containerd[6072]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x918442]
containerd[6072]: goroutine 9 [running]:
containerd[6072]: github.com/containerd/containerd/runtime/v2/runc.(*Container).Cgroup(0x0, 0x0, 0x0)
containerd[6072]: /go/src/github.com/containerd/containerd/runtime/v2/runc/container.go:285 +0x42
containerd[6072]: github.com/containerd/containerd/runtime/v2/runc/v1.(*service).Stats(0xc0000bc000, 0xac4ba0, 0xc0001aa7b0, 0xc0001aa7e0, 0x7f5545576768, 0x8, 0x10)
containerd[6072]: /go/src/github.com/containerd/containerd/runtime/v2/runc/v1/service.go:600 +0x33
containerd[6072]: github.com/containerd/containerd/runtime/v2/task.RegisterTaskService.func15(0xac4ba0, 0xc0001aa7b0, 0xc00005a460, 0x10, 0x99cc40, 0xc000064b01, 0xc00002b670)
containerd[6072]: /go/src/github.com/containerd/containerd/runtime/v2/task/shim.pb.go:3554 +0xcd
containerd[6072]: github.com/containerd/ttrpc.defaultServerInterceptor(0xac4ba0, 0xc0001aa7b0, 0xc00005a460, 0xc00002b670, 0xc0000a2270, 0xc000064b80, 0x1e, 0x0, 0x30)
containerd[6072]: /go/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/interceptor.go:45 +0x44
containerd[6072]: github.com/containerd/ttrpc.(*serviceSet).dispatch(0xc00008c6b0, 0xac4ba0, 0xc0001aa7b0, 0xc000112378, 0x17, 0xc000161007, 0x5, 0xc000132280, 0x42, 0x50, ...)
containerd[6072]: /go/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/services.go:95 +0x211
containerd[6072]: github.com/containerd/ttrpc.(*serviceSet).call(0xc00008c6b0, 0xac4ba0, 0xc0001aa7b0, 0xc000112378, 0x17, 0xc000161007, 0x5, 0xc000132280, 0x42, 0x50, ...)
containerd[6072]: /go/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/services.go:64 +0xb5
containerd[6072]: github.com/containerd/ttrpc.(*serverConn).run.func2(0xac4af8, 0xc0000be2c0, 0x3, 0xc000020720, 0xc0000800a0, 0xc000082420, 0xc0000824e0, 0x3)
containerd[6072]: /go/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/server.go:438 +0xf2
containerd[6072]: created by github.com/containerd/ttrpc.(*serverConn).run
containerd[6072]: /go/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/server.go:434 +0x63f
containerd[6072]: time="2023-07-27T11:48:22.176861517+08:00" level=info msg="shim disconnected" id=61b5f724d07f37feacd7903d9c5d1347375eacee041eafcd4b88d31437fe0557
containerd[6072]: time="2023-07-27T11:48:22.176856897+08:00" level=error msg="collecting metrics for 61b5f724d07f37feacd7903d9c5d1347375eacee041eafcd4b88d31437fe0557" error="ttrpc: closed: unknown"
containerd[6072]: time="2023-07-27T11:48:22.176913866+08:00" level=warning msg="cleaning up after shim disconnected" id=61b5f724d07f37feacd7903d9c5d1347375eacee041eafcd4b88d31437fe0557 namespace=k8s.io
containerd[6072]: time="2023-07-27T11:48:22.197169414+08:00" level=warning msg="cleanup warnings time=\"2023-07-27T11:48:22+08:00\" level=info msg=\"starting signal loop\" namespace=k8s.io pid=132141 runtime=io.containerd.runc.v1\ntime=\"2023-07-27T11:48:22+08:00\" level=warning msg=\"failed to read init pid file\" error=\"open /run/containerd/io.containerd.runtime.v2.task/k8s.io/61b5f724d07f37feacd7903d9c5d1347375eacee041eafcd4b88d31437fe0557/init.pid: no such file or directory\"\n"
containerd[6072]: time="2023-07-27T11:48:22.197625690+08:00" level=error msg="Failed to pipe stdout of container \"61b5f724d07f37feacd7903d9c5d1347375eacee041eafcd4b88d31437fe0557\"" error="reading from a closed fifo"
containerd[6072]: time="2023-07-27T11:48:22.197630647+08:00" level=error msg="Failed to pipe stderr of container \"61b5f724d07f37feacd7903d9c5d1347375eacee041eafcd4b88d31437fe0557\"" error="reading from a closed fifo"
containerd[6072]: time="2023-07-27T11:48:22.197873626+08:00" level=error msg="StartContainer for \"61b5f724d07f37feacd7903d9c5d1347375eacee041eafcd4b88d31437fe0557\" failed" error="failed to create containerd task: failed to create shim task: ttrpc: closed: unknown"
containerd[6072]: time="2023-07-27T11:48:22.407353104+08:00" level=info msg="Container to stop \"61b5f724d07f37feacd7903d9c5d1347375eacee041eafcd4b88d31437fe0557\" must be in running or unknown state, current state \"CONTAINER_EXITED\""
containerd[6072]: time="2023-07-27T11:48:23.409049114+08:00" level=info msg="Container to stop \"61b5f724d07f37feacd7903d9c5d1347375eacee041eafcd4b88d31437fe0557\" must be in running or unknown state, current state \"CONTAINER_EXITED\""
containerd[6072]: time="2023-07-27T11:49:35.262547705+08:00" level=info msg="RemoveContainer for \"61b5f724d07f37feacd7903d9c5d1347375eacee041eafcd4b88d31437fe0557\""
containerd[6072]: time="2023-07-27T11:49:35.264278848+08:00" level=info msg="RemoveContainer for \"61b5f724d07f37feacd7903d9c5d1347375eacee041eafcd4b88d31437fe0557\" returns successfully"

Steps to reproduce the issue

There is currently no reliable way to reproduce this issue.

Describe the results you received and expected

Containers should be created and started correctly.

What version of containerd are you using?

containerd 1.6.5

Any other relevant information

runc version 1.1.7
Kernel version: 5.14.0-3.0.2

Show configuration if it is related to CRI plugin.

disabled_plugins = []
imports = []
oom_score = 0
plugin_dir = ""
required_plugins = []
root = "/media/disk1/containerd"
state = "/run/containerd"
temp = ""
version = 2

[cgroup]
  path = ""

[debug]
  address = ""
  format = ""
  gid = 0
  level = ""
  uid = 0

[grpc]
  address = "/run/containerd/containerd.sock"
  gid = 0
  max_recv_message_size = 16777216
  max_send_message_size = 16777216
  tcp_address = ""
  tcp_tls_ca = ""
  tcp_tls_cert = ""
  tcp_tls_key = ""
  uid = 0

[metrics]
  address = ""
  grpc_histogram = false

[plugins]

  [plugins."io.containerd.gc.v1.scheduler"]
    deletion_threshold = 0
    mutation_threshold = 100
    pause_threshold = 0.02
    schedule_delay = "0s"
    startup_delay = "100ms"

  [plugins."io.containerd.grpc.v1.cri"]
    device_ownership_from_security_context = false
    disable_apparmor = false
    disable_cgroup = false
    disable_hugetlb_controller = true
    disable_proc_mount = false
    disable_tcp_service = true
    enable_selinux = false
    enable_tls_streaming = false
    enable_unprivileged_icmp = false
    enable_unprivileged_ports = false
    ignore_image_defined_volumes = false
    max_concurrent_downloads = 10
    max_container_log_line_size = 16384
    netns_mounts_under_state_dir = false
    restrict_oom_score_adj = false
    sandbox_image = "registry.corp.kuaishou.com/cloud_admin/pause:3.1"
    selinux_category_range = 1024
    stats_collect_period = 10
    stream_idle_timeout = "4h0m0s"
    stream_server_address = "127.0.0.1"
    stream_server_port = "0"
    systemd_cgroup = false
    tolerate_missing_hugetlb_controller = true
    unset_seccomp_profile = ""

    [plugins."io.containerd.grpc.v1.cri".cni]
      bin_dir = "/opt/cni/bin"
      conf_dir = "/etc/cni/net.d"
      conf_template = ""
      ip_pref = ""
      max_conf_num = 1

    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "nvidia"
      disable_snapshot_annotations = true
      discard_unpacked_layers = false
      ignore_rdt_not_enabled_errors = false
      no_pivot = false
      snapshotter = "overlayfs"

      [plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime]
        base_runtime_spec = ""
        cni_conf_dir = ""
        cni_max_conf_num = 0
        container_annotations = []
        pod_annotations = []
        privileged_without_host_devices = true
        runtime_engine = "/opt/kata/bin/kata-runtime"
        runtime_path = ""
        runtime_root = ""
        runtime_type = "io.containerd.runtime.v1.linux"

      [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
        base_runtime_spec = ""
        cni_conf_dir = ""
        cni_max_conf_num = 0
        container_annotations = []
        pod_annotations = []
        privileged_without_host_devices = false
        runtime_engine = ""
        runtime_path = ""
        runtime_root = ""
        runtime_type = ""

        [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime.options]
          BinaryName = "/usr/bin/nvidia-container-runtime"

      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]

        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v1"

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"
            SystemdCgroup = true


        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          base_runtime_spec = ""
          cni_conf_dir = ""
          cni_max_conf_num = 0
          container_annotations = []
          pod_annotations = []
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_path = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            BinaryName = ""
            CriuImagePath = ""
            CriuPath = ""
            CriuWorkPath = ""
            IoGid = 0
            IoUid = 0
            NoNewKeyring = false
            NoPivotRoot = false
            Root = ""
            ShimCgroup = ""
            SystemdCgroup = true

    [plugins."io.containerd.grpc.v1.cri".image_decryption]
      key_model = "node"

    [plugins."io.containerd.grpc.v1.cri".registry]
      config_path = "/etc/containerd/certs.d"

      [plugins."io.containerd.grpc.v1.cri".registry.auths]

      [plugins."io.containerd.grpc.v1.cri".registry.configs]

      [plugins."io.containerd.grpc.v1.cri".registry.headers]

      [plugins."io.containerd.grpc.v1.cri".registry.mirrors]

    [plugins."io.containerd.grpc.v1.cri".x509_key_pair_streaming]
      tls_cert_file = ""
      tls_key_file = ""

  [plugins."io.containerd.internal.v1.opt"]
    path = "/opt/containerd"

  [plugins."io.containerd.internal.v1.restart"]
    interval = "10s"

  [plugins."io.containerd.internal.v1.tracing"]
    sampling_ratio = 1.0
    service_name = "containerd"

  [plugins."io.containerd.metadata.v1.bolt"]
    content_sharing_policy = "shared"

  [plugins."io.containerd.monitor.v1.cgroups"]
    no_prometheus = false

  [plugins."io.containerd.runtime.v2.task"]
    platforms = ["linux/amd64"]
    sched_core = false

  [plugins."io.containerd.service.v1.diff-service"]
    default = ["walking"]

  [plugins."io.containerd.service.v1.tasks-service"]
    rdt_config_file = ""

  [plugins."io.containerd.snapshotter.v1.aufs"]
    root_path = ""

  [plugins."io.containerd.snapshotter.v1.btrfs"]
    root_path = ""

  [plugins."io.containerd.snapshotter.v1.devmapper"]
    async_remove = false
    base_image_size = ""
    discard_blocks = false
    fs_options = ""
    fs_type = ""
    pool_name = ""
    root_path = ""

  [plugins."io.containerd.snapshotter.v1.native"]
    root_path = ""

  [plugins."io.containerd.snapshotter.v1.overlayfs"]
    root_path = ""
    upperdir_label = false

  [plugins."io.containerd.snapshotter.v1.zfs"]
    root_path = ""

  [plugins."io.containerd.tracing.processor.v1.otlp"]
    endpoint = ""
    insecure = false
    protocol = ""

[proxy_plugins]

[stream_processors]

  [stream_processors."io.containerd.ocicrypt.decoder.v1.tar"]
    accepts = ["application/vnd.oci.image.layer.v1.tar+encrypted"]
    args = ["--decryption-keys-path", "/etc/containerd/ocicrypt/keys"]
    env = ["OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf"]
    path = "ctd-decoder"
    returns = "application/vnd.oci.image.layer.v1.tar"

  [stream_processors."io.containerd.ocicrypt.decoder.v1.tar.gzip"]
    accepts = ["application/vnd.oci.image.layer.v1.tar+gzip+encrypted"]
    args = ["--decryption-keys-path", "/etc/containerd/ocicrypt/keys"]
    env = ["OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf"]
    path = "ctd-decoder"
    returns = "application/vnd.oci.image.layer.v1.tar+gzip"

[timeouts]
  "io.containerd.timeout.bolt.open" = "0s"
  "io.containerd.timeout.shim.cleanup" = "5s"
  "io.containerd.timeout.shim.load" = "5s"
  "io.containerd.timeout.shim.shutdown" = "3s"
  "io.containerd.timeout.task.state" = "2s"

[ttrpc]
  address = ""
  gid = 0
  uid = 0
zvier added the kind/bug label on Jul 27, 2023
zvier closed this as completed on Aug 10, 2023
@sfc-gh-aivanou

@zvier how did you resolve this?

@zvier
Contributor Author

zvier commented Oct 6, 2023

> @zvier how did you resolve this?

This problem was solved by #7557. You can use containerd-shim-runc-v1 or backport the PR.
