Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setting memory.swap to memory.limit can cause unexpected behavior #8442

Open
mmiranda96 opened this issue Apr 24, 2023 · 8 comments
Open

Setting memory.swap to memory.limit can cause unexpected behavior #8442

mmiranda96 opened this issue Apr 24, 2023 · 8 comments
Labels

Comments

@mmiranda96
Copy link

Description

This is a follow-up issue to #7783.

The OCI specifies memory limit values as optional (source). However, there is no way to differentiate between a default zero and an explicit zero in the CRI specification, since it uses proto v3 (source).

In the case of a default zero, swap overriding can cause issues on a container if swap behavior is allowed on the machine.

I created this issue to revisit this change and consider all the implications of it, as it recently caused an issue on a particular scenario (see "Other relevant information" section for details).

Steps to reproduce the issue

  1. Create a container with a specified memory limit but no specified swap limit.
  2. Force the container to swap memory.
  3. Verify swap is not allowed and the container is OOMKilled.

Describe the results you received and expected

Result: container is OOMKilled

Expected: container is not OOMKilled since swap limit was not explicitly set to 0.

What version of containerd are you using?

1.6.18

Any other relevant information

A bit out of scope for containerd, but it helps for full context: I was running a Kubernetes cluster (v1.24) using containerd (v1.6.18). I created a swap partition on the node, enabled it, and specified --fail-swap-on=false on the kubelet. I created the following pod:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: python-swap
  labels:
    app: python-swap
spec:
  replicas: 1
  selector:
    matchLabels:
      app: python-swap
  template:
    metadata:
      labels:
        app: python-swap
    spec:
      containers:
      - name: python-swap
        image: python:3.7-alpine3.16
        resources:
          limits:
            memory: "512Mi"
        command: [ "python3", "-c" ]
        args: [ "import time; x = bytearray(1024*1024*1000); time.sleep(600)" ]

Before the change, pod run successfully with a swap partition on the node. After the change, pod was OOMKilled by containerd.

Show configuration if it is related to CRI plugin.

No response

@mmiranda96
Copy link
Author

CC @SergeyKanzhelev

@SergeyKanzhelev
Copy link
Contributor

SergeyKanzhelev commented Apr 24, 2023

@mqasimsarfraz what was the motivation for this change? It is not ideal that the default behavior changed when the feature (SwapOn) in not enabled in kubelet. Change of behavior is not ideal.

Why the expectation is that when the feature gate SwapOn is not enabled, it means that swap needs to be disabled?

@mqasimsarfraz
Copy link
Contributor

mqasimsarfraz commented Apr 25, 2023

@mqasimsarfraz what was the motivation for this change? It is not ideal that the default behavior changed when the feature (SwapOn) in not enabled in kubelet. Change of behavior is not ideal.

@SergeyKanzhelev #7783 didn't change any default behavior but made it consistent (restore default behaviour) across all the runtimes (dockershim, cri-dockerd, cri-o). Especially given in dockershim we don't allow container to use any swap so if user upgrades from v1.23.0 (dockershim) to v1.24.0 (containerd) the swap behavior would change (suddenly container allowed to swap and not killed by OOMKiller). Which would be unexpected ?

Why the expectation is that when the feature gate SwapOn is not enabled, it means that swap needs to be disabled?

AFAIU this is expected behavior form kubelet as well. If we don't keep it this way as soon as user enables feature gate NodeSwap ( LimitedSwap being the default behavior) swapping will be disabled. Also, if user wants containers to use swap they can just use UnlimitedSwap with an extra configuration.

Please feel free to correct my understanding. I believe if we want to allow containers (with limits) to swap we will have to propagate this change to all the runtimes (cri-dockerd, cri-o and containerd) as well since I found this issue while testing OOMKiller against all these runtimes.

@ruiwen-zhao
Copy link
Member

If we don't keep it this way as soon as user enables feature gate NodeSwap ( LimitedSwap being the default behavior) swapping will be disabled.

@mqasimsarfraz not sure if I understand this. Kubelet already sets lcr.MemorySwapLimitInBytes = lcr.MemoryLimitInBytes , so containerd should just use whatever s.Linux.Resources.Memory.Swap passed from kubelet, instead of overriding it. Is that right?

@SergeyKanzhelev
Copy link
Contributor

This is the example of a change of a behavior: kubernetes-sigs/cri-tools#1146 (comment) for Containerd users.

K8s users with Containerd who enabled swap on a node, and didn't care about the SwapOn had a behavior of swap being allowed. After this PR, behavior for those users changes.

I agree, this is an inconsistency. And I am not sure what is better - revert it for Containerd for users migrating from <1.6.13 to 1.6.13+ or keep it as many users are already on 1.6.13+.

Allowing swap is less of a backward incompatibility than disabling swap use. But the same time this behavior is around for 9 releases already.

@samuelkarp do you have an opinion?

@mqasimsarfraz
Copy link
Contributor

mqasimsarfraz commented May 5, 2023

This is the example of a change of a behavior: kubernetes-sigs/cri-tools#1146 (comment) for Containerd users.

It is clear now. Thanks for sharing!

Allowing swap is less of a backward incompatibility than disabling swap use. But the same time this behavior is around for 9 releases already.

I am trying to understand why disabling swap is backward incompatible. Dockershim was already disabling the swap by default and for NodeSwap feature gate the default behavior is also to disabled swap? Or you meant in context for containerd users this change is backward incompatible?

@mqasimsarfraz
Copy link
Contributor

mqasimsarfraz commented May 5, 2023

If we don't keep it this way as soon as user enables feature gate NodeSwap ( LimitedSwap being the default behavior) swapping will be disabled.

@mqasimsarfraz not sure if I understand this. Kubelet already sets lcr.MemorySwapLimitInBytes = lcr.MemoryLimitInBytes , so containerd should just use whatever s.Linux.Resources.Memory.Swap passed from kubelet, instead of overriding it. Is that right?

@ruiwen-zhao containerd is only setting default swap limit in case kubelet doesn't provide anything since lcr.MemorySwapLimitInBytes = lcr.MemoryLimitInBytes is guarded by feature gate. The idea was to have consistent behavior in containerd as in kubelet so enabling feature gate doesn't result in change of behavior. If kubelet will provide any value that will be used. Does that make sense now?

@samuelkarp
Copy link
Member

@samuelkarp do you have an opinion?

I think relying on --fail-swap-on=false for an un-specified (or under-specified) behavior in the system is a mistake and something that should be discouraged. I don't think it's unreasonable for containerd to have changed behavior here, especially to make the behavior more consistent with other container runtimes.

#7783 was in v1.7.0 from the beginning, but was added to v1.6.x in a cherry-pick in #7815 and has been included in 9 releases now (and was absent for the 13 prior releases). We could revert the cherry-pick for 1.6.x but keep the behavior as-is for 1.7.x. That would allow users on 1.6.x to have the same behavior that 1.6.x started with (though with 13 releases without and 9 release with, it's pretty close to half and half) and upgrading to 1.7.x would be a good introduction of the changed behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants