
vCPU exposed as threads pinned to non-thread sibling pCPUs on hosts without SMT when using dedicatedCpuPlacement #11749

Open
lyarwood opened this issue Apr 18, 2024 · 3 comments

lyarwood commented Apr 18, 2024

/cc @vladikr

What happened:

This is likely open to interpretation around the goals of dedicatedCpuPlacement, but at present, on hosts without SMT enabled, vCPUs exposed as threads to the guest OS are pinned to non-sibling pCPUs. Users might be surprised by the performance of workloads running across such threads given what they requested.
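
For reference, whether SMT is active on a host can be read directly from sysfs (a standard Linux path; it reports 1 when SMT is enabled):

$ cat /sys/devices/system/cpu/smt/active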

What you expected to happen:

The VirtualMachineInstance should not be scheduled until an SMT-enabled compute node is present in the environment.

How to reproduce it (as minimally and precisely as possible):

Uses kubevirt/kubevirtci#1171

$ export \
 KUBEVIRT_PROVIDER=k8s-1.28 \
 KUBEVIRT_MEMORY_SIZE=$((16 * 1024))M \
 KUBEVIRT_HUGEPAGES_2M=$((4 * 1024)) \
 KUBEVIRT_DEPLOY_CDI=true \
 KUBEVIRT_CPU_MANAGER_POLICY=static \
 KUBEVIRT_NUM_NUMA_NODES=2 \
 KUBEVIRT_NUM_VCPU=16 \
 KUBEVIRTCI_CONTAINER_SUFFIX=latest \
 KUBEVIRTCI_GOCLI_CONTAINER=quay.io/kubevirtci/gocli:latest
$ cd kubevirtci
$ make cluster-up
$ cd ../kubevirt
$ rsync -av ../kubevirtci/cluster-up/ ./cluster-up/ && rsync -av ../kubevirtci/_ci-configs/ ./_ci-configs/
$ make cluster-sync
[..]
$ ./cluster-up/kubectl.sh patch kv/kubevirt -n kubevirt --type merge -p '{"spec":{"configuration":{"developerConfiguration":{"featureGates": ["CPUManager","NUMA"]}}}}'
[..]
$ ./cluster-up/kubectl.sh apply -f -<<EOF
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: dedicated-threads
spec:
  domain:
    cpu:
      threads: 2
      dedicatedCpuPlacement: true
    devices:
      disks:
        - disk:
            bus: virtio
          name: containerdisk
        - disk:
            bus: virtio
          name: cloudinitdisk
    resources:
      requests:
        memory: 2Gi
  volumes:
    - containerDisk:
        image: quay.io/containerdisks/fedora:39
      name: containerdisk
    - cloudInitNoCloud:
        userData: |
          #!/bin/sh
          mkdir -p  /home/fedora/.ssh
          curl https://github.com/lyarwood.keys > /home/fedora/.ssh/authorized_keys
          chown -R fedora: /home/fedora/.ssh
      name: cloudinitdisk
EOF

$ ./cluster-up/virtctl.sh ssh -lfedora dedicated-threads lscpu
[..]
Thread(s) per core:  2
Core(s) per socket:  1
Socket(s):           1
[..]

$ ./cluster-up/kubectl.sh exec pods/virt-launcher-dedicated-threads-6lwcm -- virsh vcpuinfo 1
selecting podman as container runtime
VCPU:           0
CPU:            1
State:          running
CPU time:       8.0s
CPU Affinity:   -y--------------

VCPU:           1
CPU:            2
State:          running
CPU time:       4.3s
CPU Affinity:   --y-------------

$ ./cluster-up/ssh.sh node01 lscpu
[..]
Thread(s) per core:  1
Core(s) per socket:  16
Socket(s):           1
[..]
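
To confirm that the pCPUs the vCPUs were pinned to (1 and 2 above) are not thread siblings, the standard sysfs topology files can be checked on the node; on a host without SMT each sibling list contains only the CPU itself:

$ ./cluster-up/ssh.sh node01 cat /sys/devices/system/cpu/cpu1/topology/thread_siblings_list
$ ./cluster-up/ssh.sh node01 cat /sys/devices/system/cpu/cpu2/topology/thread_siblings_list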

Additional context:
N/A

Environment:

  • KubeVirt version (use virtctl version): N/A
  • Kubernetes version (use kubectl version): N/A
  • VM or VMI specifications: N/A
  • Cloud provider or hardware configuration: N/A
  • OS (e.g. from /etc/os-release): N/A
  • Kernel (e.g. uname -a): N/A
  • Install tools: N/A
  • Others: N/A
lyarwood commented:

Just for context, guestMappingPassthrough doesn't check for this case either:

$ ./cluster-up/kubectl.sh apply -f -<<EOF
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: dedicated-threads-with-numa
spec:
  domain:
    cpu:
      threads: 2
      dedicatedCpuPlacement: true
      numa:
        guestMappingPassthrough: {}
    memory:
      guest: 1Gi
      hugepages:
        pageSize: "2Mi"
    devices:
      disks:
        - disk:
            bus: virtio
          name: containerdisk
        - disk:
            bus: virtio
          name: cloudinitdisk
    resources:
      requests:
        memory: 2Gi
  volumes:
    - containerDisk:
        image: quay.io/containerdisks/fedora:39
      name: containerdisk
    - cloudInitNoCloud:
        userData: |
          #!/bin/sh
          mkdir -p  /home/fedora/.ssh
          curl https://github.com/lyarwood.keys > /home/fedora/.ssh/authorized_keys
          chown -R fedora: /home/fedora/.ssh
      name: cloudinitdisk
EOF
[..]
$ ./cluster-up/kubectl.sh get vmis
NAME                          AGE   PHASE     IP               NODENAME   READY
dedicated-threads-with-numa   12s   Running   10.244.196.149   node01     True
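
The resulting pinning can be inspected the same way as before (the virt-launcher pod name below is illustrative; substitute the actual pod suffix):

$ ./cluster-up/kubectl.sh exec pods/virt-launcher-dedicated-threads-with-numa-xxxxx -- virsh vcpupin 1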


vladikr commented Apr 19, 2024

Thanks @lyarwood.

Indeed, this is open to interpretation. So far, the general intent was to provide dedicated CPUs to the guest; however, the topology assignment was on a best-effort basis. The only exception was guestMappingPassthrough, where we are strict about NUMA node assignment.

We can of course follow up and correct the behavior by further enhancing the dedicatedCPUs API.
I would start by reporting whether SMT is enabled on the nodes.

lyarwood commented:

> We can of course follow up and correct the behavior by further enhancing the dedicatedCPUs API.
> I would start by reporting whether SMT is enabled on the nodes.

ACK, thanks for confirming this is a valid thing to fix. It should be easy enough to label a node based on the value in /sys/devices/system/cpu/smt/active and to use that label when scheduling later if threads is greater than 1. I'll try to find some time to work on this in the coming weeks.
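
A minimal sketch of that idea, assuming a hypothetical label key kubevirt.io/smt-enabled (not an existing KubeVirt label; in practice virt-handler would publish this rather than a one-off script):

# Run on each node: read the SMT state from sysfs and label the node accordingly.
# /sys/devices/system/cpu/smt/active reads 1 when SMT is enabled.
# Assumes the Kubernetes node name matches the hostname.
smt=$(cat /sys/devices/system/cpu/smt/active 2>/dev/null || echo 0)
if [ "$smt" = "1" ]; then state=true; else state=false; fi
kubectl label node "$(hostname)" kubevirt.io/smt-enabled="$state" --overwrite

The scheduling side would then translate threads greater than 1 combined with dedicatedCpuPlacement into a node selector on that label.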

/assign
