dcgm: when processes are running in MIGs, ratio their energy based how much their normalized utilizations are relative to the aggregate normalized utilization of only active MIGs, not the entire GPU #1292

rootfs · 2024-03-06T19:05:25Z

…w much their normalized utitlizations are relative to the aggregate normalized utilization of only active MIGs, not the entire GPU Signed-off-by: Huamin Chen <hchen@redhat.com>

pkg/sensors/accelerator/gpu/source/gpu_dcgm.go

marceloamaral

@rootfs, it seems like you're suggesting duplicating the functionality of the ratio power model component.

Here's the ratio code: https://github.com/sustainable-computing-io/kepler/blob/main/pkg/model/estimator/local/ratio.go#L84

That is, each process's GPU utilization will be aggregated to represent the node utilization. Subsequently, the Ratio power model will calculate the process power consumption based on the ratio of the process's GPU utilization and the node GPU utilization (the node GPU utilization is the totalNormalizedUtil that you are proposing here).

…ses' compute utilization on these devices Signed-off-by: Huamin Chen <hchen@redhat.com>

rootfs · 2024-03-07T19:23:52Z

@rootfs, it seems like you're suggesting duplicating the functionality of the ratio power model component.

Here's the ratio code: https://github.com/sustainable-computing-io/kepler/blob/main/pkg/model/estimator/local/ratio.go#L84

That is, each process's GPU utilization will be aggregated to represent the node utilization. Subsequently, the Ratio power model will calculate the process power consumption based on the ratio of the process's GPU utilization and the node GPU utilization (the node GPU utilization is the totalNormalizedUtil that you are proposing here).

@marceloamaral the current ratio code doesn't support multiple (GPU) devices yet, so this is the gap stopper before we get there.

Signed-off-by: Huamin Chen <hchen@redhat.com>

rootfs · 2024-03-08T12:58:44Z

@rootfs, it seems like you're suggesting duplicating the functionality of the ratio power model component.

Here's the ratio code: https://github.com/sustainable-computing-io/kepler/blob/main/pkg/model/estimator/local/ratio.go#L84

That is, each process's GPU utilization will be aggregated to represent the node utilization. Subsequently, the Ratio power model will calculate the process power consumption based on the ratio of the process's GPU utilization and the node GPU utilization (the node GPU utilization is the totalNormalizedUtil that you are proposing here).

@marceloamaral right now, the Ratio model only has the process index, no process pid or utilization. It will take a big refactor to get there. For the GPU case, the ratio model code in this PR is a small change. Once we decide how to refactor the ratio model in the future, let's revisit the consolidation then.

sthaha · 2024-03-10T23:10:10Z

bpfassets/libbpf/bpf.o/amd64_kepler.bpf.o

Is an update to bpf expected as part of this patch?

the module gets recompiled each time, we should git ignore it. The Makefile rebuilds the module anyway.

marceloamaral · 2024-03-13T08:07:16Z

@rootfs the Ratio power model's logic involves calculating the ratio utilization of a process by the total utilization and multiplying it by the resource power consumption. So, with each process GPU utilization, total GPU utilization, and GPU power consumption, we can calculate the GPU power consumption for each process.

Instead of add a new power model logic inside the DCGM, we should just index the process GPU utilization with the parent GPU ID (not with the MIG ID).

The PR #1299 should fix the issue #1293.

rootfs · 2024-03-13T18:05:15Z

@rootfs the Ratio power model's logic involves calculating the ratio utilization of a process by the total utilization and multiplying it by the resource power consumption. So, with each process GPU utilization, total GPU utilization, and GPU power consumption, we can calculate the GPU power consumption for each process.

Agreed and this is what the PR does: it gets the processes' per GPU utilization and uses that info to calculate power consumption on each GPU.

Instead of add a new power model logic inside the DCGM, we should just index the process GPU utilization with the parent GPU ID (not with the MIG ID).

The current ratio model based on aggregate usage on all devices are not applicable to GPUs and potentially also has issues with heterogenous CPUs that have performance and efficiency cores. We need to address that issue in a more comprehensive refactor.

The PR #1299 should fix the issue #1293.

I looked at #1299 but am not convinced it can fix the multi GPU issues. One thing I saw on a multi GPU setup, even nothing was running on GPUs, the base power could be different. Aggregating GPU power and distributing a process's utilization on one GPU across all GPUs is not a fair ratio model.

marceloamaral · 2024-03-14T12:10:33Z

@rootfs that is right.

Aggregating GPU power and distributing a process's utilization on one GPU across all GPUs is not a fair ratio model.

It works for idle power because it's a part of the node and should be divided among all processes, regardless of GPU usage.

However, for dynamic power, we need to separate it because processes' GPU utilization is specific to each GPU, different from the CPU case, where utilization is relative to the entire system.

The point here is that we should not bypass the Ratio Power model, but fix it.

marceloamaral · 2024-03-14T12:28:03Z

pkg/sensors/accelerator/gpu/source/gpu_dcgm.go

+	for _, p := range processInfo {
+		computeUtilization := 0.0
+		if totalNormalizedUtil > 0 {
+			computeUtilization = processNormalizedUtil[p.Pid]/totalNormalizedUtil*100 + 0.5


As the computeUtilization is the Ratio, we should change the name to computeRatio
Also, we should not sum the + 0.5

Also, this is not really the Ratio.
The Ratio is the sum of the utilization of all MIG slices, but here we have the utilization of only one slice.

The totalNormalizedUtil is not the utilization of all MIG slices.

This function is listing all process running in the MIG slice, and then assuming that there is only one process running in the MIG slice, it associate the MIG utilization to the process.
The MIG utilization is normalized by its size to represent a fraction of the total GPU utilization.

So, for example, if there is only one process using in a MIG slices.
The totalNormalizedUtil = processNormalizedUtil and the ratio will be 100.5

It means that each MIG ratio will 1, and when multiplied by the GPU power it will have the total GPU power, not the fraction of the GPU power.

Additionally, we can't directly utilize the MIG utilization to determine the MIG power consumption. Instead, we must calculate the ratio of one MIG's utilization to the total utilization of all MIGs.

For instance, suppose a GPU has three slices: slice 1 uses 10% of the GPU, slice 2 uses 20%, and slice 3 is idle. The total GPU utilization is 30%. If we only consider utilization, we'd discard 70% of the power consumption.

marceloamaral · 2024-03-14T13:05:49Z

pkg/model/process_energy.go

+		klog.V(6).Infof("GPU %s: idle power: %v, dynamic power: %v\n", gpuName, processIdlePower, dynPower)
+		for pid, v := range processesMetrics {
+			v.EnergyUsage[config.IdleEnergyInGPU].SetDeltaStat(gpuName, processIdlePower)
+			usage := v.ResourceUsage[config.GPUComputeUtilization].GetDeltaValuesByKey(gpuName)


The usage is not really the usage here right?
This is the GPU ratio....So, we might need to change the metric name GPUComputeUtilization to GPUComputeRatio.

marceloamaral · 2024-03-14T13:33:42Z

pkg/model/process_energy.go

+		gpuName := fmt.Sprintf("%s%v", utils.GenericGPUID, gpuID)
+		numberOfProcesses := len(processIDList)
+		// calculate the gpu's processes energy consumption for each gpu
+		processIdlePower := nodeMetrics.EnergyUsage[config.IdleEnergyInGPU].GetDeltaValuesByKey(gpuID) / uint64(numberOfProcesses)


The idle power is a part of the node and should be divided among all processes, regardless of GPU usage.

So the process idle power should not be relative to only one GPU.

So, we just need to get here the Sum of the Idle power of all GPUs.
EnergyUsage[config.IdleEnergyInPkg].SumAllDeltaValues()

But note that the list of GPU in gpu.GetGpus(), should be only the Physical GPUs. If it also has the MIG slices, the SumAllDeltaValues will double count the idle power of all MIG slices, as it has the GPU power.
The PR #1299 should help with this since it keeps only the Physical GPUs in the gpu.GetGpus().

marceloamaral · 2024-03-14T13:42:04Z

pkg/sensors/accelerator/gpu/source/gpu_dcgm.go

+		processAcceleratorMetrics[p.Pid] = ProcessUtilizationSample{
+			Pid:         p.Pid,
+			TimeStamp:   uint64(time.Now().UnixNano()),
+			ComputeUtil: uint32(computeUtilization),


Same here, change ComputeUtil to RatioUtil...

rootfs · 2024-03-26T13:31:39Z

@marceloamaral I am closing this PR for now. Please take priority in fixing the multi GPU ratio issue. Thanks

rootfs force-pushed the mig-formula branch 2 times, most recently from f0bbccd to a5207e8 Compare March 6, 2024 19:07

dcgm: when processes are running in MIGs, ratio their energy based ho…

a5207e8

…w much their normalized utitlizations are relative to the aggregate normalized utilization of only active MIGs, not the entire GPU Signed-off-by: Huamin Chen <hchen@redhat.com>

rootfs requested a review from marceloamaral March 6, 2024 20:48

yuezhu1 reviewed Mar 6, 2024

View reviewed changes

pkg/sensors/accelerator/gpu/source/gpu_dcgm.go Outdated Show resolved Hide resolved

pkg/sensors/accelerator/gpu/source/gpu_dcgm.go Show resolved Hide resolved

marceloamaral reviewed Mar 7, 2024

View reviewed changes

pkg/sensors/accelerator/gpu/source/gpu_dcgm.go Show resolved Hide resolved

marceloamaral reviewed Mar 7, 2024

View reviewed changes

pkg/sensors/accelerator/gpu/source/gpu_dcgm.go Outdated Show resolved Hide resolved

marceloamaral requested changes Mar 7, 2024

View reviewed changes

account processes' gpu power usage on each device based on the proces…

25e0815

…ses' compute utilization on these devices Signed-off-by: Huamin Chen <hchen@redhat.com>

fix lint

e7a4abc

Signed-off-by: Huamin Chen <hchen@redhat.com>

review feedback

2407b0d

Signed-off-by: Huamin Chen <hchen@redhat.com>

sthaha reviewed Mar 10, 2024

View reviewed changes

marceloamaral reviewed Mar 14, 2024

View reviewed changes

rootfs mentioned this pull request Mar 26, 2024

Fix listing and using the MIG device handler #1299

Merged

rootfs closed this Mar 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dcgm: when processes are running in MIGs, ratio their energy based how much their normalized utilizations are relative to the aggregate normalized utilization of only active MIGs, not the entire GPU #1292

dcgm: when processes are running in MIGs, ratio their energy based how much their normalized utilizations are relative to the aggregate normalized utilization of only active MIGs, not the entire GPU #1292

rootfs commented Mar 6, 2024 •

edited

marceloamaral left a comment

rootfs commented Mar 7, 2024

rootfs commented Mar 8, 2024 •

edited

sthaha Mar 10, 2024

rootfs Mar 11, 2024

marceloamaral commented Mar 13, 2024

rootfs commented Mar 13, 2024

marceloamaral commented Mar 14, 2024

marceloamaral Mar 14, 2024 •

edited

marceloamaral Mar 14, 2024

marceloamaral Mar 14, 2024

marceloamaral Mar 14, 2024

marceloamaral Mar 14, 2024

marceloamaral Mar 14, 2024

marceloamaral Mar 14, 2024

marceloamaral Mar 14, 2024

marceloamaral Mar 14, 2024

rootfs commented Mar 26, 2024

dcgm: when processes are running in MIGs, ratio their energy based how much their normalized utilizations are relative to the aggregate normalized utilization of only active MIGs, not the entire GPU #1292

dcgm: when processes are running in MIGs, ratio their energy based how much their normalized utilizations are relative to the aggregate normalized utilization of only active MIGs, not the entire GPU #1292

Conversation

rootfs commented Mar 6, 2024 • edited

marceloamaral left a comment

Choose a reason for hiding this comment

rootfs commented Mar 7, 2024

rootfs commented Mar 8, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

marceloamaral commented Mar 13, 2024

rootfs commented Mar 13, 2024

marceloamaral commented Mar 14, 2024

marceloamaral Mar 14, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rootfs commented Mar 26, 2024

rootfs commented Mar 6, 2024 •

edited

rootfs commented Mar 8, 2024 •

edited

marceloamaral Mar 14, 2024 •

edited