Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dcgm: when processes are running in MIGs, ratio their energy based how much their normalized utilizations are relative to the aggregate normalized utilization of only active MIGs, not the entire GPU #1292

Closed
wants to merge 4 commits into from

Conversation

rootfs
Copy link
Contributor

@rootfs rootfs commented Mar 6, 2024

Fix #1293

@rootfs rootfs force-pushed the mig-formula branch 2 times, most recently from f0bbccd to a5207e8 Compare March 6, 2024 19:07
…w much their normalized utitlizations are relative to the aggregate normalized utilization of only active MIGs, not the entire GPU

Signed-off-by: Huamin Chen <hchen@redhat.com>
@rootfs rootfs changed the title dcgm: when processes are running in MIGs, ratio their energy based how much their normalized utilizations are relative to the aggregate normalized utilization of only active MIGs, not the entire GPU [DNM] dcgm: when processes are running in MIGs, ratio their energy based how much their normalized utilizations are relative to the aggregate normalized utilization of only active MIGs, not the entire GPU Mar 6, 2024
Copy link
Collaborator

@marceloamaral marceloamaral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rootfs, it seems like you're suggesting duplicating the functionality of the ratio power model component.

Here's the ratio code: https://github.com/sustainable-computing-io/kepler/blob/main/pkg/model/estimator/local/ratio.go#L84

That is, each process's GPU utilization will be aggregated to represent the node utilization. Subsequently, the Ratio power model will calculate the process power consumption based on the ratio of the process's GPU utilization and the node GPU utilization (the node GPU utilization is the totalNormalizedUtil that you are proposing here).

…ses' compute utilization on these devices

Signed-off-by: Huamin Chen <hchen@redhat.com>
@rootfs
Copy link
Contributor Author

rootfs commented Mar 7, 2024

@rootfs, it seems like you're suggesting duplicating the functionality of the ratio power model component.

Here's the ratio code: https://github.com/sustainable-computing-io/kepler/blob/main/pkg/model/estimator/local/ratio.go#L84

That is, each process's GPU utilization will be aggregated to represent the node utilization. Subsequently, the Ratio power model will calculate the process power consumption based on the ratio of the process's GPU utilization and the node GPU utilization (the node GPU utilization is the totalNormalizedUtil that you are proposing here).

@marceloamaral the current ratio code doesn't support multiple (GPU) devices yet, so this is the gap stopper before we get there.

Signed-off-by: Huamin Chen <hchen@redhat.com>
@rootfs rootfs changed the title [DNM] dcgm: when processes are running in MIGs, ratio their energy based how much their normalized utilizations are relative to the aggregate normalized utilization of only active MIGs, not the entire GPU dcgm: when processes are running in MIGs, ratio their energy based how much their normalized utilizations are relative to the aggregate normalized utilization of only active MIGs, not the entire GPU Mar 7, 2024
Signed-off-by: Huamin Chen <hchen@redhat.com>
@rootfs
Copy link
Contributor Author

rootfs commented Mar 8, 2024

@rootfs, it seems like you're suggesting duplicating the functionality of the ratio power model component.

Here's the ratio code: https://github.com/sustainable-computing-io/kepler/blob/main/pkg/model/estimator/local/ratio.go#L84

That is, each process's GPU utilization will be aggregated to represent the node utilization. Subsequently, the Ratio power model will calculate the process power consumption based on the ratio of the process's GPU utilization and the node GPU utilization (the node GPU utilization is the totalNormalizedUtil that you are proposing here).

@marceloamaral right now, the Ratio model only has the process index, no process pid or utilization. It will take a big refactor to get there. For the GPU case, the ratio model code in this PR is a small change. Once we decide how to refactor the ratio model in the future, let's revisit the consolidation then.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is an update to bpf expected as part of this patch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the module gets recompiled each time, we should git ignore it. The Makefile rebuilds the module anyway.

@marceloamaral
Copy link
Collaborator

@rootfs the Ratio power model's logic involves calculating the ratio utilization of a process by the total utilization and multiplying it by the resource power consumption. So, with each process GPU utilization, total GPU utilization, and GPU power consumption, we can calculate the GPU power consumption for each process.

Instead of add a new power model logic inside the DCGM, we should just index the process GPU utilization with the parent GPU ID (not with the MIG ID).

The PR #1299 should fix the issue #1293.

@rootfs
Copy link
Contributor Author

rootfs commented Mar 13, 2024

@rootfs the Ratio power model's logic involves calculating the ratio utilization of a process by the total utilization and multiplying it by the resource power consumption. So, with each process GPU utilization, total GPU utilization, and GPU power consumption, we can calculate the GPU power consumption for each process.

Agreed and this is what the PR does: it gets the processes' per GPU utilization and uses that info to calculate power consumption on each GPU.

Instead of add a new power model logic inside the DCGM, we should just index the process GPU utilization with the parent GPU ID (not with the MIG ID).

The current ratio model based on aggregate usage on all devices are not applicable to GPUs and potentially also has issues with heterogenous CPUs that have performance and efficiency cores. We need to address that issue in a more comprehensive refactor.

The PR #1299 should fix the issue #1293.

I looked at #1299 but am not convinced it can fix the multi GPU issues. One thing I saw on a multi GPU setup, even nothing was running on GPUs, the base power could be different. Aggregating GPU power and distributing a process's utilization on one GPU across all GPUs is not a fair ratio model.

@marceloamaral
Copy link
Collaborator

@rootfs that is right.

Aggregating GPU power and distributing a process's utilization on one GPU across all GPUs is not a fair ratio model.

It works for idle power because it's a part of the node and should be divided among all processes, regardless of GPU usage.

However, for dynamic power, we need to separate it because processes' GPU utilization is specific to each GPU, different from the CPU case, where utilization is relative to the entire system.

The point here is that we should not bypass the Ratio Power model, but fix it.

for _, p := range processInfo {
computeUtilization := 0.0
if totalNormalizedUtil > 0 {
computeUtilization = processNormalizedUtil[p.Pid]/totalNormalizedUtil*100 + 0.5
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the computeUtilization is the Ratio, we should change the name to computeRatio
Also, we should not sum the + 0.5

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this is not really the Ratio.
The Ratio is the sum of the utilization of all MIG slices, but here we have the utilization of only one slice.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The totalNormalizedUtil is not the utilization of all MIG slices.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is listing all process running in the MIG slice, and then assuming that there is only one process running in the MIG slice, it associate the MIG utilization to the process.
The MIG utilization is normalized by its size to represent a fraction of the total GPU utilization.

So, for example, if there is only one process using in a MIG slices.
The totalNormalizedUtil = processNormalizedUtil and the ratio will be 100.5

It means that each MIG ratio will 1, and when multiplied by the GPU power it will have the total GPU power, not the fraction of the GPU power.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, we can't directly utilize the MIG utilization to determine the MIG power consumption. Instead, we must calculate the ratio of one MIG's utilization to the total utilization of all MIGs.

For instance, suppose a GPU has three slices: slice 1 uses 10% of the GPU, slice 2 uses 20%, and slice 3 is idle. The total GPU utilization is 30%. If we only consider utilization, we'd discard 70% of the power consumption.

klog.V(6).Infof("GPU %s: idle power: %v, dynamic power: %v\n", gpuName, processIdlePower, dynPower)
for pid, v := range processesMetrics {
v.EnergyUsage[config.IdleEnergyInGPU].SetDeltaStat(gpuName, processIdlePower)
usage := v.ResourceUsage[config.GPUComputeUtilization].GetDeltaValuesByKey(gpuName)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usage is not really the usage here right?
This is the GPU ratio....So, we might need to change the metric name GPUComputeUtilization to GPUComputeRatio.

gpuName := fmt.Sprintf("%s%v", utils.GenericGPUID, gpuID)
numberOfProcesses := len(processIDList)
// calculate the gpu's processes energy consumption for each gpu
processIdlePower := nodeMetrics.EnergyUsage[config.IdleEnergyInGPU].GetDeltaValuesByKey(gpuID) / uint64(numberOfProcesses)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idle power is a part of the node and should be divided among all processes, regardless of GPU usage.

So the process idle power should not be relative to only one GPU.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we just need to get here the Sum of the Idle power of all GPUs.
EnergyUsage[config.IdleEnergyInPkg].SumAllDeltaValues()

But note that the list of GPU in gpu.GetGpus(), should be only the Physical GPUs. If it also has the MIG slices, the SumAllDeltaValues will double count the idle power of all MIG slices, as it has the GPU power.
The PR #1299 should help with this since it keeps only the Physical GPUs in the gpu.GetGpus().

processAcceleratorMetrics[p.Pid] = ProcessUtilizationSample{
Pid: p.Pid,
TimeStamp: uint64(time.Now().UnixNano()),
ComputeUtil: uint32(computeUtilization),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, change ComputeUtil to RatioUtil...

@rootfs
Copy link
Contributor Author

rootfs commented Mar 26, 2024

@marceloamaral I am closing this PR for now. Please take priority in fixing the multi GPU ratio issue. Thanks

@rootfs rootfs closed this Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants