Kepler general energy metrics vs energy tracing metrics #365

Closed
williamcaban opened this issue Nov 7, 2022 · 23 comments
Labels
high-priority (Issues that are part of the next release but not mandatory), kind/feature (New feature or request), wontfix (This will not be worked on)

Comments

@williamcaban
Collaborator

Is your feature request related to a problem? Please describe.

The recent changes in the kind of metrics exposed increase the visibility into how much energy is consumed by which elements of a process/Pod/container. This kind of metric is valuable for application owners looking to optimize the energy consumption of their application or service stack. At the same time, the granularity of these metrics makes it extremely difficult for functionalities that require a quick understanding of the system (e.g. schedulers, multi-cluster dashboards, etc.) and considerably increases the load on the Prometheus stack.

Describe the solution you'd like

I'd like to propose that Kepler have a flag that can be used to influence the type of metrics it exposes on the metrics endpoint. Basically, one mode enables Kepler to serve general use cases (e.g. report the top-5 energy consumers by namespace, pod, or app; report the software+OS-view energy consumption by node, etc.), and another mode serves developers who want to do low-level tracing of the energy consumption of their apps.

The mode for the general use cases should provide a single, simple metric per container or Pod that can be used for user-facing reporting and executive-level reports.
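
A minimal sketch of what such a switch could look like; the flag name and the two mode values below are hypothetical and do not correspond to an existing Kepler option:

	package main

	import (
		"flag"
		"log"
	)

	// Hypothetical granularity levels for the exposed metrics.
	const (
		MetricsLevelAggregated = "aggregated" // one energy metric per container/Pod
		MetricsLevelTracing    = "tracing"    // per-component metrics for low-level tracing
	)

	func main() {
		// Hypothetical flag; Kepler does not currently document this option.
		metricsLevel := flag.String("metrics-level", MetricsLevelAggregated,
			"granularity of exported metrics: 'aggregated' or 'tracing'")
		flag.Parse()

		switch *metricsLevel {
		case MetricsLevelAggregated, MetricsLevelTracing:
			log.Printf("exporting %s metrics", *metricsLevel)
		default:
			log.Fatalf("unknown metrics level %q", *metricsLevel)
		}
	}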

Describe alternatives you've considered
With the current granular metrics:

  • We have seen a significant impact on Prometheus utilization (far too many metrics in a very short time, even on micro environments)
  • It has become very challenging to use the metrics for the general use cases, as the query formula gets very long in no time, and the system takes several seconds to process it
  • For comparison, when loading the metrics into DataFrames to run analytics in a Jupyter notebook for an 8-node cluster with ~500 Pods, loading a 2 hr dataset with 1 minute resolution: the current granular metrics take 2 min 17 s to load (as reported by %timeit), whereas the previous aggregated metrics for the same cluster setup, dataset, and resolution loaded in less than 1 second.

Additional context
The general use cases should also cover #301

@rootfs
Contributor

rootfs commented Nov 7, 2022

thank you @williamcaban for the investigation and proposal

  • We have seen a significant impact on Prometheus utilization (far too many metrics in a very short time, even on micro environments)
  • For comparison, when loading the metrics into DataFrames to run analytics in a Jupyter notebook for an 8-node cluster with ~500 Pods, loading a 2 hr dataset with 1 minute resolution: the current granular metrics take 2 min 17 s to load (as reported by %timeit), whereas the previous aggregated metrics for the same cluster setup, dataset, and resolution loaded in less than 1 second.

This is surely a concern for us. If Prometheus cannot support a small setup, the metrics scheme needs a scalability study.

It has become very challenging to use the metrics for the general use cases, as the query formula gets very long in no time, and the system takes several seconds to process it

This can be investigated in two ways: exposing the general use cases through dedicated metrics, or supporting them via Prometheus queries. Again, these approaches should be evaluated on usability and scalability.

@rootfs
Contributor

rootfs commented Nov 7, 2022

cc @marceloamaral

@rootfs
Contributor

rootfs commented Nov 7, 2022

Let's create a process for the metrics schema:

  • Support general use cases: current and aggregate power usage by node, pod, and potentially namespace.
  • Focus on scalability when evaluating design choices: what happens to Kepler, Prometheus, or the Grafana dashboard when the number of pods and nodes increases?
  • Add an integration test quantifying metrics query latency.

@sustainable-computing-io/reviewer

@marceloamaral
Collaborator

Right, I do agree that we need a metric that sums the energy consumption of all sources:
kepler_container_joules_total, similar to the old pod_curr_energy_millijoule.

This metric will represent container_energy = package + ucore + dram + gpu + other_components
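
A minimal sketch of the aggregation being proposed; the struct and field names are illustrative placeholders, not Kepler's actual types:

	package main

	import "fmt"

	// ComponentEnergy holds per-component energy readings in millijoules.
	// Field names are illustrative, not Kepler's actual struct.
	type ComponentEnergy struct {
		PkgMilliJ    uint64
		UncoreMilliJ uint64
		DRAMMilliJ   uint64
		GPUMilliJ    uint64
		OtherMilliJ  uint64
	}

	// TotalJoules sums all components into the single value that a
	// kepler_container_joules_total-style metric would expose.
	func (c ComponentEnergy) TotalJoules() float64 {
		totalMilliJ := c.PkgMilliJ + c.UncoreMilliJ + c.DRAMMilliJ + c.GPUMilliJ + c.OtherMilliJ
		return float64(totalMilliJ) / 1000.0
	}

	func main() {
		e := ComponentEnergy{PkgMilliJ: 1200, UncoreMilliJ: 300, DRAMMilliJ: 450, OtherMilliJ: 50}
		fmt.Printf("container total: %.3f J\n", e.TotalJoules())
	}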

@marceloamaral
Collaborator

Regarding the number of metrics: we have some ideas about disabling some metrics via configuration, so we can do that in the future.

However, keep in mind that node-exporter, cAdvisor, and other exporters export a lot more metrics than we do without major problems.
But of course we need to do some analysis on this.

@rootfs
Contributor

rootfs commented Nov 8, 2022

thanks @williamcaban for sharing the setup with me.
This setup is quite beefy:

64 cores Intel(R) Xeon(R) Gold 6338N CPU @ 2.20GHz

It has the latest metrics: kepler_pod_energy.

I ran some quick tests and found quite a few issues:

  • current CPU usage is way too high, it is over 150%
  • latency is also high, 4s vs 1s before refactoring

@marceloamaral
Collaborator

@williamcaban regarding the large set of Prometheus metrics being too expensive to process, I created a new metric in PR #370

@marceloamaral
Collaborator

@rootfs there is definitely something wrong with this deployment.
First, why does it have 3 kepler instances?

In my environment the kepler CPU utilization is not high....

@rootfs
Contributor

rootfs commented Nov 8, 2022

the top output was from a remote tmux console, which had rendering issues. This is top from inside the kepler pod:
(screenshot)

@marceloamaral
Collaborator

The empty cgroup values might be related to this fix (PR#377)

@rootfs
Contributor

rootfs commented Nov 11, 2022

@williamcaban @marceloamaral Here are my controlled test results.
https://gist.github.com/rootfs/2bb2c742c3afd9acca7811ed8eba7dd7

The TL;DR is that there is a scalability regression in the current code: kepler's CPU usage is very sensitive to the number of Pods.

@rootfs rootfs added the urgent-priority Issues that are part of the next release label Nov 11, 2022
@rootfs
Contributor

rootfs commented Nov 11, 2022

The scalability regression was introduced in ad9daed (#287). I tested up to 73a4291 and found no regression there.

@rootfs
Contributor

rootfs commented Nov 11, 2022

Looking at this block, there are quite a few maps created and discarded after use. Creating maps and letting them go to garbage collection is quite expensive.

	containerMetricValuesOnly, containerIDList, containerGPUDelta := c.getContainerMetricsList()

	// use the container's resource usage metrics to update the node metrics
	nodeTotalPower, nodeTotalGPUPower, nodeTotalPowerPerComponents := c.updateNoderMetrics(containerMetricValuesOnly)

	// calculate the container energy consumption using its resource utilization and the node components energy consumption
	// TODO: minimize the number of collection variables, we already have the containerMetrics and NodeMetrics structures
	c.updateContainerEnergy(containerMetricValuesOnly, containerIDList, containerGPUDelta, nodeTotalPower, nodeTotalGPUPower, nodeTotalPowerPerComponents)
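
One way to relieve that pressure, sketched below, would be to allocate the collection buffers once and reuse them across collection cycles instead of returning fresh slices and maps each time; the type and field names are illustrative only, not Kepler's actual code:

	package collector

	// collectBuffers holds scratch space that is reused across collection
	// cycles, so that each cycle does not allocate (and later garbage-collect)
	// fresh slices and maps. All names here are illustrative.
	type collectBuffers struct {
		metricValues [][]float64
		containerIDs []string
		gpuDelta     map[string]uint64
	}

	func newCollectBuffers() *collectBuffers {
		return &collectBuffers{gpuDelta: make(map[string]uint64)}
	}

	// reset empties the buffers while keeping their backing storage allocated,
	// so the next cycle can refill them without new allocations.
	func (b *collectBuffers) reset() {
		b.metricValues = b.metricValues[:0]
		b.containerIDs = b.containerIDs[:0]
		for k := range b.gpuDelta {
			delete(b.gpuDelta, k)
		}
	}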

@rootfs
Contributor

rootfs commented Nov 11, 2022

Based on the profiling result, gc is an issue
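
For reference, a minimal way to gather this kind of profile with Go's standard tooling, assuming the exporter exposes the net/http/pprof handlers (the port below is just an example):

	package main

	import (
		"log"
		"net/http"
		_ "net/http/pprof" // registers the /debug/pprof handlers on the default mux
	)

	func main() {
		// With this endpoint exposed, a 30s CPU profile can be captured with:
		//   go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
		log.Fatal(http.ListenAndServe("localhost:6060", nil))
	}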

@marceloamaral

@marceloamaral
Collaborator

Thanks @rootfs for the excellent profile analysis.
Yes, the new code is using more CPU. Maybe it's because before we had the metrics aggregated by pod and now we have them by container, so there are more values in the maps....

Looking at this block, there are quite a few maps created and discarded after use. Creating maps and letting them go to garbage collection is quite expensive.

@rootfs I didn't create new maps; I just refactored the ones that already existed to make it transparent what we're using.
But the performance regression may be related to some of these functions.
Before, everything was in just one function, and I split it into several functions to improve code readability....

@rootfs
Contributor

rootfs commented Nov 12, 2022

@marceloamaral containerMetricValuesOnly, containerIDList, containerGPUDelta, nodeTotalPower, nodeTotalGPUPower, and nodeTotalPowerPerComponents are created by functions. They are used once and thrown away. This puts a lot of pressure on the gc.

@jichenjc
Collaborator

The empty cgroup values might be related to this fix (PR#377)

Are you thinking this is a regression?
I tested on my local CentOS 8 env: without the fix, the cgroup value is 0.
So maybe some regression was introduced in #377?

@marceloamaral
Collaborator

containerMetricValuesOnly, containerIDList, containerGPUDelta, nodeTotalPower, nodeTotalGPUPower, and nodeTotalPowerPerComponents are created by functions. They are used once and thrown away. This puts a lot of pressure on the gc.

@rootfs I agree 100%. And that is why I included the TODO in the code to remove this...

These arrays are used in the model package because the model server only works with arrays.
To improve this, we should use protobuf (for example) for the data, and the model server should use it as well.
With protobuf, we won't need to convert structured data into multiple arrays....
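
A rough sketch of the direction being suggested: pass one structured record per container (whether or not it is ultimately encoded with protobuf) instead of several index-aligned arrays. None of the type names below exist in Kepler today:

	package model

	// ContainerUsage bundles per-container resource usage into one structured
	// record, instead of parallel arrays indexed by container position.
	// A schema like this could also be generated from a protobuf message so
	// the collector and the model server exchange the same type directly.
	// All names here are hypothetical.
	type ContainerUsage struct {
		ContainerID string
		CPUTimeUS   uint64
		CacheMisses uint64
		GPUTimeUS   uint64
	}

	// PowerRequest is what the collector would send to the model server in a
	// single call, rather than several slices that must stay index-aligned.
	type PowerRequest struct {
		NodeName   string
		Containers []ContainerUsage
	}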

@wangchen615 wangchen615 added the kind/feature New feature or request label Nov 14, 2022
@marceloamaral
Collaborator

Based on the CPU profile results, I also analyzed the MEM profile at #391.
In an initial analysis, the ListPod function seems to be the bottleneck.
And getCPUCoreFrequency is also a very expensive function.

@wangchen615 wangchen615 added high-priority Issues that are part of the next release but not mandatory and removed urgent-priority Issues that are part of the next release labels Nov 15, 2022
@simonpasquier

👋 Prometheus team member here. Though I'm not an expert in client code instrumentation and I've only skimmed over the issue, I can probably share some information :)
While in general code instrumentation is lightweight in terms of resource usage (e.g. github.com/prometheus/client_golang is highly optimized), the custom collector pattern (using Const metrics) can be problematic, especially when a single exporter exposes tens of thousands of metrics: each scrape from Prometheus allocates lots of objects, putting pressure both on the memory and on the CPU (due to garbage collection).

Projects like kube-state-metrics encountered similar issues in the past (and kubelet/cAdvisor to some extent). In practice, kube-state-metrics alleviated the problem by implementing a custom Prometheus client library (see the metrics_store and metricshandler packages). kubernetes/kube-state-metrics#498 has also lots of details on why building a custom metric client was the right choice for the project.
This isn't me saying that kepler should take the same route :) For instance, there are some experiments in github.com/prometheus/client_golang (prometheus/client_golang#929 and prometheus/client_golang#989) and cAdvisor (kubernetes/kubernetes#104459, kubernetes/kubernetes#108194 and google/cadvisor#2974) to support metrics caching in order to reduce the resource overhead. It would be interesting to see if this is applicable to kepler too.
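
To make the allocation pattern concrete, here is a minimal, hedged sketch of a custom collector that caches its const metrics between scrapes instead of rebuilding them on every Collect call; the metric structure is a placeholder, not Kepler's actual exporter, and a real implementation would need to guard the cache with a mutex:

	package exporter

	import "github.com/prometheus/client_golang/prometheus"

	// Placeholder descriptor; the label set is illustrative only.
	var containerJoulesDesc = prometheus.NewDesc(
		"kepler_container_joules_total",
		"Total energy consumed by the container (illustrative help text).",
		[]string{"container_id"}, nil,
	)

	// cachedCollector keeps the const metrics built between scrapes and only
	// rebuilds them when the underlying readings change, so each Prometheus
	// scrape does not allocate tens of thousands of fresh objects.
	type cachedCollector struct {
		cached []prometheus.Metric
	}

	// refresh rebuilds the cache from the latest readings; it would be called
	// from the collection loop, not on every scrape.
	func (c *cachedCollector) refresh(joulesByContainer map[string]float64) {
		metrics := make([]prometheus.Metric, 0, len(joulesByContainer))
		for id, joules := range joulesByContainer {
			metrics = append(metrics, prometheus.MustNewConstMetric(
				containerJoulesDesc, prometheus.CounterValue, joules, id))
		}
		c.cached = metrics
	}

	func (c *cachedCollector) Describe(ch chan<- *prometheus.Desc) {
		ch <- containerJoulesDesc
	}

	func (c *cachedCollector) Collect(ch chan<- prometheus.Metric) {
		// Scrapes replay the cached metrics instead of rebuilding them.
		for _, m := range c.cached {
			ch <- m
		}
	}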

@marceloamaral
Collaborator

Thanks @simonpasquier for the pointers!

@marceloamaral
Collaborator

Btw, PR #427 introduces a 2.4X performance improvement for Kepler according to the CPU profiling.

@stale

stale bot commented May 17, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label May 17, 2023
@stale stale bot closed this as completed May 24, 2023