What happened?
Hi,
I want to use this project for the measurements in my thesis, but currently I don't get reliable data from Kepler.
Here is my Dashboard:
The first row shows my PDU, which measures power in watts. The left panel is split per server, and the right panel shows the sum over all servers.
The mean is around 150 W.
In the stacked chart, Kepler barely reaches 45 W across all servers, so more than 100 W is not being recorded.
Here is a picture using your dashboard:
What did you expect to happen?
I expect Kepler to roughly match the power consumption measured by the PDU, with some small error and, of course, without the overhead of fans and so on.
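To put a number on the gap, here is a small sketch using only the two figures quoted above (roughly 150 W mean at the PDU, roughly 45 W in the Kepler chart). The function name `coverage` is just a helper for this illustration, not part of Kepler.

```python
def coverage(pdu_watts, kepler_watts):
    """Return (missing watts, fraction of the PDU reading that Kepler reports)."""
    if pdu_watts <= 0:
        raise ValueError("pdu_watts must be positive")
    return pdu_watts - kepler_watts, kepler_watts / pdu_watts


# Approximate values read off the dashboards described in this issue.
missing, fraction = coverage(pdu_watts=150.0, kepler_watts=45.0)
print(f"missing: {missing:.0f} W, Kepler reports {fraction:.0%} of PDU power")
```

With these numbers Kepler accounts for about 30% of what the PDU measures, which is far more than I would attribute to fans and similar overhead.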
As a side question:
The dashboard uses the following query:
sum by (pod_name, container_namespace) (irate(kepler_container_package_joules_total{container_namespace=~"$namespace", pod_name=~"$pod"}[1m]))
Shouldn't the range be [$__rate_interval] instead of [1m]?
My query for the first screenshot would then be:
sum by(pod_name, container_namespace) (rate(kepler_container_joules_total{container_namespace=~"$namespace", pod_name=~"$pod"}[$__rate_interval]))
I don't need the breakdown by component type.
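To illustrate why the choice between irate and rate (and the range) matters for a joules counter, here is a simplified sketch. It is not how Prometheus implements these functions: it ignores extrapolation to the window boundaries and counter resets. The sample values are made up. The point is only that irate looks at the last two samples in the window, while rate averages over the whole window, and that joules per second are watts.

```python
def simple_rate(samples):
    """Average increase per second between the first and last sample in the window."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def simple_irate(samples):
    """Increase per second between the last two samples in the window."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical (timestamp in seconds, counter in joules) samples with a burst at the end.
window = [(0.0, 1000.0), (15.0, 1150.0), (30.0, 1300.0), (45.0, 1900.0)]

print(simple_rate(window))   # 900 J over 45 s -> 20.0 W
print(simple_irate(window))  # 600 J over 15 s -> 40.0 W
```

In this made-up window the two functions differ by a factor of two, so the dashboard value depends on which one is used and on how the range relates to the scrape interval.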
How can we reproduce it (as minimally and precisely as possible)?
Compare the power consumption on a smart electricity plug with the one that Kepler delivers.
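The comparison could be scripted roughly as follows, assuming both the plug/PDU readings and Kepler's metrics end up in the same Prometheus. This is a sketch under assumptions: the base URL `http://localhost:9090`, the PDU metric name `pdu_power_watts` and the `[5m]` range are placeholders I made up. Only `kepler_container_joules_total` and the `/api/v1/query` endpoint are real names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instant_query_url(base_url, promql):
    """Build the URL for a Prometheus instant query (GET /api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def sum_vector(response):
    """Sum all sample values of an instant-vector response body."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response}")
    # Each result entry carries "value": [timestamp, "<number as string>"].
    return sum(float(series["value"][1]) for series in response["data"]["result"])


def query_watts(base_url, promql):
    """Run the query against a live Prometheus and return the summed value."""
    with urlopen(instant_query_url(base_url, promql), timeout=10) as resp:
        return sum_vector(json.load(resp))


# Assumed query: total Kepler power across all containers, in watts.
KEPLER_QUERY = "sum(rate(kepler_container_joules_total[5m]))"
# Hypothetical metric name for the plug/PDU reading.
PDU_QUERY = "sum(pdu_power_watts)"

print(instant_query_url("http://localhost:9090", KEPLER_QUERY))
```

Calling `query_watts(base, KEPLER_QUERY)` and `query_watts(base, PDU_QUERY)` at the same time and comparing the two numbers would reproduce the gap shown on my dashboard.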
Anything else we need to know?
Logs of the exporter:
libbpf: sec '.relkprobe/finish_task_switch': relo #23: insn #279 against 'task_clock'
libbpf: prog 'kprobe__finish_task_switch': found map 11 (task_clock, sec 13, off 352) for insn #279
libbpf: sec '.relkprobe/finish_task_switch': relo #24: insn #285 against 'processes'
libbpf: prog 'kprobe__finish_task_switch': found map 0 (processes, sec 13, off 0) for insn #285
libbpf: sec '.relkprobe/finish_task_switch': relo #25: insn #309 against 'processes'
libbpf: prog 'kprobe__finish_task_switch': found map 0 (processes, sec 13, off 0) for insn #309
libbpf: sec '.relkprobe/finish_task_switch': relo #26: insn #341 against 'processes'
libbpf: prog 'kprobe__finish_task_switch': found map 0 (processes, sec 13, off 0) for insn #341
libbpf: sec '.reltracepoint/irq/softirq_entry': collecting relocation for section(5) 'tracepoint/irq/softirq_entry'
libbpf: sec '.reltracepoint/irq/softirq_entry': relo #0: insn #6 against 'processes'
libbpf: prog 'kepler_irq_trace': found map 0 (processes, sec 13, off 0) for insn #6
libbpf: sec '.relkprobe/mark_page_accessed': collecting relocation for section(7) 'kprobe/mark_page_accessed'
libbpf: sec '.relkprobe/mark_page_accessed': relo #0: insn #4 against 'processes'
libbpf: prog 'kprobe__mark_page_accessed': found map 0 (processes, sec 13, off 0) for insn #4
libbpf: sec '.relkprobe/set_page_dirty': collecting relocation for section(9) 'kprobe/set_page_dirty'
libbpf: sec '.relkprobe/set_page_dirty': relo #0: insn #4 against 'processes'
libbpf: prog 'kprobe__set_page_dirty': found map 0 (processes, sec 13, off 0) for insn #4
libbpf: loading kernel BTF '/sys/kernel/btf/vmlinux': 0
libbpf: map 'processes': created successfully, fd=9
libbpf: map 'pid_time': created successfully, fd=10
libbpf: map 'cpu_cycles_event_reader': created successfully, fd=11
libbpf: map 'cpu_cycles': created successfully, fd=12
libbpf: map 'cpu_ref_cycles_event_reader': created successfully, fd=13
libbpf: map 'cpu_ref_cycles': created successfully, fd=14
libbpf: map 'cpu_instructions_event_reader': created successfully, fd=15
libbpf: map 'cpu_instructions': created successfully, fd=16
libbpf: map 'cache_miss_event_reader': created successfully, fd=17
libbpf: map 'cache_miss': created successfully, fd=18
libbpf: map 'task_clock_ms_event_reader': created successfully, fd=19
libbpf: map 'task_clock': created successfully, fd=20
libbpf: map 'cpu_freq_array': created successfully, fd=21
libbpf: map 'amd64_ke.data': created successfully, fd=22
libbpf: map 'amd64_ke.bss': created successfully, fd=23
libbpf: sec 'kprobe/finish_task_switch': found 2 CO-RE relocations
libbpf: CO-RE relocating [58] struct pt_regs: found target candidate [174] struct pt_regs in [vmlinux]
libbpf: prog 'kprobe__finish_task_switch': relo #0: <byte_off> [58] struct pt_regs.di (0:14 @ offset 112)
libbpf: prog 'kprobe__finish_task_switch': relo #0: matching candidate #0 <byte_off> [174] struct pt_regs.di (0:14 @ offset 112)
libbpf: prog 'kprobe__finish_task_switch': relo #0: patched insn #15 (LDX/ST/STX) off 112 -> 112
libbpf: CO-RE relocating [62] struct task_struct: found target candidate [130] struct task_struct in [vmlinux]
libbpf: prog 'kprobe__finish_task_switch': relo #1: <byte_off> [62] struct task_struct.tgid (0:86 @ offset 2780)
libbpf: prog 'kprobe__finish_task_switch': relo #1: matching candidate #0 <byte_off> [130] struct task_struct.tgid (0:76 @ offset 2500)
libbpf: prog 'kprobe__finish_task_switch': relo #1: patched insn #16 (ALU/ALU64) imm 2780 -> 2500
libbpf: sec 'tracepoint/irq/softirq_entry': found 1 CO-RE relocations
libbpf: CO-RE relocating [405] struct trace_event_raw_softirq: found target candidate [15895] struct trace_event_raw_softirq in [vmlinux]
libbpf: prog 'kepler_irq_trace': relo #0: <byte_off> [405] struct trace_event_raw_softirq.vec (0:1 @ offset 12)
libbpf: prog 'kepler_irq_trace': relo #0: matching candidate #0 <byte_off> [15895] struct trace_event_raw_softirq.vec (0:1 @ offset 8)
libbpf: prog 'kepler_irq_trace': relo #0: patched insn #3 (LDX/ST/STX) off 12 -> 8
libbpf: prog 'kprobe__finish_task_switch': failed to create kprobe 'finish_task_switch+0x0' perf event: No such file or directory
I0412 14:38:08.075554 1059339 libbpf_attacher.go:128] failed to attach kprobe/finish_task_switch: failed to attach finish_task_switch k(ret)probe to program kprobe__finish_task_switch: no such file or directory. Try finish_task_switch.isra.0
I0412 14:38:08.130177 1059339 libbpf_attacher.go:195] Successfully load eBPF module from libbpf object
I0412 14:38:08.130317 1059339 process_energy.go:114] Using the Ratio/DynPower Power Model to estimate Process Platform Power
I0412 14:38:08.130338 1059339 process_energy.go:115] Process feature names: [cpu_instructions]
I0412 14:38:08.130428 1059339 process_energy.go:124] Using the Ratio/DynPower Power Model to estimate Process Component Power
I0412 14:38:08.130452 1059339 process_energy.go:125] Process feature names: [cpu_instructions cpu_instructions cache_miss gpu_compute_util]
I0412 14:38:08.130889 1059339 node_platform_energy.go:52] Using the Regressor/AbsPower Power Model to estimate Node Platform Power
I0412 14:38:08.131242 1059339 exporter.go:265] starting to listen on 0.0.0.0:9102
I0412 14:38:08.131272 1059339 exporter.go:271] Started Kepler in 242.579785ms
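The log above reports that attaching to `finish_task_switch` failed and suggests trying `finish_task_switch.isra.0`. Whether this has anything to do with the missing watts is not something I can tell from the log. To see which variants of the symbol a kernel exposes, one could scan `/proc/kallsyms`, where each line has the form "address type name". The sketch below runs on a made-up sample instead of the real file. `finish_task_switch_helper` is an invented name that only shows what does not match.

```python
def matching_symbols(kallsyms_text, base):
    """Return symbol names equal to `base` or of the form `base.<suffix>`."""
    names = []
    for line in kallsyms_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        name = fields[2]
        if name == base or name.startswith(base + "."):
            names.append(name)
    return names


# Made-up sample in /proc/kallsyms format (address, type, name).
sample = """\
0000000000000000 t finish_task_switch.isra.0
0000000000000000 T schedule
0000000000000000 t finish_task_switch_helper
"""

print(matching_symbols(sample, "finish_task_switch"))
```

On a node, the same function can be fed the contents of `/proc/kallsyms` (for example via `open("/proc/kallsyms").read()`) to check which name the probe would need to attach to.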
Tobias-Pe changed the title from "Huge difference between Kepler Power Consumption and real PDU consumption" to "Huge difference between Kepler power consumption and real PDU power consumption" on Apr 12, 2024
Kepler image tag
Kubernetes version
Cloud provider or bare metal
OS version
Install tools
Installed using your guide: https://sustainable-computing.io/installation/kepler/
Kepler deployment config
For on kubernetes:
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)