Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] govc metric.ls panic #2835

Closed
jcpowermac opened this issue May 5, 2022 · 14 comments · Fixed by #2859
Closed

[BUG] govc metric.ls panic #2835

jcpowermac opened this issue May 5, 2022 · 14 comments · Fixed by #2859

Comments

@jcpowermac
Copy link

Describe the bug

When running govc metric.ls the command panics

To Reproduce
Steps to reproduce the behavior:

  1. run govc metric.ls against a virtual machine

Expected behavior

No panic

Affected version
Multiple versions: 0.27.4, 0.28.0

Screenshots/Debug Output

➜  ~ ./govc metric.ls -debug "$vm"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xc939bf]

goroutine 1 [running]:
github.com/vmware/govmomi/performance.groupPerfCounterInfo.Less({0xc0004e8510?, {0xc000222000?, 0xc000543b68?, 0x6d000c0001ee380?}}, 0x4?, 0x3e?)
        /home/runner/work/govmomi/govmomi/performance/manager.go:208 +0x5f
sort.medianOfThree({0x13cae28, 0xc00053e700}, 0x203000?, 0x203000?, 0xc00053e700?)
        /opt/hostedtoolcache/go/1.18.0/x64/src/sort/sort.go:89 +0x42
sort.doPivot({0x13cae28, 0xc00053e700}, 0x0, 0xb8)
        /opt/hostedtoolcache/go/1.18.0/x64/src/sort/sort.go:114 +0x72
sort.quickSort({0x13cae28, 0xc00053e700}, 0xc0000d5ce8?, 0xc00053e700?, 0xc0000d5ce8?)
        /opt/hostedtoolcache/go/1.18.0/x64/src/sort/sort.go:203 +0x6c
sort.Sort({0x13cae28, 0xc00053e700})
        /opt/hostedtoolcache/go/1.18.0/x64/src/sort/sort.go:231 +0x53
github.com/vmware/govmomi/performance.(*Manager).AvailableMetric(0xc0001ee230, {0x13cb050, 0xc00012e000}, {{0xc00014a220?, 0x1?}, {0xc00014a243?, 0xc9a3eb?}}, 0x14)
        /home/runner/work/govmomi/govmomi/performance/manager.go:253 +0x29a
github.com/vmware/govmomi/govc/metric.(*ls).Run(0xc000152560, {0x13cb050, 0xc00012e000}, 0x9?)
        /home/runner/work/govmomi/govmomi/govc/metric/ls.go:178 +0x136
github.com/vmware/govmomi/govc/cli.Run({0xc00012c050, 0x3, 0x3})
        /home/runner/work/govmomi/govmomi/govc/cli/command.go:165 +0x32d
main.main()
        /home/runner/work/govmomi/govmomi/govc/main.go:113 +0x4f
➜  ~

Additional context

This issue started after our VMC SDDC was upgraded:
SDDC Version: | 1.16v7

https://docs.vmware.com/en/VMware-Cloud-on-AWS/services/com.vmware.vmc-aws-operations/GUID-52CED8FB-2E3A-4766-8C59-2EAD8E2C1D31.html

vCenter:
Version: 7.0.3 Build: 19504857

We have been running this command in CI for 9 months without issue.
https://github.com/openshift/release/pull/20598/files

@github-actions
Copy link
Contributor

github-actions bot commented May 5, 2022

Howdy 🖐   jcpowermac ! Thank you for your interest in this project. We value your feedback and will respond soon.

If you want to contribute to this project, please make yourself familiar with the CONTRIBUTION guidelines.

@dougm
Copy link
Member

dougm commented May 6, 2022

Not sure what's causing this, I haven't used vmc myself. Could include the debug logs or -trace stderr from:

./govc metric.ls -trace "$vm"

@jcpowermac
Copy link
Author

@dougm attached

trace.log
debug.tar.gz

@jcpowermac
Copy link
Author

https://github.com/vmware/govmomi/blob/master/performance/manager.go#L208 - is nil with no check and why we are panicing.

</soapenv:Body>
</soapenv:Envelope>(string) (len=2) "i "
(int) 0
(string) (len=2) "j "
(int) 28
(string) (len=11) "i counterid"
(int32) 378
(string) (len=11) "j counterid"
(int32) 395
(string) (len=13) "length d.info"
(int) 583

(*types.PerfCounterInfo)(<nil>)

2022-05-16T12:03:42-07:00 error layer=debugger error loading binary "/lib64/libc.so.6": could not parse .eh_frame section: pointer encoding not supported 0x9b at 0x59f4
2022-05-16T12:03:43-07:00 error layer=debugger error loading binary "/lib64/libnss_myhostname.so.2": could not parse .eh_frame section: pointer encoding not supported 0x9b at 0x238
2022-05-16T12:03:43-07:00 error layer=debugger error loading binary "/lib64/libnss_resolve.so.2": could not parse .eh_frame section: pointer encoding not supported 0x9b at 0x1b4
> [unrecovered-panic] runtime.fatalpanic() /usr/lib/golang/src/runtime/panic.go:1190 (hits goroutine(1):1 total:1) (PC: 0x437f20)
Warning: debugging optimized function
        runtime.curg._panic.arg: interface {}(string) "runtime error: invalid memory address or nil pointer dereference"
  1185: // fatalpanic implements an unrecoverable panic. It is like fatalthrow, except
  1186: // that if msgs != nil, fatalpanic also prints panic messages and decrements
  1187: // runningPanicDefers once main is blocked from exiting.
  1188: //
  1189: //go:nosplit
=>1190: func fatalpanic(msgs *_panic) {
  1191:         pc := getcallerpc()
  1192:         sp := getcallersp()
  1193:         gp := getg()
  1194:         var docrash bool
  1195:         // Switch to the system stack to avoid any stack growth, which
(dlv) quit
➜  govc git:(master) ✗

@jcpowermac
Copy link
Author

jcpowermac commented May 16, 2022

This is an odd issue... using spew

func (d groupPerfCounterInfo) Less(i, j int) bool {
        spew.Printf("i: %#+v\n", i)
        spew.Printf("j: %#+v\n", j)

        ci := d.ids[i].CounterId
        spew.Printf("ci: %#+v\n", ci)
        cj := d.ids[j].CounterId
        spew.Printf("cj: %#+v\n", cj)

        spew.Printf("len(d.ids): %#+v\n", len(d.ids))
        spew.Printf("len(d.info): %#+v\n", len(d.info))

        for loc, value := range d.info {
                if value != nil {
                        if loc == ci || loc == cj {
                        spew.Printf("loc: %#+v value: %#+v\n", loc, value)
                }
                }

        }

        gi := d.info[ci].GroupInfo.GetElementDescription()
        gj := d.info[cj].GroupInfo.GetElementDescription()

        return gi.Key < gj.Key
}
i: (int)0
j: (int)28
ci: (int32)378
cj: (int32)395
len(d.ids): (int)228
len(d.info): (int)583
loc: (int32)395 value: (*types.PerfCounterInfo)(0xc0000dcf50){DynamicData:(types.DynamicData){} Key:(int32)395 NameInfo:(*types.ElementDescription)(0xc000404cf0){Description:(types.Description){DynamicData:(types.DynamicData){} Label:(string)Run Summary:(string)Time the virtual machine is scheduled to run} Key:(string)run} GroupInfo:(*types.ElementDescription)(0xc000404d20){Description:(types.Description){DynamicData:(types.DynamicData){} Label:(string)CPU Summary:(string)CPU} Key:(string)cpu} UnitInfo:(*types.ElementDescription)(0xc000404d50){Description:(types.Description){DynamicData:(types.DynamicData){} Label:(string)ms Summary:(string)Millisecond} Key:(string)millisecond} RollupType:(types.PerfSummaryType)summation StatsType:(types.PerfStatsType)delta Level:(int32)2 PerDeviceLevel:(int32)3 AssociatedCounterId:([]int32)<nil>}

The ci CounterID location within d.info is nil but it really shouldn't be.

@yusufozturk
Copy link

We see similar issue with some virtual machines.

sort.sort functions panics when it reaches to doPivot function. It looks like a golang issue. I believe this started happening to us after 1.18, because we don't remember if it was happening before.

Do you have 1.18 as well? @jcpowermac

@jcpowermac
Copy link
Author

Do you have 1.18 as well? @jcpowermac

➜  ~ go version
go version go1.16.14 linux/amd64

@yusufozturk
Copy link

Sorry I misread the situation. We have the issue on manager.go:253.

image

We did a workaround by using Manager.Sort = false, but the issue is still there. I will create another issue for this.

@jcpowermac
Copy link
Author

I forgot to mention that our other CI enviornment running vCenter - Version: 7.0.2 Build: 17920168
does not have this issue.

@jcpowermac
Copy link
Author

Reproduced in non-vmc vcenter as well
Version: 7.0.3 Build: 19234570

root@jump-002-dallas-rh ~]# govc metric.ls /VMware-On-IBM-Cloud-VPC/vm/vcenter
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xc939ea]

goroutine 1 [running]:
github.com/vmware/govmomi/performance.groupPerfCounterInfo.Less({0xc000190630?, {0xc0003e2000?, 0x0?, 0x1f9?}}, 0x0?, 0xfd?)
        /home/runner/work/govmomi/govmomi/performance/manager.go:209 +0x8a
sort.doPivot({0x13cae28, 0xc000199960}, 0x0, 0x1fa)
        /opt/hostedtoolcache/go/1.18.0/x64/src/sort/sort.go:136 +0x23c
sort.quickSort({0x13cae28, 0xc000199960}, 0xc00043dce8?, 0xc000199960?, 0xc00043dce8?)
        /opt/hostedtoolcache/go/1.18.0/x64/src/sort/sort.go:203 +0x6c
sort.Sort({0x13cae28, 0xc000199960})
        /opt/hostedtoolcache/go/1.18.0/x64/src/sort/sort.go:231 +0x53
github.com/vmware/govmomi/performance.(*Manager).AvailableMetric(0xc00037e000, {0x13cb050, 0xc00009e000}, {{0xc000036720?, 0x1?}, {0xc00003673b?, 0xc9a3eb?}}, 0x14)
        /home/runner/work/govmomi/govmomi/performance/manager.go:253 +0x29a
github.com/vmware/govmomi/govc/metric.(*ls).Run(0xc0000ca560, {0x13cb050, 0xc00009e000}, 0x9?)
        /home/runner/work/govmomi/govmomi/govc/metric/ls.go:178 +0x136
github.com/vmware/govmomi/govc/cli.Run({0xc000090070, 0x2, 0x2})
        /home/runner/work/govmomi/govmomi/govc/cli/command.go:165 +0x32d
main.main()
        /home/runner/work/govmomi/govmomi/govc/main.go:113 +0x4f
[root@jump-002-dallas-rh ~]#

@jcpowermac
Copy link
Author

And then upgraded to u3e
Build: 19717403

[root@jump-002-dallas-rh ~]# govc metric.ls /VMware-On-IBM-Cloud-VPC/vm/vcenter
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xc939ea]

goroutine 1 [running]:
github.com/vmware/govmomi/performance.groupPerfCounterInfo.Less({0xc0000f59e0?, {0xc000490000?, 0x0?, 0x1f9?}}, 0x0?, 0xfd?)
        /home/runner/work/govmomi/govmomi/performance/manager.go:209 +0x8a
sort.doPivot({0x13cae28, 0xc0001261e0}, 0x0, 0x1fa)
        /opt/hostedtoolcache/go/1.18.0/x64/src/sort/sort.go:136 +0x23c
sort.quickSort({0x13cae28, 0xc0001261e0}, 0xc0003b5ce8?, 0xc0001261e0?, 0xc0003b5ce8?)
        /opt/hostedtoolcache/go/1.18.0/x64/src/sort/sort.go:203 +0x6c
sort.Sort({0x13cae28, 0xc0001261e0})
        /opt/hostedtoolcache/go/1.18.0/x64/src/sort/sort.go:231 +0x53
github.com/vmware/govmomi/performance.(*Manager).AvailableMetric(0xc0001a4460, {0x13cb050, 0xc000036080}, {{0xc0001d6a50?, 0x1?}, {0xc0001d6a6b?, 0xc9a3eb?}}, 0x14)
        /home/runner/work/govmomi/govmomi/performance/manager.go:253 +0x29a
github.com/vmware/govmomi/govc/metric.(*ls).Run(0xc000074580, {0x13cb050, 0xc000036080}, 0x9?)
        /home/runner/work/govmomi/govmomi/govc/metric/ls.go:178 +0x136
github.com/vmware/govmomi/govc/cli.Run({0xc0000300a0, 0x2, 0x2})
        /home/runner/work/govmomi/govmomi/govc/cli/command.go:165 +0x32d
main.main()
        /home/runner/work/govmomi/govmomi/govc/main.go:113 +0x4f
[root@jump-002-dallas-rh ~]# 

@dougm
Copy link
Member

dougm commented May 26, 2022

I can reproduce this. Not seeing an issue with Go's sort itself @yusufozturk , but what @jcpowermac is seeing. Where of the 163 available metrics for a vm, 1 of them does not have PerfCounterInfo. I can see from @jcpowermac 's logs and mine, vCenter is not returning that 1 PerfCounterInfo, which seems like a vCenter bug. But something we can workaround in govmomi for now to avoid the panic, I'll try to put something together soon.

@jcpowermac
Copy link
Author

@dougm thank you for confirming.

dougm added a commit to dougm/govmomi that referenced this issue May 27, 2022
Newer versions of vCenter may return a CounterId via QueryAvailablePerfMetric()
which does not have a corresponding PerfCounterInfo entry in PerformanceManager's
perfCounter property. This results in a panic by a few of the govc metric commands
and any application using performance.Manager.AvailableMetric with Sort enabled.

While this may be a bug in vCenter, it is included in some recent release versions.
This workaround avoids the panic by ignoring any CounterId w/o a PerfCounterInfo.

Closes vmware#2835
@dougm
Copy link
Member

dougm commented May 27, 2022

@jcpowermac I'll open a bug against vCenter for this, a CounterId without any PerfCounterInfo is not usable. I could only see this with type VirtualMachine and the default (realtime/20s) interval, history intervals (`-i day|week|month|year) don't include this phantom CounterId. In the meantime, we can workaround with #2859

dougm added a commit to dougm/govmomi that referenced this issue May 27, 2022
Newer versions of vCenter may return a CounterId via QueryAvailablePerfMetric()
which does not have a corresponding PerfCounterInfo entry in PerformanceManager's
perfCounter property. This results in a panic by a few of the govc metric commands
and any application using performance.Manager.AvailableMetric with Sort enabled.

While this may be a bug in vCenter, it is included in some recent release versions.
This workaround avoids the panic by ignoring any CounterId w/o a PerfCounterInfo.

Also updated vcsim's PerformanceManager.QueryCounter to match real vCenter's behavior.
Where null is returned for an unknown CounterId.

Closes vmware#2835
dougm added a commit to dougm/govmomi that referenced this issue Jun 2, 2022
Newer versions of vCenter may return a CounterId via QueryAvailablePerfMetric()
which does not have a corresponding PerfCounterInfo entry in PerformanceManager's
perfCounter property. This results in a panic by a few of the govc metric commands
and any application using performance.Manager.AvailableMetric with Sort enabled.

While this may be a bug in vCenter, it is included in some recent release versions.
This workaround avoids the panic by ignoring any CounterId w/o a PerfCounterInfo.

Also updated vcsim's PerformanceManager.QueryCounter to match real vCenter's behavior.
Where null is returned for an unknown CounterId.

Closes vmware#2835
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants