cadvisor SIGSEGV: segmentation violation on v0.49.1 with perf enabled #3529

Open
dcathapermal opened this issue May 9, 2024 · 7 comments

Hi all,

Running with these specs:

Version: {KernelVersion:5.15.0-105-generic ContainerOsVersion:Alpine Linux v3.18 DockerVersion: DockerAPIVersion: CadvisorVersion:v0.49.1 CadvisorRevision:6f3f25ba}

I don't seem to have any issues spinning up a plain version of cadvisor v0.49.1.

docker run \
  --volume=/etc/configs/perf/perf.json:/etc/configs/perf/perf.json \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  --detach=true \
  --name=cadvisor \
  --privileged \
  gcr.io/cadvisor/cadvisor:$VERSION

However, when I add the flag --perf_events_config=/etc/configs/perf/perf.json to the end of this command, cAdvisor crashes with a segfault:

I0509 19:13:15.652998       1 factory.go:279] Factory "containerd" was unable to handle container "/"
I0509 19:13:15.653009       1 factory.go:45] / not handled by systemd handler
I0509 19:13:15.653015       1 factory.go:279] Factory "systemd" was unable to handle container "/"
I0509 19:13:15.653039       1 factory.go:279] Factory "docker" was unable to handle container "/"
I0509 19:13:15.653048       1 factory.go:275] Using factory "raw" for container "/"
I0509 19:13:15.653301       1 collector_libpfm.go:445] Setting up perf event cycles

SIGSEGV: segmentation violation
PC=0x7f2ee1010a7c m=22 sigcode=128 addr=0x0
signal arrived during cgo execution

goroutine 1 gp=0xc0000061c0 m=22 mp=0xc000181808 [syscall]:
runtime.cgocall(0x1306840, 0xc000b86fe8)
	/usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc000b86fc0 sp=0xc000b86f88 pc=0x40a78b
github.com/google/cadvisor/perf._Cfunc_free(0x7f2e99fb7fc0)
	_cgo_gotypes.go:95 +0x3f fp=0xc000b86fe8 sp=0xc000b86fc0 pc=0x92ccbf
github.com/google/cadvisor/perf.pfmGetOsEventEncoding.pfmGetOsEventEncoding.func1.func4()
	/go/src/github.com/google/cadvisor/perf/collector_libpfm.go:269 +0x35 fp=0xc000b87020 sp=0xc000b86fe8 pc=0x92f9f5
github.com/google/cadvisor/perf.pfmGetOsEventEncoding({0xc00026250a, 0x6}, 0x7f2e99fb1220)
	/go/src/github.com/google/cadvisor/perf/collector_libpfm.go:279 +0x214 fp=0xc000b870c8 sp=0xc000b87020 pc=0x92f854
github.com/google/cadvisor/perf.readPerfEventAttr({0xc00026250a, 0x6}, 0x1719a28)
	/go/src/github.com/google/cadvisor/perf/collector_libpfm.go:259 +0x63 fp=0xc000b870f8 sp=0xc000b870c8 pc=0x92f563
github.com/google/cadvisor/perf.(*collector).createConfigFromEvent(0x14a4980?, {0xc00026250a, 0x6})
	/go/src/github.com/google/cadvisor/perf/collector_libpfm.go:447 +0x10e fp=0xc000b87190 sp=0xc000b870f8 pc=0x93100e
github.com/google/cadvisor/perf.(*collector).createLeaderFileDescriptors(0xc000000600, {0xc0001a8e70?, 0x1, 0xc000000000?}, 0xc, 0x0, 0x10?)
	/go/src/github.com/google/cadvisor/perf/collector_libpfm.go:239 +0x1cc fp=0xc000b87298 sp=0xc000b87190 pc=0x92f26c
github.com/google/cadvisor/perf.(*collector).setup(0xc000000600)

Is this a known issue or bug in this version? I'm aware that older versions (v0.47.2 and earlier) do not come with perf built in. v0.49.1 appears to include it, but crashes as soon as perf is enabled.
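
For anyone trying to reproduce the report above, the failing setup can be sketched roughly as follows. The contents of perf.json are not shown in the report, so the config written here is an assumption: a minimal file that requests only the cycles event seen in the log. The helper names write_perf_config and run_cadvisor_with_perf are hypothetical; the volumes, image tag, and flag are taken from the report.

```shell
# Assumption: the reporter's real perf.json is not shown. This writes a
# minimal config that requests only the "cycles" core event from the log.
write_perf_config() {
  mkdir -p "$(dirname "$1")"
  cat > "$1" <<'EOF'
{
  "core": {
    "events": ["cycles"]
  }
}
EOF
}

# Same docker run as in the report, with the perf flag passed to cadvisor
# after the image name. Defined here but not invoked.
run_cadvisor_with_perf() {
  docker run \
    --volume=/etc/configs/perf/perf.json:/etc/configs/perf/perf.json \
    --volume=/:/rootfs:ro \
    --volume=/var/run:/var/run:rw \
    --volume=/sys:/sys:ro \
    --volume=/var/lib/docker/:/var/lib/docker:ro \
    --publish=8080:8080 \
    --detach=true \
    --name=cadvisor \
    --privileged \
    gcr.io/cadvisor/cadvisor:v0.49.1 \
    --perf_events_config=/etc/configs/perf/perf.json
}
```

Running write_perf_config /etc/configs/perf/perf.json followed by run_cadvisor_with_perf should, if the assumed config matches, reach the same "Setting up perf event cycles" log line before the crash.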

@Rouzip
Contributor

Rouzip commented May 11, 2024

Can you share ldd cadvisor output?

@dcathapermal
Author

~/usr/bin# ldd cadvisor 
	linux-vdso.so.1 (0x00007ffdc295c000)
	libpfm.so.4 => /lib/x86_64-linux-gnu/libpfm.so.4 (0x00007fe0483bd000)
	libc.musl-x86_64.so.1 => not found
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe048194000)
	/lib/ld-musl-x86_64.so.1 => /lib64/ld-linux-x86-64.so.2 (0x00007fe04864d000)

Also, note that this is the upstream cadvisor binary; we didn't rebuild it with any custom flags. It seems that libpfm4 is already integrated in this upstream version.

@Rouzip
Contributor

Rouzip commented May 14, 2024

defer C.free(unsafe.Pointer(fstr))

This line causes the segmentation violation, not the libpfm4 library itself. Can you check the ldd output for the old version, and the output of readelf -s xxx | grep free?

@dcathapermal
Author

dcathapermal commented May 14, 2024

This is the ldd output for the cadvisor executable in a container running v0.47.2:

docker exec 322ea45f166fb8533ca917817c2758cc444587381bce95b6af844bc431714552 ldd /usr/bin/cadvisor
	/lib/ld-musl-x86_64.so.1 (0x7f5fae13a000)
	libipmctl.so.4 => /usr/local/lib/libipmctl.so.4 (0x7f5faddd3000)
	libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7f5fae13a000)
	libndctl.so.6 => /usr/lib/libndctl.so.6 (0x7f5faddaa000)
	libdaxctl.so.1 => /usr/lib/libdaxctl.so.1 (0x7f5fadd9d000)
	libudev.so.1 => /lib/libudev.so.1 (0x7f5fadd7b000)
	libuuid.so.1 => /lib/libuuid.so.1 (0x7f5fadd72000)
	libkmod.so.2 => /lib/libkmod.so.2 (0x7f5fadd5b000)
	libzstd.so.1 => /usr/lib/libzstd.so.1 (0x7f5fadcdd000)
	liblzma.so.5 => /usr/lib/liblzma.so.5 (0x7f5fadcba000)
	libz.so.1 => /lib/libz.so.1 (0x7f5fadca0000)
	libcrypto.so.1.1 => /lib/libcrypto.so.1.1 (0x7f5fada1e000)

I can't run readelf because the container does not come with it installed, but you can see that v0.47.2 doesn't have libpfm4 as a dependency. So, when I try to set the -perf_events_config flag with this older version, it gives the error: 1 manager_no_libpfm.go:29] cAdvisor is build without cgo and/or libpfm support. Perf event counters are not available.
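
One possible way around the missing readelf, sketched under assumptions: copy the binary out of the container and inspect it on a host that has binutils installed. The container name cadvisor comes from the docker run command in the first post; the /tmp path and the free_symbols helper are hypothetical.

```shell
# Filter `readelf -s` output (on stdin) down to symbols named exactly "free",
# with or without a version suffix such as free@GLIBC_2.2.5.
# In a symbol row, field 7 is the section index (UND = imported) and
# field 8 is the symbol name.
free_symbols() {
  awk '$8 == "free" || $8 ~ /^free@/ { print $7, $8 }'
}

# Usage on the host (assumes Docker and binutils are available there):
#   docker cp cadvisor:/usr/bin/cadvisor /tmp/cadvisor
#   readelf -s /tmp/cadvisor | free_symbols
```

A row with UND would indicate that free is imported from whichever libc the dynamic loader resolves at run time.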

Do you know what kernel version the gcr.io/cadvisor/cadvisor:v0.49.1 image was built on? We suspect that our 5.15.0-105-generic kernel is unable to support the perf counters if the binary was built on a 6.0 kernel.

EDIT: The kernel version is not the cause. I tested the upstream gcr image on a 6.X machine and still hit the same issue.

@Rouzip
Contributor

Rouzip commented May 16, 2024

I suspect the issue is that the musl library is not being found correctly (see libc.musl-x86_64.so.1 => not found in the ldd output above), resulting in the SIGSEGV.

@dcathapermal
Author

dcathapermal commented May 16, 2024

Actually, I'm not sure which cadvisor binary I ran ldd on in my first reply, but I no longer see libc.musl-x86_64.so.1 => not found when looking at the dependencies for v0.49.1. So I do not think it's a musl library issue.

/ # ldd /usr/bin/cadvisor
	/lib/ld-musl-x86_64.so.1 (0x7fd0d860c000)
	libpfm.so.4 => /usr/local/lib/libpfm.so.4 (0x7fd0d8337000)
	libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7fd0d860c000)
/ # 

This is based on the following docker run command:

docker run -it --rm \
  --volume=/etc/configs/perf/perf.json:/etc/configs/perf/perf.json \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  --name=cadvisor-debug \
  --privileged \
  --entrypoint /bin/sh \
  gcr.io/cadvisor/cadvisor:v0.49.1
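
The two ldd outputs posted in this thread differ in a way that a small filter can make explicit. The sketch below is only an illustration, and check_ldd is a hypothetical helper: it reads ldd output on stdin, lists unresolved libraries, and flags output that references both musl and glibc, which would be consistent with running a glibc system's ldd against a musl-linked binary.

```shell
# Read ldd output on stdin. Print unresolved libraries and warn when both
# musl and glibc are referenced. Exit status is 1 if anything was flagged.
check_ldd() {
  awk '
    /=> not found/ { print "unresolved: " $1; bad = 1 }
    /musl/         { musl = 1 }
    /libc\.so\.6/  { glibc = 1 }
    END {
      if (musl && glibc) {
        print "mixed: musl and glibc both referenced"
        bad = 1
      }
      exit bad + 0
    }'
}

# Usage, inside the container or against a copied-out binary:
#   ldd /usr/bin/cadvisor | check_ldd
```

Applied to the first ldd output in this thread, it would flag both the unresolved libc.musl-x86_64.so.1 and the musl/glibc mix; applied to the output taken inside the v0.49.1 container, it prints nothing.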

@Rouzip
Contributor

Rouzip commented May 29, 2024

I built cAdvisor on my local machine with glibc, and it ran successfully, so I suspect it may be a musl library link issue.

source build/config/libpfm4.sh
make build
ldd _output/cadvisor
       #linux-vdso.so.1 (0x00007ffc316d2000)
       #libpfm.so.4 => /usr/local/lib/libpfm.so.4 (0x000073cc9e000000)
       #libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x000073cc9dc00000)
       #/lib64/ld-linux-x86-64.so.2 (0x000073cc9e4e8000)
