Crio crashed with a concurrent map read and map write race condition #6321

Closed
Ezetowers opened this issue Oct 17, 2022 · 8 comments
Labels
kind/bug, lifecycle/frozen, lifecycle/rotten, lifecycle/stale

Comments


Ezetowers commented Oct 17, 2022

What happened?

CRI-O crashed with the following panic:

Oct 17 16:50:01 den-r25-u03 crio[5299]: fatal error: concurrent map read and map write
Oct 17 16:50:01 den-r25-u03 crio[5299]: goroutine 4359466 [running]:
Oct 17 16:50:01 den-r25-u03 crio[5299]: runtime.throw({0x557161bfc6b2, 0xc0006622e0})
Oct 17 16:50:01 den-r25-u03 crio[5299]:         /usr/lib64/go-1.17/src/runtime/panic.go:1198 +0x71 fp=0xc0042ab218 sp=0xc0042ab1e8 pc=0x5571604e8a31
Oct 17 16:50:01 den-r25-u03 crio[5299]: runtime.mapaccess2_faststr(0x7f5e602f0fa0, 0x0, {0xc000c1e9a0, 0x6b})
Oct 17 16:50:01 den-r25-u03 crio[5299]:         /usr/lib64/go-1.17/src/runtime/map_faststr.go:116 +0x3d4 fp=0xc0042ab280 sp=0xc0042ab218 pc=0x5571604c5d94
Oct 17 16:50:01 den-r25-u03 crio[5299]: github.com/containers/storage/drivers/quota.(*Control).fsDiskQuotaFromPath(0xc0039aadc0, {0xc000c1e9a0, 0x6b})
Oct 17 16:50:01 den-r25-u03 crio[5299]:         /home/abuild/rpmbuild/BUILD/cri-o-1.23.3/vendor/github.com/containers/storage/drivers/quota/projectquota.go:270 +0xb2 fp=0xc0042ab3d8 sp=0xc0042ab280 pc=0x557160cbcad2
Oct 17 16:50:01 den-r25-u03 crio[5299]: github.com/containers/storage/drivers/quota.(*Control).GetDiskUsage(0xc002bec6c0, {0xc000c1e9a0, 0x40}, 0xc0022d0740)
Oct 17 16:50:01 den-r25-u03 crio[5299]:         /home/abuild/rpmbuild/BUILD/cri-o-1.23.3/vendor/github.com/containers/storage/drivers/quota/projectquota.go:257 +0x3e fp=0xc0042ab550 sp=0xc0042ab3d8 pc=0x557160cbc95e
Oct 17 16:50:01 den-r25-u03 crio[5299]: github.com/containers/storage/drivers/overlay.(*Driver).ReadWriteDiskUsage(0xc002bec6c0, {0xc002953cab, 0x40})
Oct 17 16:50:01 den-r25-u03 crio[5299]:         /home/abuild/rpmbuild/BUILD/cri-o-1.23.3/vendor/github.com/containers/storage/drivers/overlay/overlay_cgo.go:17 +0x72 fp=0xc0042ab5a8 sp=0xc0042ab550 pc=0x557160cda132
Oct 17 16:50:01 den-r25-u03 crio[5299]: github.com/cri-o/cri-o/internal/lib/stats.(*StatsServer).writableLayerForContainer(0xc000626e40, 0xc000513180)
Oct 17 16:50:01 den-r25-u03 crio[5299]:         /home/abuild/rpmbuild/BUILD/cri-o-1.23.3/internal/lib/stats/stats_server.go:193 +0x1af fp=0xc0042ab668 sp=0xc0042ab5a8 pc=0x557161b1402f
Oct 17 16:50:01 den-r25-u03 crio[5299]: github.com/cri-o/cri-o/internal/lib/stats.(*StatsServer).populateWritableLayer(0x100557160c1e426, 0xc0037349f0, 0xc00005a028)
Oct 17 16:50:01 den-r25-u03 crio[5299]:         /home/abuild/rpmbuild/BUILD/cri-o-1.23.3/internal/lib/stats/stats_server.go:174 +0x25 fp=0xc0042ab6b8 sp=0xc0042ab668 pc=0x557161b13de5
Oct 17 16:50:01 den-r25-u03 crio[5299]: github.com/cri-o/cri-o/internal/lib/stats.(*StatsServer).updateContainer(0xc000626e40, 0xc000513180, 0xc0017b7340)
Oct 17 16:50:01 den-r25-u03 crio[5299]:         /home/abuild/rpmbuild/BUILD/cri-o-1.23.3/internal/lib/stats/stats_server.go:147 +0x195 fp=0xc0042ab748 sp=0xc0042ab6b8 pc=0x557161b13bf5
Oct 17 16:50:01 den-r25-u03 crio[5299]: github.com/cri-o/cri-o/internal/lib/stats.(*StatsServer).statsForContainer(0xc000626e78, 0xc000a9c380, 0x10000c000e08640)
Oct 17 16:50:01 den-r25-u03 crio[5299]:         /home/abuild/rpmbuild/BUILD/cri-o-1.23.3/internal/lib/stats/stats_server.go:322 +0x7b fp=0xc0042ab780 sp=0xc0042ab748 pc=0x557161b154db
Oct 17 16:50:01 den-r25-u03 crio[5299]: github.com/cri-o/cri-o/internal/lib/stats.(*StatsServer).StatsForContainers(0xc000626e40, {0xc00338c108, 0x1, 0xc0042ab8d0})
Oct 17 16:50:01 den-r25-u03 crio[5299]:         /home/abuild/rpmbuild/BUILD/cri-o-1.23.3/internal/lib/stats/stats_server.go:311 +0x185 fp=0xc0042ab878 sp=0xc0042ab780 pc=0x557161b15205
Oct 17 16:50:01 den-r25-u03 crio[5299]: github.com/cri-o/cri-o/server.(*Server).ListContainerStats(0xc00040a000, {0x557162580980, 0xc0048ca150}, 0xc0034638a0)
Oct 17 16:50:01 den-r25-u03 crio[5299]:         /home/abuild/rpmbuild/BUILD/cri-o-1.23.3/server/container_stats_list.go:30 +0x149 fp=0xc0042ab918 sp=0xc0042ab878 pc=0x557161b7d8a9
Oct 17 16:50:01 den-r25-u03 crio[5299]: github.com/cri-o/cri-o/server/cri/v1alpha2.(*service).ListContainerStats(0xc0048ca150, {0x557162580980, 0xc0048ca150}, 0xc0042ab9f0)
Oct 17 16:50:01 den-r25-u03 crio[5299]:         /home/abuild/rpmbuild/BUILD/cri-o-1.23.3/server/cri/v1alpha2/rpc_list_container_stats.go:14 +0x26 fp=0xc0042ab948 sp=0xc0042ab918 pc=0x557161ba4ca6
Oct 17 16:50:01 den-r25-u03 crio[5299]: k8s.io/cri-api/pkg/apis/runtime/v1alpha2._RuntimeService_ListContainerStats_Handler.func1({0x557162580980, 0xc0048ca150}, {0x55716242dfe0, 0xc0034638a0})
Oct 17 16:50:01 den-r25-u03 crio[5299]:         /home/abuild/rpmbuild/BUILD/cri-o-1.23.3/vendor/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go:9226 +0x78 fp=0xc0042ab988 sp=0xc0042ab948 pc=0x557161a32098
Oct 17 16:50:01 den-r25-u03 crio[5299]: github.com/cri-o/cri-o/internal/log.UnaryInterceptor.func1({0x557162580980, 0xc0048ca0c0}, {0x55716242dfe0, 0xc0034638a0}, 0xc00352ba40, 0xc0038cc030)
Oct 17 16:50:01 den-r25-u03 crio[5299]:         /home/abuild/rpmbuild/BUILD/cri-o-1.23.3/internal/log/interceptors.go:56 +0xc3 fp=0xc0042aba10 sp=0xc0042ab988 pc=0x5571612b3523
Oct 17 16:50:01 den-r25-u03 crio[5299]: github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x557162580980, 0xc0048ca0c0}, {0x55716242dfe0, 0xc0034638a0})
Oct 17 16:50:01 den-r25-u03 crio[5299]:         /home/abuild/rpmbuild/BUILD/cri-o-1.23.3/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25 +0x3a fp=0xc0042aba50 sp=0xc0042aba10 pc=0x557161ba599a
Oct 17 16:50:01 den-r25-u03 crio[5299]: github.com/cri-o/cri-o/server/metrics.UnaryInterceptor.func1({0x557162580980, 0xc0048ca0c0}, {0x55716242dfe0, 0xc0034638a0}, 0xc001ac6000, 0xc001ac60e0)
Oct 17 16:50:01 den-r25-u03 crio[5299]:         /home/abuild/rpmbuild/BUILD/cri-o-1.23.3/server/metrics/interceptors.go:24 +0xb2 fp=0xc0042abad8 sp=0xc0042aba50 pc=0x55716196d5f2
Oct 17 16:50:01 den-r25-u03 crio[5299]: github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x557162580980, 0xc0048ca0c0}, {0x55716242dfe0, 0xc0034638a0})
Oct 17 16:50:01 den-r25-u03 crio[5299]:         /home/abuild/rpmbuild/BUILD/cri-o-1.23.3/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25 +0x3a fp=0xc0042abb18 sp=0xc0042abad8 pc=0x557161ba599a
Oct 17 16:50:01 den-r25-u03 crio[5299]: github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1({0x557162580980, 0xc0048ca0c0}, {0x55716242dfe0, 0xc0034638a0}, 0xc00352bbb8, 0x55716227e260)
Oct 17 16:50:01 den-r25-u03 crio[5299]:         /home/abuild/rpmbuild/BUILD/cri-o-1.23.3/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34 +0xbf fp=0xc0042abb70 sp=0xc0042abb18 pc=0x557161ba583f
Oct 17 16:50:01 den-r25-u03 crio[5299]: k8s.io/cri-api/pkg/apis/runtime/v1alpha2._RuntimeService_ListContainerStats_Handler({0x5571624b8640, 0xc0005a8278}, {0x557162580980, 0xc0048ca0c0}, 0xc001384060, 0xc000738b70)
Oct 17 16:50:01 den-r25-u03 crio[5299]:         /home/abuild/rpmbuild/BUILD/cri-o-1.23.3/vendor/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go:9228 +0x138 fp=0xc0042abbc8 sp=0xc0042abb70 pc=0x557161a31f58
Oct 17 16:50:01 den-r25-u03 crio[5299]: google.golang.org/grpc.(*Server).processUnaryRPC(0xc0001fc8c0, {0x5571625ba668, 0xc000346c00}, 0xc001013440, 0xc0004e35f0, 0x5571635a7d48, 0x0)
Oct 17 16:50:01 den-r25-u03 crio[5299]:         /home/abuild/rpmbuild/BUILD/cri-o-1.23.3/vendor/google.golang.org/grpc/server.go:1282 +0xccf fp=0xc0042abe48 sp=0xc0042abbc8 pc=0x557160edd78f
Oct 17 16:50:01 den-r25-u03 crio[5299]: google.golang.org/grpc.(*Server).handleStream(0xc0001fc8c0, {0x5571625ba668, 0xc000346c00}, 0xc001013440, 0x0)
Oct 17 16:50:01 den-r25-u03 crio[5299]:         /home/abuild/rpmbuild/BUILD/cri-o-1.23.3/vendor/google.golang.org/grpc/server.go:1616 +0xa2a fp=0xc0042abf68 sp=0xc0042abe48 pc=0x557160ee156a
Oct 17 16:50:01 den-r25-u03 crio[5299]: google.golang.org/grpc.(*Server).serveStreams.func1.2()
Oct 17 16:50:01 den-r25-u03 crio[5299]:         /home/abuild/rpmbuild/BUILD/cri-o-1.23.3/vendor/google.golang.org/grpc/server.go:921 +0x98 fp=0xc0042abfe0 sp=0xc0042abf68 pc=0x557160edb2b8
Oct 17 16:50:01 den-r25-u03 crio[5299]: runtime.goexit()
Oct 17 16:50:01 den-r25-u03 crio[5299]:         /usr/lib64/go-1.17/src/runtime/asm_amd64.s:1581 +0x1 fp=0xc0042abfe8 sp=0xc0042abfe0 pc=0x55716051c921
Oct 17 16:50:01 den-r25-u03 crio[5299]: created by google.golang.org/grpc.(*Server).serveStreams.func1
Oct 17 16:50:01 den-r25-u03 crio[5299]:         /home/abuild/rpmbuild/BUILD/cri-o-1.23.3/vendor/google.golang.org/grpc/server.go:919 +0x294

What did you expect to happen?

CRI-O should not crash because of a race condition.

How can we reproduce it (as minimally and precisely as possible)?

Because this is a race condition, I cannot provide reliable steps to reproduce the issue.
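Even without a deterministic reproducer, races like this can often be surfaced by driving the suspect code path from many goroutines at once and running the test binary under Go's race detector (`go test -race`). The sketch below is a hypothetical stress helper, not part of CRI-O or containers/storage: `hammer` and `countCalls` are illustrative names. It only maximizes overlap between concurrent callers; the race detector is what reports an unsynchronized access.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// hammer (hypothetical helper) runs fn from `workers` goroutines, each
// calling it `iterations` times, and blocks until every call has returned.
// All workers wait on a shared channel so they start at the same moment,
// which increases the chance that their calls overlap.
func hammer(workers, iterations int, fn func(worker, iter int)) {
	var wg sync.WaitGroup
	start := make(chan struct{}) // closed once to release all workers together
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			<-start
			for i := 0; i < iterations; i++ {
				fn(w, i)
			}
		}(w)
	}
	close(start)
	wg.Wait()
}

// countCalls is a trivial, race-free use of hammer: the counter is updated
// atomically, so the harness itself introduces no data race.
func countCalls(workers, iterations int) int64 {
	var calls int64
	hammer(workers, iterations, func(_, _ int) {
		atomic.AddInt64(&calls, 1)
	})
	return calls
}
```

In a real test, fn would call the code under suspicion (for example, a disk-usage lookup alongside a quota update) instead of incrementing a counter.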

Anything else we need to know?

The problem seems to be caused by missing locks in the project quota code (drivers/quota/projectquota.go in containers/storage), which CRI-O uses as a dependency. The functions that report XFS quota values do not appear to be thread-safe. An issue was also opened against the library.

CRI-O and Kubernetes version

# crio --version
crio version 1.23.3
Version:          1.23.3
GitCommit:        2cf9dbf1a447d25931008c1815de407d9e5749f7
GitTreeState:     clean
BuildDate:        2022-07-29T04:25:57Z
GoVersion:        go1.17.5
Compiler:         gc
Platform:         linux/amd64
Linkmode:         dynamic
BuildTags:        exclude_graphdriver_devicemapper, seccomp
SeccompEnabled:   true
AppArmorEnabled:  false
# kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.12", GitCommit:"c6939792865ef0f70f92006081690d77411c8ed5", GitTreeState:"clean", BuildDate:"2022-09-21T12:20:29Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.12", GitCommit:"c6939792865ef0f70f92006081690d77411c8ed5", GitTreeState:"clean", BuildDate:"2022-09-21T12:13:07Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}

OS version

# cat /etc/os-release
NAME="Oracle Linux Server"
VERSION="8.6"
ID="ol"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="8.6"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Oracle Linux Server 8.6"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:oracle:linux:8:6:server"
HOME_URL="https://linux.oracle.com/"
BUG_REPORT_URL="https://bugzilla.oracle.com/"

ORACLE_BUGZILLA_PRODUCT="Oracle Linux 8"
ORACLE_BUGZILLA_PRODUCT_VERSION=8.6
ORACLE_SUPPORT_PRODUCT="Oracle Linux"
ORACLE_SUPPORT_PRODUCT_VERSION=8.6

# uname -a
Linux den-r25-u03 4.18.0-372.26.1.0.1.el8_6.x86_64 #1 SMP Tue Sep 13 21:44:27 PDT 2022 x86_64 x86_64 x86_64 GNU/Linux

Additional environment details (AWS, VirtualBox, physical, etc.)

Ezetowers added the kind/bug label Oct 17, 2022
@haircommander (Member)

In the stack trace, does it show the other goroutine? I can only see one (though I may be misreading it).

@Ezetowers (Author)

@haircommander Here is the full stack trace:
stacktrace.log

@github-actions

A friendly reminder that this issue had no activity for 30 days.

github-actions bot added the lifecycle/stale label Nov 17, 2022
@github-actions

Closing this issue since it had no activity in the past 90 days.

github-actions bot added the lifecycle/rotten label Feb 15, 2023
github-actions bot closed this as not planned Feb 15, 2023
haircommander reopened this Feb 15, 2023
haircommander added the lifecycle/frozen label and removed the lifecycle/stale and lifecycle/rotten labels Feb 15, 2023
@github-actions

A friendly reminder that this issue had no activity for 30 days.

github-actions bot added the lifecycle/stale label Mar 18, 2023
@github-actions

Closing this issue since it had no activity in the past 90 days.

github-actions bot added the lifecycle/rotten label Jun 17, 2023
github-actions bot closed this as not planned Jun 17, 2023
haircommander reopened this Jun 20, 2023
haircommander removed the lifecycle/stale and lifecycle/rotten labels Jun 20, 2023
@github-actions

A friendly reminder that this issue had no activity for 30 days.

github-actions bot added the lifecycle/stale label Jul 21, 2023
@github-actions

Closing this issue since it had no activity in the past 90 days.

github-actions bot added the lifecycle/rotten label Oct 20, 2023
github-actions bot closed this as not planned Oct 20, 2023