Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

panic in v1.7.0 #616

Open
aakashbajaj opened this issue Apr 10, 2024 · 8 comments
Open

panic in v1.7.0 #616

aakashbajaj opened this issue Apr 10, 2024 · 8 comments

Comments

@aakashbajaj
Copy link

aakashbajaj commented Apr 10, 2024

Describe the bug

Panic occured randomly in one of our microservices.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xe01628]

goroutine 327 [running]:
github.com/fsnotify/fsnotify.(*watches).remove(0xc0007bdec0, 0x90de68?)
        /go/pkg/mod/github.com/fsnotify/fsnotify@v1.7.0/backend_inotify.go:182 +0x88
github.com/fsnotify/fsnotify.(*Watcher).readEvents(0xc00012cd00)
        /go/pkg/mod/github.com/fsnotify/fsnotify@v1.7.0/backend_inotify.go:535 +0x4a5
created by github.com/fsnotify/fsnotify.NewBufferedWatcher
        /go/pkg/mod/github.com/fsnotify/fsnotify@v1.7.0/backend_inotify.go:270 +0x1f0

Code to Reproduce

Not able to reproduce

File operations to reproduce

NA

Which operating system and version are you using?

Linux 6.1.58+ SMP PREEMPT_DYNAMIC Mon Jan 29 15:19:25 UTC 2024 x86_64 Linux

Which fsnotify version are you using?

v1.7.0

Did you try the latest main branch?

No

@arp242
Copy link
Member

arp242 commented Apr 10, 2024

I can't do anything with this; you'll need to either give more details on how to reproduce or track down the problem yourself.

@arp242 arp242 added the need-feedback Requires feedback to be actionable label Apr 10, 2024
@aakashbajaj
Copy link
Author

aakashbajaj commented Apr 10, 2024

I apologise but we are not able to repro this at all. And there's no specific application state that caused this to occur.
fsnotify is an indirect dependency in the service. Meanwhile we will try to pin point the issue ourselves and update here if we find anything.

@arp242
Copy link
Member

arp242 commented Apr 10, 2024

I mean, there's no details here at all; not even how or where you're using it, or what kind of events are happening, or ... anything.

The panic is triggered on IN_DELETE_SELF: the watched path was deleted. This is handled (and tested), but it seems some specific sequence of events cause this error for some reason, but I'm not going to spend ages trying random stuff to find out what.

Any clue on usage patterns would be useful. Or if you want to know exactly, use replace github.com/fsnotify/fsnotify => ./fsnotify in go.mod and add a few debug printfs – maybe adding a FSNOTIFY_DEBUG flag would be nice for this.

@ChristopherAparicio
Copy link

ChristopherAparicio commented Apr 15, 2024

I have the same panic happening time to time on my micro-services.

The library is used indirectly by viper

go mod why -m github.com/fsnotify/fsnotify
# github.com/fsnotify/fsnotify
github.com/spf13/viper
github.com/fsnotify/fsnotify

The file using fsnotify :

https://github.com/spf13/viper/blob/master/viper.go

The problem is strange because it appears on a daemon that doesn't use viper at all, so I'm guessing something that would happen in an init. I'll give you more details if I find anything.

@arp242
Copy link
Member

arp242 commented Apr 25, 2024

I added FSNOTIFY_DEBUG in #619.

@giorio94
Copy link

giorio94 commented May 3, 2024

I just stumbled upon the same issue, and it reproduces consistently running one of the Cilium's unit tests a few times:

$ git init 
$ git remote add origin https://github.com/cilium/cilium.git
$ git fetch --depth 1 origin 715ed4c20572b354cc875a81463d636624da31d4
$ git checkout 715ed4c20572b354cc875a81463d636624da31d4

$ go test -c ./pkg/clustermesh/common
$ ./common.test --test.v --test.run Test/ClusterMeshTestSuite/TestWatchConfigDirectory --test.count=100 --test.failfast 

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x87b8f9]

goroutine 550 [running]:
github.com/fsnotify/fsnotify.(*watches).remove(0xc001089e00, 0x5)
        /tmp/tmp.Uqhjz3pRld/vendor/github.com/fsnotify/fsnotify/backend_inotify.go:182 +0x79
github.com/fsnotify/fsnotify.(*Watcher).readEvents(0xc000a32280)
        /tmp/tmp.Uqhjz3pRld/vendor/github.com/fsnotify/fsnotify/backend_inotify.go:535 +0x4d2
created by github.com/fsnotify/fsnotify.NewBufferedWatcher in goroutine 549
        /tmp/tmp.Uqhjz3pRld/vendor/github.com/fsnotify/fsnotify/backend_inotify.go:270 +0x1c5

Output updating fsnotify to 77d5fee and running with FSNOTIFY_DEBUG=1

FSNOTIFY_DEBUG: 18:59:47.199009999  AddWith("/tmp/multicluster734350355")
FSNOTIFY_DEBUG: 18:59:47.199131429  AddWith("/tmp/multicluster734350355/cluster1")
FSNOTIFY_DEBUG: 18:59:47.199292685  AddWith("/tmp/multicluster734350355/cluster2")
FSNOTIFY_DEBUG: 18:59:47.199467407           4:IN_ATTRIB                      → "/tmp/multicluster734350355/cluster1"
FSNOTIFY_DEBUG: 18:59:47.199496820        1024:IN_DELETE_SELF                 → "/tmp/multicluster734350355/cluster1"
FSNOTIFY_DEBUG: 18:59:47.199567928  Remove("/tmp/multicluster734350355/cluster1")
FSNOTIFY_DEBUG: 18:59:47.199670613       32768:IN_IGNORED                     → ""
FSNOTIFY_DEBUG: 18:59:47.199687176         512:IN_DELETE                      → "/tmp/multicluster734350355/cluster1"
FSNOTIFY_DEBUG: 18:59:47.210065264         256:IN_CREATE                      → "/tmp/multicluster734350355/cluster3"
FSNOTIFY_DEBUG: 18:59:47.210237310  AddWith("/tmp/multicluster734350355/cluster3")
FSNOTIFY_DEBUG: 18:59:47.210256957           2:IN_MODIFY                      → "/tmp/multicluster734350355/cluster3"
FSNOTIFY_DEBUG: 18:59:47.220508585          64:IN_MOVED_FROM                  → "/tmp/multicluster734350355/cluster3"
FSNOTIFY_DEBUG: 18:59:47.220541191         128:IN_CLASSA_MAX | IN_MOVED_TO    → "/tmp/multicluster734350355/cluster1"
FSNOTIFY_DEBUG: 18:59:47.220582465  Remove("/tmp/multicluster734350355/cluster3")
FSNOTIFY_DEBUG: 18:59:47.220679483  AddWith("/tmp/multicluster734350355/cluster1")
FSNOTIFY_DEBUG: 18:59:47.220705535        2048:IN_MOVE_SELF                   → ""
FSNOTIFY_DEBUG: 18:59:47.220753704       32768:IN_IGNORED                     → ""
FSNOTIFY_DEBUG: 18:59:47.230774923           4:IN_ATTRIB                      → "/tmp/multicluster734350355/cluster1"
FSNOTIFY_DEBUG: 18:59:47.230794723           4:IN_ATTRIB                      → "/tmp/multicluster734350355/cluster1"
FSNOTIFY_DEBUG: 18:59:47.331501395         256:IN_CREATE                      → "/tmp/multicluster734350355/..data_tmp"
FSNOTIFY_DEBUG: 18:59:47.331588170          64:IN_MOVED_FROM                  → "/tmp/multicluster734350355/..data_tmp"
FSNOTIFY_DEBUG: 18:59:47.331731836         128:IN_CLASSA_MAX | IN_MOVED_TO    → "/tmp/multicluster734350355/..data"
FSNOTIFY_DEBUG: 18:59:47.331785980           4:IN_ATTRIB                      → "/tmp/multicluster734350355/cluster2"
FSNOTIFY_DEBUG: 18:59:47.331896182        1024:IN_DELETE_SELF                 → "/tmp/multicluster734350355/cluster2"
FSNOTIFY_DEBUG: 18:59:47.332115852  AddWith("/tmp/multicluster734350355/cluster2")
FSNOTIFY_DEBUG: 18:59:47.332231977       32768:IN_IGNORED                     → ""
FSNOTIFY_DEBUG: 18:59:47.332257640  1073742336:IN_DELETE | IN_ISDIR           → "/tmp/multicluster734350355/..data-1"
FSNOTIFY_DEBUG: 18:59:47.332482645         256:IN_CREATE                      → "/tmp/multicluster734350355/..data_tmp"
FSNOTIFY_DEBUG: 18:59:47.332537092          64:IN_MOVED_FROM                  → "/tmp/multicluster734350355/..data_tmp"
FSNOTIFY_DEBUG: 18:59:47.332599916         128:IN_CLASSA_MAX | IN_MOVED_TO    → "/tmp/multicluster734350355/..data"
FSNOTIFY_DEBUG: 18:59:47.332655741           4:IN_ATTRIB                      → "/tmp/multicluster734350355/cluster2"
FSNOTIFY_DEBUG: 18:59:47.332722224        1024:IN_DELETE_SELF                 → "/tmp/multicluster734350355/cluster2"
FSNOTIFY_DEBUG: 18:59:47.332757686  AddWith("/tmp/multicluster734350355/cluster2")
FSNOTIFY_DEBUG: 18:59:47.332860668       32768:IN_IGNORED                     → ""
FSNOTIFY_DEBUG: 18:59:47.332883458  1073742336:IN_DELETE | IN_ISDIR           → "/tmp/multicluster734350355/..data-2"
FSNOTIFY_DEBUG: 18:59:47.332905160           4:IN_ATTRIB                      → "/tmp/multicluster734350355/cluster1"
FSNOTIFY_DEBUG: 18:59:47.332965314  Remove("/tmp/multicluster734350355/cluster1")
FSNOTIFY_DEBUG: 18:59:47.332989726        1024:IN_DELETE_SELF                 → "/tmp/multicluster734350355/cluster1"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x87bc39]

goroutine 167 [running]:
github.com/fsnotify/fsnotify.(*watches).remove(0xc000c8a480, 0x5)
        /tmp/tmp.Uqhjz3pRld/vendor/github.com/fsnotify/fsnotify/backend_inotify.go:178 +0x79
github.com/fsnotify/fsnotify.(*Watcher).readEvents(0xc000aca000)
        /tmp/tmp.Uqhjz3pRld/vendor/github.com/fsnotify/fsnotify/backend_inotify.go:554 +0x5f8
created by github.com/fsnotify/fsnotify.NewBufferedWatcher in goroutine 166
        /tmp/tmp.Uqhjz3pRld/vendor/github.com/fsnotify/fsnotify/backend_inotify.go:266 +0x1c5

Feel free to let me know if you need more information for troubleshooting it.

@arp242 arp242 added linux and removed need-feedback Requires feedback to be actionable labels May 3, 2024
@arp242
Copy link
Member

arp242 commented May 3, 2024

Thanks; I can reproduce it with that fairly reliably. Reducing that to a test case seems a bit non-trivial, as it's pretty complex. I think what's going on that it that it manually calls Watcher.Remove() for paths that are deleted, and events arrive for that path after it's removed from the internal book-keeping. Or something along those lines – my test cases for this always work, but maybe I need to add more goroutines or something for more racy-ness.

@arp242
Copy link
Member

arp242 commented May 3, 2024

Here's a basic test case:

func TestRemoveRace(t *testing.T) {
	tmp := t.TempDir()

	w := newCollector(t, tmp)
	w.collect(t)

	for i := 0; i < 100; i++ {
		go mkdirAll(t, tmp, "/dir1")
		go addWatch(t, w.w, tmp, "/dir1")
		go rmAll(t, tmp, "/dir1")
		go rmWatch(t, w.w, tmp, "/dir1")
	}

	time.Sleep(1 * time.Second)

	ev := w.stop(t)
	fmt.Println(ev)
}
[~cg/fsnotify](main)% go test -run TestRemoveRace
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x5438f8]

goroutine 19 [running]:
github.com/fsnotify/fsnotify.(*inotify).readEvents(0xc0000e8140)
        /home/martin/code/Golib/fsnotify/backend_inotify.go:524 +0x838
created by github.com/fsnotify/fsnotify.newBufferedBackend in goroutine 18
        /home/martin/code/Golib/fsnotify/backend_inotify.go:191 +0x196
exit status 2
FAIL    github.com/fsnotify/fsnotify    0.007s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants