Windows: fsnotify.Watcher.Add randomly hang with high volume file/folder #502

Open
StarAurryon opened this issue Sep 5, 2022 · 7 comments

Comments

@StarAurryon

Describe the bug
I am using fsnotify to recursively monitor file changes on Windows across more than 100k files and 24k folders.
At that scale, the Add method randomly hangs after 200 calls.

To Reproduce
Steps to reproduce the behavior:

  1. Walk the tree and add each file/folder; the Add method will freeze after a while.

Expected behavior
Support for any number of files/folders. Running with 30k files and 5000 folders works as expected.

Which operating system and version are you using?
Windows: 10.0.19044 N/A version 19044

@StarAurryon StarAurryon changed the title Windows: fsnotify.Watcher.Add randomly hand with high volume file /f older Windows: fsnotify.Watcher.Add randomly hang with high volume file /f older Sep 5, 2022
@StarAurryon StarAurryon changed the title Windows: fsnotify.Watcher.Add randomly hang with high volume file /f older Windows: fsnotify.Watcher.Add randomly hang with high volume file/folder Sep 5, 2022
@arp242
Member

arp242 commented Sep 5, 2022

Which version of fsnotify did you use? Did you try to use the latest main branch?

It would be hugely helpful to have a way to easily reproduce this. That is, a script that sets up a directory tree, that installs a watcher, and then performs file operations. This can be a test case (see fsnotify_test.go), a batch script, a Go program you wrote, a Python script: whatever is easiest for you – I don't really care so much how, but without being able to reproduce this relatively easily it's unlikely I can do much with this any time soon.
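As a possible starting point for such a reproduction, here is a minimal Go sketch that only builds the directory tree. Installing the watcher and generating file operations are left out. The names `makeTree` and `buildSample` and the tree sizes are hypothetical, chosen for illustration, and do not come from the report.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// makeTree creates `dirs` subdirectories under root, each holding
// `filesPerDir` empty files, and returns the number of files created.
// Hypothetical helper for illustration; not part of fsnotify.
func makeTree(root string, dirs, filesPerDir int) (int, error) {
	created := 0
	for d := 0; d < dirs; d++ {
		dir := filepath.Join(root, fmt.Sprintf("dir%05d", d))
		if err := os.MkdirAll(dir, 0o755); err != nil {
			return created, err
		}
		for f := 0; f < filesPerDir; f++ {
			name := filepath.Join(dir, fmt.Sprintf("file%03d.txt", f))
			if err := os.WriteFile(name, nil, 0o644); err != nil {
				return created, err
			}
			created++
		}
	}
	return created, nil
}

// buildSample builds a small tree in a temporary directory, removes it
// again, and returns how many files were created.
func buildSample() (int, error) {
	root, err := os.MkdirTemp("", "fsnotify-repro-")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(root)
	// Small numbers here; the report involves about 24k folders and 100k files.
	return makeTree(root, 20, 5)
}

func main() {
	n, err := buildSample()
	if err != nil {
		panic(err)
	}
	fmt.Printf("created %d files\n", n)
}
```

Raising the two size arguments should bring the tree closer to the scale in the report; a real reproduction would keep the tree around while a watcher is installed on it.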

@StarAurryon
Author

StarAurryon commented Sep 5, 2022

Hello @arp242,

Thanks for the quick feedback. I am running the latest stable tagged version, v1.5.4, not the master branch.
Before trying to create a PR with test cases, I have a few questions.

Is not emptying (reaching the buffer capacity) of Error or Event channels supposed to block the Add method?
Is the design of fsnotify capable of handling 1M file/folder?

@arp242
Member

arp242 commented Sep 5, 2022

I don't really have a good answer to either of those questions to be honest. Just to provide some context, this library was sort-of unmaintained for a while, and I've only been involved for a relatively short period.

Is not emptying (reaching the buffer capacity) of Error or Event channels supposed to block the Add method?

Supposed to? No. But I think that on Windows it could very well do this due to the way it's structured (thinking about this from the top of my head, didn't verify).

Is the design of fsnotify capable of handling 1M file/folder?

It certainly ought to be, but I don't know if it is right now.

I think #339 will probably help your use case a lot.

@StarAurryon
Author

Supposed to? No. But I think that on Windows it could very well do this due to the way it's structured (thinking about this from the top of my head, didn't verify).

I will investigate that part first then.

I think #339 will probably help your use case a lot.

That's definitely what I am trying to achieve, as I am writing a cross-platform cloud sync tool with fyne, boltdb and fsnotify.

@arp242
Member

arp242 commented Sep 5, 2022

The best path forward would be to get recursive watching finally finished. That PR works well for Windows; the reason it's not merged (yet) is that recursive watching should be implemented across the board (Linux, BSD, macOS) rather than just Windows. I spent a bit of time on it for inotify in #472, and much of that can be reused for BSD/macOS too. But I haven't had the time to finish it yet, as I wanted to fix issues with existing functionality and expand various test cases etc. first.

Either way, I would strongly recommend giving the latest main branch a try; there have been a number of changes (e.g. #485) that may improve your situation. It should be "stable" and I should probably tag a new release at this point; just want to go over various test cases and documentation to make sure I got it all right – been a bit busy the last few weeks.

@StarAurryon
Author

StarAurryon commented Sep 5, 2022

I would strongly recommend giving the latest main branch a try; there have been a number of changes (e.g. #485)

Emptying channels seems to work. Add does not hang anymore. I will take a look at the diffs between v1.5.4 and master.

@coezbek

coezbek commented Sep 8, 2022

I was seeing the random deadlocks on func (w *Watcher) Add(name string) error as well.

The issue for me was that I wasn't reading events while I was adding directories to the watcher. The Events channel only has a capacity of 50, so the program would deadlock if 50 filesystem events accumulated while I was adding directories.
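To make the capacity limit concrete, this minimal sketch uses a plain buffered channel as a stand-in for the watcher's event channel. The capacity of 50 is taken from the description above; `fillUntilBlocked` is a hypothetical helper, not fsnotify code. It counts how many sends succeed while nothing is receiving.

```go
package main

import "fmt"

// fillUntilBlocked sends on ch until a send would block and reports
// how many values the channel accepted.
func fillUntilBlocked(ch chan int) int {
	sent := 0
	for {
		select {
		case ch <- sent:
			sent++
		default:
			// The buffer is full and nobody is receiving:
			// a plain `ch <- v` would block at this point.
			return sent
		}
	}
}

func main() {
	events := make(chan int, 50) // stand-in for an event channel with capacity 50
	fmt.Println(fillUntilBlocked(events))
}
```

With a capacity of 50 the count is 50: the next send has nowhere to go until something receives from the channel.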

What I had to do:

  1. First start a goroutine to consume watcher events.
  2. Then start adding directories to the watcher.
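The two steps above can be sketched roughly as follows. To keep the example runnable with the standard library only, it uses a hypothetical `fakeWatcher` whose `Add` blocks until its event is received (a crude imitation of the hang, not how fsnotify works internally). With the real library the channels would be the watcher's `Events` and `Errors`, which carry `fsnotify.Event` and `error` values rather than strings. The names `adder`, `drain`, `addDirs` and `run` are assumptions made for this sketch.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// adder is the single method this sketch needs from a watcher.
type adder interface {
	Add(name string) error
}

// fakeWatcher is a hypothetical stand-in so the sketch runs without fsnotify.
// Its Add blocks until someone receives the event.
type fakeWatcher struct {
	Events chan string
	Errors chan error
}

func (f *fakeWatcher) Add(name string) error {
	f.Events <- name
	return nil
}

// drain consumes events and errors until both channels are closed.
func drain(events <-chan string, errs <-chan error, onEvent func(string), onErr func(error)) {
	for events != nil || errs != nil {
		select {
		case ev, ok := <-events:
			if !ok {
				events = nil // stop selecting on a closed channel
				continue
			}
			onEvent(ev)
		case err, ok := <-errs:
			if !ok {
				errs = nil
				continue
			}
			onErr(err)
		}
	}
}

// addDirs walks root and adds every directory (root included) to w.
func addDirs(w adder, root string) (int, error) {
	n := 0
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			if err := w.Add(path); err != nil {
				return err
			}
			n++
		}
		return nil
	})
	return n, err
}

// run builds a tiny tree, starts the consumer, then adds the directories.
func run() (added, seen int, err error) {
	root, err := os.MkdirTemp("", "watch-demo-")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(root)
	for i := 0; i < 3; i++ {
		if err := os.Mkdir(filepath.Join(root, fmt.Sprintf("sub%d", i)), 0o755); err != nil {
			return 0, 0, err
		}
	}
	w := &fakeWatcher{Events: make(chan string), Errors: make(chan error)}

	// Step 1: start consuming before anything is added.
	done := make(chan struct{})
	go func() {
		defer close(done)
		drain(w.Events, w.Errors, func(string) { seen++ }, func(error) {})
	}()

	// Step 2: only now add the directories.
	added, err = addDirs(w, root)

	close(w.Events)
	close(w.Errors)
	<-done
	return added, seen, err
}

func main() {
	added, seen, err := run()
	fmt.Println(added, seen, err)
}
```

If the two steps in `run` were swapped, the first `Add` on the fake watcher would block forever, because nothing would be receiving from `Events` yet.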

The culprit for the deadlock is in the select in func (w *Watcher) sendEvent(name string, mask uint64) bool :

func (w *Watcher) sendEvent(name string, mask uint64) bool {
	if mask == 0 {
		return false
	}
	event := newEvent(name, uint32(mask))
	select {
	case ch := <-w.quit:
		w.quit <- ch
	case w.Events <- event:
	}
	return true
}

This select unfortunately doesn't process input events which are triggered while adding directories:

	w.input <- in
	if err := w.wakeupReader(); err != nil {
		return err
	}
	return <-in.reply // Deadlock is caused here. Watcher wants to sendEvent, but caller waits for it to process the input command
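The interaction between the two snippets can be modelled with the standard library alone. The sketch below is a simplified stand-in, not the fsnotify implementation: `modelWatcher`, `request`, `completedAdds`, `stuckCount` and `drainedCount` are hypothetical names, and the model assumes that one event is pending every time a request is served. A single goroutine both delivers events and answers `Add` requests, so once the event buffer is full and unread, `Add` never gets its reply.

```go
package main

import (
	"fmt"
	"time"
)

// request is what Add hands to the loop; reply carries the result back.
type request struct {
	name  string
	reply chan error
}

// modelWatcher is a simplified model, NOT the fsnotify implementation.
type modelWatcher struct {
	Events chan string
	input  chan request
}

func newModelWatcher(capacity int) *modelWatcher {
	w := &modelWatcher{
		Events: make(chan string, capacity),
		input:  make(chan request),
	}
	go w.loop()
	return w
}

// loop is the only goroutine that answers requests, and it also delivers events.
func (w *modelWatcher) loop() {
	for req := range w.input {
		// Assumption of the model: one event is pending per request.
		// This send blocks once Events is full and nobody reads it,
		// so the reply below is never sent.
		w.Events <- "change near " + req.name
		req.reply <- nil
	}
}

func (w *modelWatcher) Add(name string) error {
	req := request{name: name, reply: make(chan error, 1)}
	w.input <- req
	return <-req.reply // waits for the loop, which may be stuck sending an event
}

// completedAdds calls Add n times in the background and reports how many
// calls returned before the timeout. A stuck goroutine is simply abandoned.
func completedAdds(w *modelWatcher, n int, timeout time.Duration) int {
	progress := make(chan struct{}, n)
	go func() {
		for i := 0; i < n; i++ {
			w.Add(fmt.Sprintf("dir%d", i))
			progress <- struct{}{}
		}
	}()
	deadline := time.After(timeout)
	done := 0
	for done < n {
		select {
		case <-progress:
			done++
		case <-deadline:
			return done
		}
	}
	return done
}

// stuckCount: nobody reads Events, buffer capacity is 2.
func stuckCount() int {
	return completedAdds(newModelWatcher(2), 5, time.Second)
}

// drainedCount: a goroutine drains Events before the first Add.
func drainedCount() int {
	w := newModelWatcher(2)
	go func() {
		for range w.Events {
		}
	}()
	return completedAdds(w, 5, time.Second)
}

func main() {
	fmt.Println("without a reader:", stuckCount())
	fmt.Println("with a reader:", drainedCount())
}
```

In this model, two `Add` calls complete without a reader (the buffer holds two events) and the third never returns, while all five complete once something drains `Events`.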

I hope this helps.
