Failed to pick up create event under heavy load #148

Closed
sethgrid opened this issue Jun 3, 2016 · 6 comments
Labels
linux need-feedback Requires feedback to be actionable

Comments

@sethgrid

sethgrid commented Jun 3, 2016

I'm using fsnotify with Go 1.6 on CentOS release 6.7 (Final). The app uses fsnotify to watch a directory; when files are created, it does some stuff and deletes the file. Thousands of files can be dropped in the directory very quickly. Normally, the directory will only have up to a few hundred items in it at a time, but we have seen it go over 30k. Normally, everything is fine and the script clears out all the directory contents.

In some spiky situations (10k+ files put into the watched directory), fsnotify is seemingly failing to pick up some newly created files. In the 10k+ case, it failed to pick up around 600 files. Any thoughts, ideas, or suggestions?

@nathany
Contributor

nathany commented Jun 28, 2016

I've personally not tested with 10k+ files. We should try to establish whether the underlying inotify code handles that many files properly (for a given kernel). Assuming it does, we will at least know it's an issue in fsnotify.

@chmike

chmike commented Sep 28, 2016

This is normal because there is an upper limit on the number of queued inotify events. See man inotify for more information on this.

From the manual:

   /proc interfaces
       The following interfaces can be used to limit the amount of kernel
       memory consumed by inotify:

       /proc/sys/fs/inotify/max_queued_events
              The value in this file is used when an application calls
              inotify_init(2) to set an upper limit on the number of events
              that can be queued to the corresponding inotify instance.
              Events in excess of this limit are dropped, but an
              IN_Q_OVERFLOW event is always generated.

       /proc/sys/fs/inotify/max_user_instances
              This specifies an upper limit on the number of inotify instances
              that can be created per real user ID.

       /proc/sys/fs/inotify/max_user_watches
              This specifies an upper limit on the number of watches that can
              be created per real user ID.

and also

       Note that the event queue can overflow. In this case, events are lost.
       Robust applications should handle the possibility of lost events
       gracefully. For example, it may be necessary to rebuild part or all of
       the application cache. (One simple, but possibly expensive, approach is
       to close the inotify file descriptor, empty the cache, create a new
       inotify file descriptor, and then re-create watches and cache entries
       for the objects to be monitored.)

So, considering your particular use case, I suggest you increase the value of /proc/sys/fs/inotify/max_queued_events to avoid hitting the queue limit. It is a "limitation" of the fsnotify library to not inform users of dropped events due to queue overflow.

@mauricioabreu

I am having a similar problem but with a lower number of files.
I updated /proc/sys/fs/inotify/max_queued_events and /proc/sys/fs/inotify/max_user_watches to 80,000 and the problem still persists.

I read fsnotify's inotify wrapper code and saw that the Add method always executes the inotifyAddWatch function.

My code checks for directory creation and uses Add to put a watcher on each new directory. Should I keep a map of the already watched directories?

@mauricioabreu

Still facing this same problem.

@sethgrid did you get it working?

I increased the number to 10 times more (1 million) and it still does not work.

@arp242
Member

arp242 commented Jul 30, 2022

I set up a watcher on /tmp/xxx with the ./cmd/fsnotify tool:

% go run ./cmd/fsnotify /tmp/xxx

And then I ran the following code to create, write to, and remove 100,000 files:

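package main

import (
	"fmt"
	"os"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	// Start one goroutine per file: 100,000 in total.
	for i := 0; i < 100_000; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()

			path := fmt.Sprintf("/tmp/xxx/file-%d", i)

			// Create and write the file, then remove it straight away.
			err := os.WriteFile(path, []byte("hello"), 0o644)
			if err != nil {
				fmt.Println(err)
			}
			err = os.Remove(path)
			if err != nil {
				fmt.Println(err)
			}
		}(i)
	}

	// Wait until every goroutine has finished.
	wg.Wait()
}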

I ran it a few times, and looking at the output everything seems to work: the number of reported events matches the number of expected events.

This is on my Linux system with the default settings:

[~]% sysctl fs.inotify
fs.inotify.max_queued_events = 16384
fs.inotify.max_user_instances = 128
fs.inotify.max_user_watches = 124946

Maybe I'm doing something different? To move ahead on this, we need a way to reproduce the problem.

I also ran it in ~/xxx, just to make sure that it works on a "real" file system too (/tmp is a memory FS on my system), but I had to use slightly different code, as it's slower and the 100k goroutines run into the max open files limit:

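package main

import (
	"fmt"
	"os"
	"sync"
)

// do creates, writes to, and removes 1,000 files concurrently, then waits
// for all of them to finish before returning.
func do() {
	var wg sync.WaitGroup
	for i := 0; i < 1_000; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()

			path := fmt.Sprintf("/home/martin/xxx/file-%d", i)

			err := os.WriteFile(path, []byte("hello"), 0o644)
			if err != nil {
				fmt.Println(err)
			}
			err = os.Remove(path)
			if err != nil {
				fmt.Println(err)
			}
		}(i)
	}
	wg.Wait()
}

func main() {
	// 100 batches of 1,000 files keep the number of open files bounded.
	for i := 0; i < 100; i++ {
		do()
	}
}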

@arp242 arp242 added the need-feedback label and removed the documentation label Jul 30, 2022
@arp242 arp242 changed the title from "failed to pick up create event?" to "Failed to pick up create event under heavy load" Jul 30, 2022
@arp242
Member

arp242 commented Aug 7, 2022

I added a test which creates and removes 1.5 million files; this consistently seems to work well on my laptop and in the CI (where it has run many times now, as I was trying to get the kqueue tests fixed).

So yeah, I don't know. We really need a specific example to reproduce, or someone who can reproduce it on their system where we can investigate what the cause is. I'll close this for the time being, as it's not clear this is an fsnotify problem and it's not clear if it still exists in the current version (#434 changed quite a bit of the inotify backend internals); we can always reopen later.

@arp242 arp242 closed this as completed Aug 7, 2022
5 participants