inotify inode leak in file discovery #13929
Comments
Watching individual files is not recommended by fsnotify, the library used here, because it is not resilient. The best intermediate solution might be to watch the directory itself (excluding its files) in addition to the necessary files. However, it appears that fsnotify generalizes inotify's behavior, in which all of a directory's files are watched as soon as the directory is watched, to its other implementations. I believe the best solution for you would be to isolate each job's files in a separate folder. Note that recent kernel versions seem to adjust max_user_watches based on the available RAM.
Maybe we should mention this in the docs somewhere.
We have deployed Prometheus in k8s and use ConfigMaps to mount its configuration files. Storing these files in different directories would require creating a separate ConfigMap for each job and mounting each one into the Pod, which would cause Pod restarts. This approach is clearly not elegant.
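To see how much headroom a host has, the current limit can be read from procfs. The following is only a minimal sketch, assuming a Linux host where `/proc/sys/fs/inotify/max_user_watches` is readable; the helper name `parseLimit` is hypothetical and not part of Prometheus or fsnotify.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Standard Linux procfs location of the per-user inotify watch limit.
const maxUserWatchesPath = "/proc/sys/fs/inotify/max_user_watches"

// parseLimit (hypothetical helper) converts the raw file contents,
// e.g. "8192\n", into an integer.
func parseLimit(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

func main() {
	data, err := os.ReadFile(maxUserWatchesPath)
	if err != nil {
		// Expected on non-Linux systems or restricted containers.
		fmt.Println("could not read inotify limit:", err)
		return
	}
	limit, err := parseLimit(string(data))
	if err != nil {
		fmt.Println("unexpected file contents:", err)
		return
	}
	fmt.Println("max_user_watches:", limit)
}
```

Comparing this value with the number of jobs and discovery files gives a rough idea of how close a deployment is to the limit.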
What did you do?
We use file-based service discovery for target discovery, and save the target files of all jobs in the same directory.
What did you expect to see?
Each job should only watch the files it cares about, not all the files in the directory.
What did you see instead? Under which circumstances?
As more and more jobs are created, the inotify inodes may run out, and we encounter a "too many open files" error.
We save the files used for target discovery by multiple jobs in the same directory. By reviewing the source code (kqueue.go), we found that when targets are discovered via a specified file, all files in the same directory are watched (in fsnotify, watching a directory actually traverses all of its files). As a result, each job watches all the files in the directory, whereas in reality each job only needs to watch the files it cares about.
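To illustrate what "only the files a job cares about" could mean when events arrive for a whole directory, here is a minimal sketch that filters event paths against a job's glob patterns. It uses only the Go standard library; the function `matchesAny`, the paths, and the job names are hypothetical and are not taken from the Prometheus source.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// matchesAny (hypothetical helper) reports whether path matches at least
// one of the job's glob patterns. Malformed patterns are skipped.
func matchesAny(path string, patterns []string) bool {
	for _, p := range patterns {
		if ok, err := filepath.Match(p, path); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical patterns configured for a single job.
	jobPatterns := []string{"/etc/prometheus/sd/job-a-*.json"}

	// Hypothetical event paths from a watch on the shared directory.
	events := []string{
		"/etc/prometheus/sd/job-a-1.json",
		"/etc/prometheus/sd/job-b-1.json",
	}

	for _, e := range events {
		fmt.Printf("%s relevant=%v\n", e, matchesAny(e, jobPatterns))
	}
}
```

Filtering like this decides which events a job reacts to; it does not by itself reduce the number of kernel watches, which depends on how the watcher library registers the directory.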
System information
Linux 3.10.0-1160.90.1.el7.x86_64 x86_64
Prometheus version
Prometheus configuration file
No response
Alertmanager version
No response
Alertmanager configuration file
No response
Logs
No response