
Adding alternative method for monitoring configs / issues with mutating pods with file #2067

Open
alexagriffith opened this issue Oct 7, 2022 · 0 comments


alexagriffith commented Oct 7, 2022

Feature Request

If this is a feature request, please fill out the following form in full:

Describe the problem the feature is intended to solve

In KServe, we apply configs in a CRD called ClusterServingRuntime. We want to add a Prometheus config that will be applied to each of the pods using the TensorFlow Serving runtime. As detailed in kserve/kserve#2462, all of our other runtimes expose default Prometheus ports, and we would like to configure this for TensorFlow Serving as well.

Since TF Serving requires the monitoring config to be passed in as a file, we would need to store that file in the ClusterServingRuntime CRD's namespace and then apply it to all services using TF Serving (each in its own namespace). The problem is that volumes/ConfigMaps cannot be shared across namespaces. We would like to be able to define a default Prometheus config in the ClusterServingRuntime CRD that is then applied to the pods using TF Serving. We have Prometheus metric functionality that we would like to be easy to expose by default. We could require each user to supply the file themselves, but it would be nicer for this to work out of the box by setting an annotation/param/variable, without every user having to pass a file in.
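For reference, this is the shape of the monitoring config that TF Serving expects today: a protobuf text file that has to be mounted into the container and referenced with `--monitoring_config_file=<path>` (the exact mount path is up to whoever wires it in):

```
# monitoring.config (protobuf text format read by tensorflow_model_server)
prometheus_config {
  enable: true,
  path: "/monitoring/prometheus/metrics"
}
```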

Describe the solution

In addition to the monitoring.config file that is required here, if TF Serving could also accept an argument or environment variable for these two parameters (whether Prometheus is enabled, and the metrics path), it would be much easier for us to enable scraping of Prometheus metrics, since we use a CRD in the kserve namespace to mutate the pod YAML in the user's own namespace.
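To make the ask concrete, a rough sketch of what we mean is below. Note that `--enable_prometheus`, `--prometheus_metrics_path`, and the `PROMETHEUS_*` env vars are hypothetical names we are proposing, not existing TF Serving options:

```
# Hypothetical flags (not currently supported by tensorflow_model_server),
# shown only to illustrate the request
tensorflow_model_server \
  --rest_api_port=8501 \
  --enable_prometheus=true \
  --prometheus_metrics_path=/monitoring/prometheus/metrics

# or, equivalently, environment variables our controller could inject:
#   PROMETHEUS_ENABLE=true
#   PROMETHEUS_METRICS_PATH=/monitoring/prometheus/metrics
```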

Describe alternatives you've considered

I tried a hacky way of passing in the file, using a ConfigMap with a mounted volume (which didn't work because of the namespace difference), and we considered adding something to our controller just to make Prometheus metrics work with TensorFlow Serving. If there is an easier way and I missed something, please let me know! I tried to be thorough in my research of the docs, examples, and blog posts.
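For completeness, the ConfigMap approach I tried looked roughly like the following (resource names are just illustrative); it fails because an InferenceService pod running in a user namespace cannot mount a ConfigMap that lives in the kserve namespace:

```yaml
# Illustrative only: a ConfigMap in the kserve namespace holding monitoring.config
apiVersion: v1
kind: ConfigMap
metadata:
  name: tfserving-monitoring-config
  namespace: kserve
data:
  monitoring.config: |
    prometheus_config {
      enable: true,
      path: "/monitoring/prometheus/metrics"
    }
# Mounting this as a volume in the ClusterServingRuntime pod spec does not work,
# because the resulting pod runs in the user's namespace and ConfigMap volume
# mounts cannot cross namespaces.
```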

Thank you!
