
Adding alternative method for monitoring configs / issues with mutating pods with file #2067

Open
alexagriffith opened this issue Oct 7, 2022 · 0 comments


alexagriffith commented Oct 7, 2022

Feature Request

If this is a feature request, please fill out the following form in full:

Describe the problem the feature is intended to solve

In KServe, we apply configs in a CRD called ClusterServingRuntime. We want to add a Prometheus config that will be applied to each of the pods using the TensorFlow Serving runtime. As detailed in kserve/kserve#2462, all of our other runtimes expose default Prometheus ports, and we would like to configure this for TensorFlow Serving as well.

Since TF Serving requires the monitoring config to be passed in as a file, we would need to store that file in the ClusterServingRuntime CRD's namespace and then apply it to all services using TF Serving (each in its own namespace). The problem is that volumes/ConfigMaps cannot be shared across namespaces. We would like to be able to define a default Prometheus config in the ClusterServingRuntime CRD that is then applied to the pods using TF Serving. We have Prometheus metric functionality that we would like to be easy to expose by default. We could require each user to supply the file themselves, but it would be nicer for this to work out of the box by setting an annotation/param/variable, without every user having to pass a file in.
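For reference, this is the shape of the monitoring config that TF Serving expects today: a protobuf text file that has to be mounted into the container and referenced with `--monitoring_config_file=<path>` (the exact mount path is up to whoever wires it in):

```
# monitoring.config (protobuf text format read by tensorflow_model_server)
prometheus_config {
  enable: true,
  path: "/monitoring/prometheus/metrics"
}
```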

Describe the solution

In addition to the monitoring.config file that is required here, if TF Serving could also accept an argument or environment variable for these two parameters (whether Prometheus is enabled, and the metrics path), it would be much easier for us to enable scraping of Prometheus metrics, since we use a CRD in the kserve namespace to mutate the pod YAML in the user's own namespace.
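To make the ask concrete, a rough sketch of what we mean is below. Note that `--enable_prometheus`, `--prometheus_metrics_path`, and the `PROMETHEUS_*` env vars are hypothetical names we are proposing, not existing TF Serving options:

```
# Hypothetical flags (not currently supported by tensorflow_model_server),
# shown only to illustrate the request
tensorflow_model_server \
  --rest_api_port=8501 \
  --enable_prometheus=true \
  --prometheus_metrics_path=/monitoring/prometheus/metrics

# or, equivalently, environment variables our controller could inject:
#   PROMETHEUS_ENABLE=true
#   PROMETHEUS_METRICS_PATH=/monitoring/prometheus/metrics
```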

Describe alternatives you've considered

I tried a hacky way of passing in the file, using a ConfigMap with a mounted volume (which didn't work because of the namespace difference), and we considered adding something to our controller just to make Prometheus metrics work with TensorFlow Serving. If there is an easier way and I missed something, please let me know! I tried to be thorough in my research of the docs, examples, and blog posts.
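For completeness, the ConfigMap approach I tried looked roughly like the following (resource names are just illustrative); it fails because an InferenceService pod running in a user namespace cannot mount a ConfigMap that lives in the kserve namespace:

```yaml
# Illustrative only: a ConfigMap in the kserve namespace holding monitoring.config
apiVersion: v1
kind: ConfigMap
metadata:
  name: tfserving-monitoring-config
  namespace: kserve
data:
  monitoring.config: |
    prometheus_config {
      enable: true,
      path: "/monitoring/prometheus/metrics"
    }
# Mounting this as a volume in the ClusterServingRuntime pod spec does not work,
# because the resulting pod runs in the user's namespace and ConfigMap volume
# mounts cannot cross namespaces.
```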

Thank you!
