capping resources assigned to each model in multi model serving #2132

Open
saeid93 opened this issue Apr 13, 2023 · 3 comments

Comments

saeid93 commented Apr 13, 2023

Is there a way to cap the resources (e.g. CPU cores, CUDA MPS threads) assigned to each model in a multi-model TensorFlow server?
The only straightforward way I can think of to allocate resources to microservices (like model servers), setting aside lower-level tools such as CPU limits, is containerization or VMs, so I suspect there isn't such an option. Is that true?
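To make the containerization workaround concrete, this is roughly what I mean: one serving container per model with a container-level CPU cap. A minimal sketch using the Docker SDK for Python (the model names, paths, ports, and CPU caps are placeholders):

```python
import docker  # Docker SDK for Python: pip install docker

client = docker.from_env()

# Placeholder models and per-model CPU caps.
models = {
    "model_a": {"path": "/abs/path/models/model_a", "cpus": 2.0, "port": 8501},
    "model_b": {"path": "/abs/path/models/model_b", "cpus": 1.0, "port": 8502},
}

for name, cfg in models.items():
    # One tensorflow/serving container per model, capped at cfg["cpus"] CPUs
    # (nano_cpus is the SDK equivalent of `docker run --cpus`).
    client.containers.run(
        "tensorflow/serving",
        detach=True,
        name=f"tf-serving-{name}",
        environment={"MODEL_NAME": name},
        volumes={cfg["path"]: {"bind": f"/models/{name}", "mode": "ro"}},
        ports={"8501/tcp": cfg["port"]},
        nano_cpus=int(cfg["cpus"] * 1e9),
    )
```

This caps each model at the container level rather than inside a single multi-model server, which is exactly the limitation I am asking about.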

@singhniraj08

@saeid93,

A similar feature request, #2097, is already in progress. Please follow and +1 that thread for updates.

Yes, there is currently no option to configure resources per model in a multi-model serving setup. However, you can try the rest_api_num_threads flag, as mentioned here, if that helps.
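For example, a rough sketch of passing that flag to a multi-model server (the config path, port, and thread count below are only illustrative):

```python
import subprocess

# Illustrative path; models.config holds a model_config_list with one entry per model.
MODEL_CONFIG = "/path/to/models.config"

subprocess.run([
    "tensorflow_model_server",
    f"--model_config_file={MODEL_CONFIG}",
    "--rest_api_port=8501",
    "--rest_api_num_threads=8",  # caps REST request-processing threads
])
```

Note that this limits the REST API threads for the server as a whole, not per model.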

Thank you!

saeid93 commented Apr 13, 2023

Thank you @singhniraj08,
Is there a similar plan on the roadmap for CPU inferencing?

@singhniraj08

@saeid93,

Configuring a limit on CPU usage/cores per model in a multi-model setup is not currently on our roadmap. However, this sounds like a good feature to implement. I will keep this as a feature request and discuss implementation internally with the team. Once we have an update, we will post it in this thread.
Thank you for bringing this to our attention.
