How to configure GPU memory limit for each model in one serving #2097

Open
motocoder-cn opened this issue Feb 15, 2023 · 3 comments
@motocoder-cn

How to configure GPU memory limit for each model in one serving?

So far I have only found how to configure the GPU memory limit for one serving instance as a whole, via platform_config_file -> per_process_gpu_memory_fraction.
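
For reference, the instance-level limit mentioned above is set through a platform config file along these lines (a rough sketch based on my reading of the SavedModelBundleSourceAdapterConfig layout; the 0.5 fraction is just an example value):

# Sketch only: this caps GPU memory for the whole serving process, not per model.
platform_configs {
  key: "tensorflow"
  value {
    source_adapter_config {
      [type.googleapis.com/tensorflow.serving.SavedModelBundleSourceAdapterConfig] {
        legacy_config {
          session_config {
            gpu_options {
              per_process_gpu_memory_fraction: 0.5
            }
          }
        }
      }
    }
  }
}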

How can I configure the GPU memory for each model within one serving instance? I do not see any such option in the model config, e.g.:

model_config_list {
  config {
    name: 'model-1'
    base_path: '/models/model-1'
    model_platform: 'tensorflow'
  }
  config {
    name: 'model-2'
    base_path: '/models/model-2'
    model_platform: 'tensorflow'
  }
}

@singhniraj08

@ZhengJuCn,

Currently we have --per_process_gpu_memory_fraction to limit the memory usage of the model server, and there is no method available to limit GPU usage at the model level. Ref: /pull/694

Please try using --per_process_gpu_memory_fraction as shown below and let us know if it works for you. If not, please help us understand the use case for having a GPU limit at the model level. Thank you!

Example command to run the model server image with the memory limit enabled:

docker run --runtime=nvidia -p 8501:8501 \
  --mount type=bind,\
source=/tmp/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu,\
target=/models/half_plus_two \
  -e MODEL_NAME=half_plus_two -t tensorflow/serving:latest-gpu \
  --per_process_gpu_memory_fraction=0.5
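
For a multi-model server like the one in the original question, the same flag still applies to the whole process rather than to individual models; a sketch of how it could be combined with a model config file (the paths and the 0.5 fraction here are placeholders):

# Sketch only: serves the models listed in models.config under one process-level cap.
docker run --runtime=nvidia -p 8501:8501 \
  --mount type=bind,source=/path/to/models,target=/models \
  -t tensorflow/serving:latest-gpu \
  --model_config_file=/models/models.config \
  --per_process_gpu_memory_fraction=0.5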

@motocoder-cn
Author

@singhniraj08 No, per_process_gpu_memory_fraction does not work for me. In k8s I allocate one GPU to one container that contains multiple models, and per_process_gpu_memory_fraction works at the container level (serving level), not the model level.

@singhniraj08

@yimingz-a,

Please take a look at this feature request to configure a GPU memory limit for each model in the TensorFlow model server. Thank you!
