How to configure GPU memory limit for each model in one serving #2097

Open
motocoder-cn opened this issue Feb 15, 2023 · 3 comments
@motocoder-cn

How to configure GPU memory limit for each model in one serving?

So far I have only found how to configure the GPU memory limit for one serving instance as a whole, via platform_config_file -> per_process_gpu_memory_fraction.
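
For reference, the instance-level limit mentioned above is set through a platform config file along these lines (a rough sketch based on my reading of the SavedModelBundleSourceAdapterConfig layout; the 0.5 fraction is just an example value):

# Sketch only: this caps GPU memory for the whole serving process, not per model.
platform_configs {
  key: "tensorflow"
  value {
    source_adapter_config {
      [type.googleapis.com/tensorflow.serving.SavedModelBundleSourceAdapterConfig] {
        legacy_config {
          session_config {
            gpu_options {
              per_process_gpu_memory_fraction: 0.5
            }
          }
        }
      }
    }
  }
}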

How can I configure the GPU memory for each model within one serving instance? I do not see any such option in the model config, e.g.:

model_config_list {
  config {
    name: 'model-1'
    base_path: '/models/model-1'
    model_platform: 'tensorflow'
  }
  config {
    name: 'model-2'
    base_path: '/models/model-2'
    model_platform: 'tensorflow'
  }
}

@singhniraj08

@ZhengJuCn,

Currently we have --per_process_gpu_memory_fraction to limit the memory usage of the model server, and there is no method available to limit GPU usage at the model level. Ref: /pull/694

Please try using --per_process_gpu_memory_fraction as shown below and let us know if it works for you. If not, please help us understand the use case for having a GPU limit at the model level. Thank you!

Example command to run the model server image with the memory limit enabled:

docker run --runtime=nvidia -p 8501:8501 \
  --mount type=bind,\
source=/tmp/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu,\
target=/models/half_plus_two \
  -e MODEL_NAME=half_plus_two -t tensorflow/serving:latest-gpu \
  --per_process_gpu_memory_fraction=0.5
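
For a multi-model server like the one in the original question, the same flag still applies to the whole process rather than to individual models; a sketch of how it could be combined with a model config file (the paths and the 0.5 fraction here are placeholders):

# Sketch only: serves the models listed in models.config under one process-level cap.
docker run --runtime=nvidia -p 8501:8501 \
  --mount type=bind,source=/path/to/models,target=/models \
  -t tensorflow/serving:latest-gpu \
  --model_config_file=/models/models.config \
  --per_process_gpu_memory_fraction=0.5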

@motocoder-cn
Author

@singhniraj08 No, per_process_gpu_memory_fraction does not work for me. In k8s I allocate one GPU to one container that contains multiple models, and per_process_gpu_memory_fraction works at the container level (serving level), not the model level.

@singhniraj08

@yimingz-a,

Please take a look at this feature request to configure a GPU memory limit for each model in the TensorFlow model server. Thank you!
