Hi, we have a GPU EC2 instance (g4dn.xl) set up to run our face recognition AI model. An inference session usually takes 1-2 seconds per image, but whenever we launch a new instance from an AMI of an existing instance, the first inference run takes around 150 seconds. From the second run onwards it is back to 1-2 seconds. We want to understand why this happens only on the first run. We are working on autoscaling for our GPU instances, and this affects the start-up time of new instances. Please help us understand this.
Hi @AnkushRR,
This seems like a question for ONNXRuntime. My best guess is that on the first run, ONNXRuntime applies graph optimizations to your initial model, which takes additional time. The optimized model then replaces your original model in place, so later inference runs skip those optimizations and therefore run much faster.
To verify this guess, you can save the optimized model via `optimized_model_filepath`, run inference on it directly, and check whether the first run is now fast. If the first run still takes a long time, please raise the issue in ONNXRuntime to get the best help from the runtime experts.