Describe the bug
To enable streaming in BentoML, the Runnable method must be an async generator, i.e., it yields its outputs as an AsyncGenerator. As a result, a call to this method returns immediately, even while the computation producing the outputs is still running. The runner therefore treats every call as already complete and starts processing new requests right away, regardless of whether a previous generator is still producing output. Consequently, there is no bound on the runner's memory footprint.
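The behavior can be demonstrated with a minimal stand-in for a streaming Runnable (`UnlockedRunnable` below is a hypothetical sketch, not the real `bentoml.Runnable` API): when several requests arrive, every generator starts producing output at once.

```python
import asyncio

class UnlockedRunnable:
    """Hypothetical stand-in for a streaming Runnable method.

    Tracks how many generators run concurrently to show that
    nothing serializes the requests."""

    def __init__(self):
        self.active = 0      # generators currently running
        self.max_active = 0  # peak concurrency observed

    async def generate(self, n):
        # The method returns as soon as iteration starts; all callers
        # proceed concurrently, so memory use grows with the number
        # of in-flight requests.
        self.active += 1
        self.max_active = max(self.max_active, self.active)
        try:
            for i in range(n):
                await asyncio.sleep(0)  # simulate work per streamed item
                yield i
        finally:
            self.active -= 1

async def main():
    r = UnlockedRunnable()

    async def consume():
        return [x async for x in r.generate(3)]

    # Four concurrent requests against the same runnable.
    await asyncio.gather(*(consume() for _ in range(4)))
    return r.max_active

print(asyncio.run(main()))  # 4: all four generators ran at the same time
```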
To reproduce
No response
Expected behavior
The service should wait for the first AsyncGenerator to complete before requesting a new one.
A simple fix to this issue is to add a lock at the start of the runnable method:
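The original code for the fix was not captured above, but a minimal sketch of the idea looks like this (`StreamRunnable` and its method names are illustrative, not the real `bentoml.Runnable` API): an `asyncio.Lock` acquired at the top of the generator body forces requests to be served one at a time.

```python
import asyncio

class StreamRunnable:
    """Sketch of the proposed fix: a per-runnable lock so only one
    generator produces output at a time (hypothetical API)."""

    def __init__(self):
        self._lock = asyncio.Lock()
        self.active = 0      # generators currently running
        self.max_active = 0  # peak concurrency observed

    async def generate(self, n):
        # The lock is held for the whole life of the generator,
        # so a new request waits until the previous stream finishes.
        async with self._lock:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
            try:
                for i in range(n):
                    await asyncio.sleep(0)  # simulate work per streamed item
                    yield i
            finally:
                self.active -= 1

async def main():
    r = StreamRunnable()

    async def consume():
        return [x async for x in r.generate(3)]

    # Four concurrent requests: the lock serializes them.
    results = await asyncio.gather(*(consume() for _ in range(4)))
    return r.max_active, results

peak, results = asyncio.run(main())
print(peak)  # 1: generators never overlapped
```

Note that because the lock is released only when the generator is exhausted, a client that abandons a stream mid-way would hold the lock until the generator is closed; a production implementation would need to handle that case.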
I think this locking mechanism should either be implemented on the BentoML side, or its necessity should be made clear in the documentation.
Environment
bentoml==1.1.4