Why does TF Serving use one CUDA compute stream? #2221
Labels
stale
stat:awaiting response
type:support
I'm trying to understand why TF uses a single CUDA compute stream. Is there a metric that shows whether ops are waiting to be scheduled on that one compute stream? I want to find out whether ops end up waiting in high-QPS scenarios.