PR #61632: Allow merging compute-copy streams #66555
Merged
+168
−46
PR #61632: Allow merging compute-copy streams
Imported from GitHub PR #61632
This PR is part of the Multi-Stream feature in TF, which is proposed in #61185.
Allow merging the host_to_device/device_to_host/device_to_device data copy streams into the compute stream within one stream group. This reduces the overhead caused by GPU stream synchronization, especially when data transfers are frequent. Another benefit is that, for host_to_device copies, merging streams allows subsequent ops to be scheduled early, without waiting until the data copy has actually finished.
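The effect of merging on synchronization can be illustrated with a toy model. This is a minimal sketch, not TensorFlow's implementation: `MergeOptions`, `build_stream_group`, and `count_cross_stream_syncs` are hypothetical names introduced here, and the model simply assumes that every hand-off between two distinct streams costs one synchronization, while consecutive ops on the same stream are ordered implicitly.

```python
from dataclasses import dataclass

# The three copy roles named in the PR description.
COPY_KINDS = ("host_to_device", "device_to_host", "device_to_device")


@dataclass(frozen=True)
class MergeOptions:
    """Hypothetical per-copy-kind merge flags (illustration only)."""
    host_to_device: bool = False
    device_to_host: bool = False
    device_to_device: bool = False


def build_stream_group(opts):
    """Map each role to a stream name; a merged role aliases the compute stream."""
    group = {"compute": "compute"}
    for kind in COPY_KINDS:
        group[kind] = "compute" if getattr(opts, kind) else kind
    return group


def count_cross_stream_syncs(ops, group):
    """Count hand-offs between distinct streams in a linear sequence of ops."""
    syncs = 0
    prev = None
    for role in ops:
        stream = group[role]
        if prev is not None and stream != prev:
            syncs += 1  # modeled as one cross-stream wait
        prev = stream
    return syncs


# Three inference steps: copy input in, compute, copy output back.
ops = ["host_to_device", "compute", "device_to_host"] * 3

separate = build_stream_group(MergeOptions())
h2d_only = build_stream_group(MergeOptions(host_to_device=True))
merged = build_stream_group(MergeOptions(True, True, True))

syncs_separate = count_cross_stream_syncs(ops, separate)
syncs_h2d_only = count_cross_stream_syncs(ops, h2d_only)
syncs_merged = count_cross_stream_syncs(ops, merged)
```

In this model the number of cross-stream waits drops as more copy streams are merged into the compute stream. The model ignores the copy/compute overlap that separate streams can provide, so it shows only the synchronization side of the trade-off.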
As part of the multi-stream feature, stream-merging helps multi-stream reach a much higher throughput. Taking our proto models as an example, the original model inference throughput is 1524 samples/second, which rises to 2229 samples/second with multi-stream and further to 2471 samples/second with stream-merging.
However, stream-merging can also be used on its own. Enabling stream-merging alone raised inference throughput from 1028 samples/second to 1187 samples/second.
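For readers who want the relative gains, the arithmetic below derives them from the throughput figures quoted above. It is only a convenience calculation; the helper name `gain_percent` is introduced here for illustration, and the two experiments use different baselines, so their percentages are not directly comparable.

```python
def gain_percent(before, after):
    """Relative throughput gain of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# Figures (samples/second) quoted in the PR description.
baseline = 1524
multi_stream = 2229
multi_stream_merged = 2471

standalone_before = 1028
standalone_after = 1187

multi_vs_baseline = gain_percent(baseline, multi_stream)          # about 46.3%
merged_vs_baseline = gain_percent(baseline, multi_stream_merged)  # about 62.1%
merged_vs_multi = gain_percent(multi_stream, multi_stream_merged) # about 10.9%
standalone_gain = gain_percent(standalone_before, standalone_after)  # about 15.5%
```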
Please refer to the 'Performance' section of our document for more detailed experimental results.
Copybara import of the project:
--
9e51f38 by Robin Zhang robinz@nvidia.com:
Allow merging compute-copy streams
--
a45967f by Robin Zhang robinz@nvidia.com:
Improve coding style
--
ccae79b by Robin Zhang robinz@nvidia.com:
Rename stream_merge_options_
--
332e1fe by Robin Zhang robinz@nvidia.com:
Put stream checking out of callback
--
4a0c789 by Robin Zhang robinz@nvidia.com:
Move StreamMergeOptions to Experimental
--
efe56d7 by Robin Zhang robinz@nvidia.com:
add some comments
Merging this change closes #61632
Reverts changelist 525613555
FUTURE_COPYBARA_INTEGRATE_REVIEW=#61632 from buptzyb:multistream-streammerge 5aabb58