forked from pytorch/pytorch
Execute convolution in NCHW if the suggested mem format is NHWC but the actual mem layout is NCHW #303
Open

DenisVieriu97 wants to merge 30 commits into master from dev/denis/conv_nchw_optimization
Conversation
DenisVieriu97 requested review from kulinseth, skotapati, razarmehr, Ronian526, ssaladis and shuhand0 on February 9, 2023 04:48
@DenisVieriu97, needs extensive testing... I think we should merge this post PyT 2.0.
DenisVieriu97 force-pushed the dev/denis/conv_nchw_optimization branch from 086e3a5 to 4e74a86 on February 10, 2023 06:44
Fix the TEST_WITH_MPS macro.
* Test MPS CI runners
* Cherry pick remaining files
* Enable lintrunner
* Change lint runner
* Retrigger checks (#2–#21, repeated; one retrigger changes the arch to arm)
* Fix lintrunner
* Remove lint.json
* Use DISTRIBUTED=1 for MPS CI runners
* Disable openmp
* Remove unnecessary CI files
* Additional files
* Update lint
* Fix bilinear backward pass
* Remove comment
* Update macOS 12 blocklist: move sum, masked.var, mul to the low-precision list and unblock them from running
* Mark __rdiv__ failures as "accumulate error exceeds atol/rtol"
- The backward pass has to be given an explicit bias tensor of zeros if none is passed to the op, or the bias gradient will not be calculated.
- Fixed the bias tensor mistakenly getting overwritten to zeros.
- Fixes a crash when the lstm op is called with has_biases set to false. The change takes into account the changed shape of the input params TensorList depending on the bias flag.

Co-authored-by: Kulin Seth <kulin_seth@apple.com>
- Add _mps_convolution_impl that takes an optional shape
- For conv_transpose2d grad, use the shape from the input directly
- Remove nn.functional.conv_transpose2d grad from the blocklist

Co-authored-by: Ronian526 <11454459+Ronian526@users.noreply.github.com>
Fixes a crash caused by the inputTensor going null.
- Cast the input tensor to float32 and cast the output tensor back
- Unblock the test
DenisVieriu97 force-pushed the dev/denis/conv_nchw_optimization branch from 4e74a86 to b0c9dbb on February 15, 2023 21:17
- unblock rdiv float16
* Fix upsample for NHWC output
* Add testcase
- Give warnings when converting int64 for reduction ops
- Use a cast tensor for reduction sum on trace
- Unblock trace from running
* Move nn.functional.feature_alpha_dropoutwith_train, normalnumber_mean, and new_empty_strided to expected failures
* Update new_empty_strided

Co-authored-by: Kulin Seth <kulin_seth@apple.com>
Fix convolution crash; remove unnecessary contiguous calls (#341)
* Fix convolution crash; remove unnecessary contiguous calls
* Fix lintrunner
This should fix the failure with GPT2 when use_cache=True
DenisVieriu97 force-pushed the dev/denis/conv_nchw_optimization branch from b0c9dbb to 6caefbc on February 18, 2023 02:34
Co-authored-by: Ramin Azarmehr <razarmehr@apple.com>
* Handle broadcasting by expanding the src tensor in Copy.mm
* Unblock linalg_matrix_power
* Improved formatting
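The broadcasting fix can be illustrated with a small Python sketch (conceptual only, not the actual Copy.mm code): expanding the source tensor to the destination's shape produces a zero-stride view, which the copy then materializes without allocating an intermediate tensor.

```python
import torch

# Conceptual sketch of "handle broadcasting by expanding the src tensor":
# expand_as() creates a zero-stride view of src at dst's shape, so copy_()
# reads the broadcast elements directly from src's storage.
src = torch.tensor([[1.0], [2.0], [3.0]])   # shape (3, 1)
dst = torch.empty(3, 4)
dst.copy_(src.expand_as(dst))               # each row of src repeats across dst
```

The same idea applies on the backend side: the expanded view carries the broadcast semantics, so the copy kernel never needs a separately materialized source.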
Execute convolution in NCHW if the suggested mem format is NHWC but the actual mem layout is NCHW
DenisVieriu97 force-pushed the dev/denis/conv_nchw_optimization branch from 4d2ba19 to 9a5b002 on February 21, 2023 23:28
kulinseth force-pushed the master branch 2 times, most recently from 298b2d6 to 606362d on February 28, 2023 06:45
Since NHWC is represented as a view operation in PyTorch, we can execute convolution ops directly in NCHW when the suggested memory format is NHWC but the actual memory layout is still NCHW.
This skips unnecessary steps, such as a gather to materialize the memory from NCHW to NHWC, followed by transposing the convolution output back to NCHW in the graph.
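The condition this PR targets can be checked from Python. The sketch below (the helper name and decision logic are illustrative, not the actual ATen/MPS code) shows the ambiguous case: a tensor whose strides satisfy both the NCHW and NHWC contiguity checks, so the data is already laid out densely as NCHW and no NHWC materialization is needed.

```python
import torch

def conv_memory_format(t: torch.Tensor) -> torch.memory_format:
    # Illustrative helper mirroring the PR's idea: if a 4D tensor's strides
    # are a valid NHWC (channels_last) layout but also a valid dense NCHW
    # layout, run the convolution directly in NCHW and skip the gather
    # (NCHW -> NHWC) plus the transpose of the output back to NCHW.
    nhwc = t.is_contiguous(memory_format=torch.channels_last)
    nchw = t.is_contiguous(memory_format=torch.contiguous_format)
    if nhwc and nchw:
        # Ambiguous strides (e.g. C == 1): storage is already dense NCHW.
        return torch.contiguous_format
    return torch.channels_last if nhwc else torch.contiguous_format

# A single-channel tensor is a typical ambiguous case: its dense NCHW
# strides also pass the channels_last contiguity check.
x = torch.randn(2, 1, 4, 4)
assert x.is_contiguous(memory_format=torch.channels_last)
assert conv_memory_format(x) == torch.contiguous_format
```

A genuinely channels_last tensor (e.g. `torch.randn(2, 3, 4, 4).to(memory_format=torch.channels_last)`) fails the NCHW check and still takes the NHWC path, so only the degenerate NCHW-in-disguise case is redirected.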