
Execute convolution in NCHW if the suggested mem format is NHWC but the actual mem layout is NCHW #303

Open
wants to merge 30 commits into base: master
Conversation

DenisVieriu97
Collaborator

@DenisVieriu97 DenisVieriu97 commented Feb 9, 2023

Since NHWC is represented as a view operation in PyTorch, we can execute convolution ops directly in NCHW when the suggested memory format is NHWC but the actual memory layout is still NCHW.
This skips unnecessary steps, such as a gather to materialize the memory from NCHW to NHWC and a transpose of the convolution output back to NCHW in the graph.
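As a rough illustration of the condition this PR checks for, here is a pure-Python sketch of the stride test. The helper names and the stride math are illustrative assumptions, not the actual MPS implementation:

```python
# Hypothetical sketch: decide whether a 4D tensor whose *suggested*
# memory format is NHWC (channels-last) is *actually* laid out as
# NCHW (contiguous), in which case the convolution can run in NCHW
# directly and skip the gather/transpose round trip.

def contiguous_strides(shape):
    # Row-major (NCHW-contiguous) strides for a given shape.
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

def channels_last_strides(shape):
    # NHWC (channels-last) strides for an (N, C, H, W) shape.
    n, c, h, w = shape
    return [h * w * c, 1, w * c, c]

def should_run_in_nchw(shape, strides, suggested_memory_format):
    # Run the op in NCHW when NHWC is only a *suggestion* but the
    # bytes are still in NCHW order.
    actually_nchw = strides == contiguous_strides(shape)
    return suggested_memory_format == "NHWC" and actually_nchw

shape = (2, 3, 4, 5)
print(should_run_in_nchw(shape, contiguous_strides(shape), "NHWC"))     # True
print(should_run_in_nchw(shape, channels_last_strides(shape), "NHWC"))  # False
```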

@kulinseth
Owner

@DenisVieriu97, needs extensive testing... I think we should merge this post PyT 2.0.

DenisVieriu97 and others added 4 commits February 14, 2023 07:15
* Test MPS CI runners

* Cherry pick remaining files

* Enable lintrunner:

* Change lint runner

* Retrigger checks

* Retrigger checks #2

* Retrigger checks #3

* Retrigger checks #4

* Retrigger checks #5

* Retrigger checks #5

* Retrigger checks #7

* Retrigger checks #8

* Retrigger checks #9

* Retrigger checks #9 (change arch to arm)

* Retrigger checks #10

* Retrigger checks #11

* Retrigger checks #12

* Retrigger checks #13

* Retrigger checks #14

* Retrigger checks #14

* Retrigger checks #15

* Retrigger checks #16

* Retrigger checks #16

* Retrigger checks #17

* Retrigger checks #19

* Retrigger checks #20

* Retrigger checks #21

* Fix lintrunner

* Fix lintrunner

* Remove lint.json
* Use DISTRIBUTED=1 for MPS CI runners

* Disable openmp
razarmehr and others added 12 commits February 14, 2023 15:26
* Remove unnecessary CI files

* Additional files

* Update lint
* Enable test modules on MPS and CI runners

* Update lint.yml

* Update comments

* Retrigger CI

* Retrigger CI #2

* Remove comment
Block uint8 data type for unary and binary ops on macOS 12. (#313) (#328)

* Block uint8 data type for unary and binary ops on macOS 12. (#313)

* fixes after cherry-pick

---------

Co-authored-by: Ronian526 <11454459+Ronian526@users.noreply.github.com>
* Fix bilinear backward pass

* Remove comment
* Update macOS 12 blocklist
- move sum, masked.var, mul to low precision list
- unblock them from running

* - mark __rdiv__ failures as accumulated error exceeding atol/rtol
- Backward pass has to give explicit bias tensor of zeros if none is passed to the op or the bias gradient will not be calculated.
- Fixed bias tensor mistakenly getting overwritten to zeros
- Fixes crash when lstm op called with has_biases set to false. Change takes into account the changed shape of the input params TensorList depending on the bias flag.

Co-authored-by: Kulin Seth <kulin_seth@apple.com>
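The "changed shape of the input params TensorList" mentioned above can be illustrated with a small, hypothetical indexing helper: with biases each layer contributes four tensors (w_ih, w_hh, b_ih, b_hh), without them only two, and the backward pass gets explicit zero biases. This is a sketch, not the actual PyTorch MPS code:

```python
# Hypothetical sketch of indexing a flat LSTM params list, whose
# layout depends on the has_biases flag.

def layer_params(params, layer, has_biases):
    # Each layer contributes 4 tensors with biases, 2 without.
    stride = 4 if has_biases else 2
    base = layer * stride
    w_ih, w_hh = params[base], params[base + 1]
    if has_biases:
        b_ih, b_hh = params[base + 2], params[base + 3]
    else:
        # Substitute explicit zero biases so the backward pass still
        # produces a (zero) bias gradient instead of skipping it.
        b_ih = b_hh = [0.0] * len(params[base + 1])
    return w_ih, w_hh, b_ih, b_hh

params = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # 2 layers, no biases
print(layer_params(params, 1, has_biases=False))
# ([5.0, 6.0], [7.0, 8.0], [0.0, 0.0], [0.0, 0.0])
```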
- add _mps_convolution_impl that takes optional shape
- for conv_transpose2d grad, use the shape from input directly
- remove nn.functional.conv_transpose2d grad from blocklist

Co-authored-by: Ronian526 <11454459+Ronian526@users.noreply.github.com>
Fixes a crash where the inputTensor could go null and cause a crash.
- casting the input tensor to float32 and cast back the output tensor
- unblock the test
DenisVieriu97 and others added 5 commits February 15, 2023 21:27
- give warnings of converting int64 for reduction ops
- use cast tensor for reduction sum on trace
- unblock trace from running
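The warn-cast-reduce-cast-back pattern described above can be sketched in plain Python; the function name and the choice of float32 as the supported dtype are assumptions for illustration, not the actual MPS code:

```python
import warnings

def reduce_sum_with_cast(values, dtype):
    # Hypothetical: backend supports float32 reductions but not int64.
    if dtype == "int64":
        warnings.warn("casting int64 reduction input to float32; "
                      "large values may lose precision")
        result = sum(float(v) for v in values)  # reduce in supported dtype
        return int(result)                      # cast result back to int64
    return sum(values)

print(reduce_sum_with_cast([1, 2, 3], "int64"))  # 6 (with a warning)
```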
* - move nn.functional.feature_alpha_dropoutwith_train, normalnumber_mean, new_empty_strided to expected failures

* - update new_empty_strided

---------

Co-authored-by: Kulin Seth <kulin_seth@apple.com>
Fix convolution crash; remove unnecessary contiguous calls (#341)

* Fix convolution crash; remove unnecessary contiguous calls

* Fix lintrunner
This should fix the failure with GPT2 when use_cache=True
DenisVieriu97 and others added 5 commits February 18, 2023 10:27
Co-authored-by: Ramin Azarmehr <razarmehr@apple.com>
* Handle broadcasting by expanding src tensor in Copy.mm

* Unblock linalg_matrix_power

* Improved formatting
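The broadcasting handled by expanding the src tensor follows the usual right-aligned rule; a minimal sketch of that shape check (illustrative, not the Copy.mm code):

```python
# Hypothetical sketch of the broadcast rule used when expanding a
# src tensor to a dst shape before a copy: align shapes from the
# right; each src dimension must equal the dst dimension or be 1.

def broadcast_shape(src, dst):
    src = (1,) * (len(dst) - len(src)) + tuple(src)  # left-pad with 1s
    out = []
    for s, d in zip(src, dst):
        if s != d and s != 1:
            raise ValueError(f"cannot broadcast {s} to {d}")
        out.append(d)
    return tuple(out)

print(broadcast_shape((3, 1), (2, 3, 4)))  # (2, 3, 4)
```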