forked from pytorch/pytorch
Execute convolution in NCHW if the suggested mem format is NHWC but the actual mem layout is NCHW #303
Open

DenisVieriu97 wants to merge 30 commits into master from dev/denis/conv_nchw_optimization
Conversation
DenisVieriu97 requested review from kulinseth, skotapati, razarmehr, Ronian526, ssaladis and shuhand0 on February 9, 2023 04:48
@DenisVieriu97, needs extensive testing... I think we should merge this post PyT 2.0.
DenisVieriu97 force-pushed the dev/denis/conv_nchw_optimization branch from 086e3a5 to 4e74a86 on February 10, 2023 06:44
Fix the TEST_WITH_MPS macro.
* Test MPS CI runners
* Cherry pick remaining files
* Enable lintrunner
* Change lint runner
* Retrigger checks (#2–#21, repeated; one retrigger changes the arch to arm)
* Fix lintrunner
* Remove lint.json
* Use DISTRIBUTED=1 for MPS CI runners
* Disable openmp
* Remove unnecessary CI files
* Additional files
* Update lint
* Fix bilinear backward pass
* Remove comment
* Update macOS 12 blocklist: move sum, masked.var, mul to the low-precision list and unblock them from running
* Mark __rdiv__ failures as "accumulate error exceeds atol/rtol"
- The backward pass has to be given an explicit bias tensor of zeros if none is passed to the op, or the bias gradient will not be calculated.
- Fixed the bias tensor mistakenly getting overwritten to zeros.
- Fixes a crash when the lstm op is called with has_biases set to false. The change takes into account the changed shape of the input params TensorList depending on the bias flag.

Co-authored-by: Kulin Seth <kulin_seth@apple.com>
- Add _mps_convolution_impl that takes an optional shape
- For conv_transpose2d grad, use the shape from the input directly
- Remove nn.functional.conv_transpose2d grad from the blocklist

Co-authored-by: Ronian526 <11454459+Ronian526@users.noreply.github.com>
Fixes a crash caused by the inputTensor going null.
- Cast the input tensor to float32 and cast the output tensor back
- Unblock the test
DenisVieriu97 force-pushed the dev/denis/conv_nchw_optimization branch from 4e74a86 to b0c9dbb on February 15, 2023 21:17
- unblock rdiv float16
* Fix upsample for NHWC output
* Add testcase
- Give warnings when converting int64 for reduction ops
- Use a cast tensor for reduction sum on trace
- Unblock trace from running
* Move nn.functional.feature_alpha_dropoutwith_train, normalnumber_mean, and new_empty_strided to expected failures
* Update new_empty_strided

Co-authored-by: Kulin Seth <kulin_seth@apple.com>
Fix convolution crash; remove unnecessary contiguous calls (#341)
* Fix convolution crash; remove unnecessary contiguous calls
* Fix lintrunner
This should fix the failure with GPT2 when use_cache=True
DenisVieriu97 force-pushed the dev/denis/conv_nchw_optimization branch from b0c9dbb to 6caefbc on February 18, 2023 02:34
Co-authored-by: Ramin Azarmehr <razarmehr@apple.com>
* Handle broadcasting by expanding the src tensor in Copy.mm
* Unblock linalg_matrix_power
* Improved formatting
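The broadcasting fix can be illustrated with a small Python sketch (conceptual only, not the actual Copy.mm code): expanding the source tensor to the destination's shape produces a zero-stride view, which the copy then materializes without allocating an intermediate tensor.

```python
import torch

# Conceptual sketch of "handle broadcasting by expanding the src tensor":
# expand_as() creates a zero-stride view of src at dst's shape, so copy_()
# reads the broadcast elements directly from src's storage.
src = torch.tensor([[1.0], [2.0], [3.0]])   # shape (3, 1)
dst = torch.empty(3, 4)
dst.copy_(src.expand_as(dst))               # each row of src repeats across dst
```

The same idea applies on the backend side: the expanded view carries the broadcast semantics, so the copy kernel never needs a separately materialized source.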
Execute convolution in NCHW if the suggested mem format is NHWC but the actual mem layout is NCHW
DenisVieriu97 force-pushed the dev/denis/conv_nchw_optimization branch from 4d2ba19 to 9a5b002 on February 21, 2023 23:28
kulinseth force-pushed the master branch 2 times, most recently from 298b2d6 to 606362d on February 28, 2023 06:45
Since NHWC is represented as a view operation in PyTorch, we can execute convolution ops directly in NCHW when the suggested memory format is NHWC but the actual memory layout is still NCHW.
This skips unnecessary steps, such as a gather to materialize the memory from NCHW to NHWC, followed by transposing the convolution output back to NCHW in the graph.
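The condition this PR targets can be checked from Python. The sketch below (the helper name and decision logic are illustrative, not the actual ATen/MPS code) shows the ambiguous case: a tensor whose strides satisfy both the NCHW and NHWC contiguity checks, so the data is already laid out densely as NCHW and no NHWC materialization is needed.

```python
import torch

def conv_memory_format(t: torch.Tensor) -> torch.memory_format:
    # Illustrative helper mirroring the PR's idea: if a 4D tensor's strides
    # are a valid NHWC (channels_last) layout but also a valid dense NCHW
    # layout, run the convolution directly in NCHW and skip the gather
    # (NCHW -> NHWC) plus the transpose of the output back to NCHW.
    nhwc = t.is_contiguous(memory_format=torch.channels_last)
    nchw = t.is_contiguous(memory_format=torch.contiguous_format)
    if nhwc and nchw:
        # Ambiguous strides (e.g. C == 1): storage is already dense NCHW.
        return torch.contiguous_format
    return torch.channels_last if nhwc else torch.contiguous_format

# A single-channel tensor is a typical ambiguous case: its dense NCHW
# strides also pass the channels_last contiguity check.
x = torch.randn(2, 1, 4, 4)
assert x.is_contiguous(memory_format=torch.channels_last)
assert conv_memory_format(x) == torch.contiguous_format
```

A genuinely channels_last tensor (e.g. `torch.randn(2, 3, 4, 4).to(memory_format=torch.channels_last)`) fails the NCHW check and still takes the NHWC path, so only the degenerate NCHW-in-disguise case is redirected.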