FP32 Autovec Optimization #2535
Open
NathanielIskandar wants to merge 53 commits into pytorch:main from codebase-berkeley:FP32-autovec
Conversation
NathanielIskandar commented on Apr 24, 2024
- Verified that the FP32 implementation passes unit tests
- Optimized the FP32 autovec function (`EmbeddingSpMDM_autovec`):
  - `#pragma omp tile` with a tile size of 4 (`tile_size = 4`)
  - cache prefetching (`max_initial_prefetch_row = 8`)
  - `#pragma omp simd`
- Implemented switching logic that selects the autovec path via an environment variable (see the sketch after this list)
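Since the bullet points above are terse, here is a minimal, self-contained C++ sketch of how these pieces could fit together: an environment-variable switch, initial row prefetching, and OpenMP hints on the hot loop. The function signature and the `FBGEMM_AUTOVEC` variable name are illustrative assumptions, not FBGEMM's actual API; the real kernel is the `EmbeddingSpMDM_autovec` function this PR modifies, and `#pragma omp tile` requires an OpenMP 5.1 compiler (older compilers ignore the pragma).

```cpp
// Hedged sketch only: names below (FBGEMM_AUTOVEC, embedding_sum_autovec)
// are hypothetical and do not match FBGEMM's real API.
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Switching logic: read the environment variable once and cache the result.
static bool autovec_enabled() {
  static const bool enabled = [] {
    const char* v = std::getenv("FBGEMM_AUTOVEC"); // hypothetical variable name
    return v != nullptr && std::strcmp(v, "0") != 0;
  }();
  return enabled;
}

// Simplified FP32 pooled-embedding sum in the spirit of EmbeddingSpMDM_autovec.
static void embedding_sum_autovec(
    const float* weights,   // [num_rows, block_size] embedding table
    const int64_t* indices, // [num_indices] rows to pool
    int64_t num_indices,
    int64_t block_size,
    float* out) {           // [block_size] pooled output
  const int64_t max_initial_prefetch_row = 8; // tuning constant from the PR
  const int64_t pf = num_indices < max_initial_prefetch_row
      ? num_indices : max_initial_prefetch_row;

  // Warm the cache with the first few rows before entering the main loop.
  for (int64_t i = 0; i < pf; ++i)
    __builtin_prefetch(weights + indices[i] * block_size, /*rw=*/0, /*locality=*/3);

  for (int64_t j = 0; j < block_size; ++j)
    out[j] = 0.0f;

  // OpenMP 5.1 tile construct, per the PR description (tile_size = 4);
  // compilers without 5.1 support will simply ignore the pragma.
#pragma omp tile sizes(4)
  for (int64_t i = 0; i < num_indices; ++i) {
    // Keep prefetching a row that is `pf` iterations ahead of its use.
    if (i + pf < num_indices)
      __builtin_prefetch(weights + indices[i + pf] * block_size, 0, 3);
    const float* row = weights + indices[i] * block_size;
    // Ask the compiler to vectorize the per-row accumulation.
#pragma omp simd
    for (int64_t j = 0; j < block_size; ++j)
      out[j] += row[j];
  }
}
```

A caller would then branch on `autovec_enabled()` to pick between this path and the reference `EmbeddingSpMDM_ref` implementation.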
Commit titles from the PR timeline (truncated by GitHub):
- …dingSpMDMAutovec.h
- …eSparse_autovec()
- …SPMDM_BASE from 'EmbeddingSpMDM_ref' to 'EmbeddingSpMDMRowWiseSparse_autovec'
- …ng blocks in the autovec implementation because of input_stride error for now
✅ Deploy Preview for pytorch-fbgemm-docs ready!
@sryap has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary: Pull Request resolved: pytorch#2472 - Break out forward code generation from `codegen/genscript/generate_embedding_optimizer.py` Reviewed By: sryap Differential Revision: D55603000 fbshipit-source-id: 36bf8ae2e419500d32cdb8b941d5b645256c04b6
Summary: Pull Request resolved: pytorch#2473 As title Reviewed By: q10 Differential Revision: D55608855 fbshipit-source-id: d88aa44a122c38bc0440a6f6a33e195569abc04d
Summary: Pull Request resolved: pytorch#2474 - Break out forward code generation from `codegen/genscript/generate_embedding_optimizer.py` Reviewed By: spcyppt Differential Revision: D55617766 fbshipit-source-id: b6d0804ba0be950b45891bf0c409f2e83bedb710
Summary: Pull Request resolved: pytorch#2475 - Break out forward code generation from `codegen/embedding_backward_code_generator.py` Reviewed By: sryap, spcyppt Differential Revision: D55654970 fbshipit-source-id: 31cb7c79b6086cead95919bbacb9f12aadf46857
Summary: Pull Request resolved: pytorch#2476 - Remove `codegen/embedding_backward_code_generator.py` and `codegen/embedding_common_code_generator.py`, which are now deprecated in favor of `codegen/genscript/` Reviewed By: spcyppt Differential Revision: D55674855 fbshipit-source-id: 4bf6fd233c2a70b7b4aa3c6a391defbffc53aa19
Summary: Pull Request resolved: pytorch#2471 Previously, `ssd_cache_actions_insert_kernel` uses `atomicAdd` to count the number of SSD lookups (i.e., `actions_count`). However, since the cache sets are sorted in `sorted_cache_sets`, we can compute `actions_count` by finding the position of the last cache set that is not the sentinel value (i.e., the number of cache sets). Reviewed By: jianyuh Differential Revision: D55599052 fbshipit-source-id: b73b730ea00f1f1f214d57f3cb41a0d2ad6e843b
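This commit replaces per-element atomic increments with a single boundary search: because `sorted_cache_sets` is sorted and missing entries carry a sentinel that sorts last, `actions_count` is simply the position of the first sentinel. A hedged, host-side C++ illustration of the idea (the sentinel choice and the data are made up; the real logic runs on GPU tensors inside `ssd_cache_actions_insert_kernel`):

```cpp
// Host-side illustration only; the actual logic lives in a CUDA kernel.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
  // Assumed sentinel: a value guaranteed to sort after every real cache set.
  const int32_t kSentinel = INT32_MAX;
  // Sorted cache sets; lookups that missed are padded with the sentinel.
  std::vector<int32_t> sorted_cache_sets = {0, 1, 1, 3, 7,
                                            kSentinel, kSentinel, kSentinel};
  // The first sentinel position equals the number of real actions,
  // found with one O(log n) search instead of one atomicAdd per element.
  const auto first_sentinel = std::lower_bound(
      sorted_cache_sets.begin(), sorted_cache_sets.end(), kSentinel);
  const int64_t actions_count = first_sentinel - sorted_cache_sets.begin();
  std::cout << "actions_count = " << actions_count << "\n"; // prints 5
  return 0;
}
```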
Summary: Pull Request resolved: pytorch#2478 - Migrate python files in `codegen/` over to `codegen/training/python` Reviewed By: spcyppt Differential Revision: D55721983 fbshipit-source-id: c6fed96a60090e52e6bfe7e53861db84dcb22471
Summary: Pull Request resolved: pytorch#2479 - Migrate optimizer templates in `codegen/` over to `codegen/training/optimizer` Reviewed By: spcyppt Differential Revision: D55762970 fbshipit-source-id: 14544d8e928594a453a6f569ac6f3dfcb508a9f1
Summary: Pull Request resolved: pytorch#2481 As title Reviewed By: SherlockNoMad Differential Revision: D55801981 fbshipit-source-id: c91f5fecefd0317cd785ee4b49c643952b86ac16
Summary: Pull Request resolved: pytorch#2480 - Migrate forward templates in `codegen/` over to `codegen/training/forward` Reviewed By: spcyppt Differential Revision: D55778877 fbshipit-source-id: 4424d4115e074f8d59da0b3ede45dd3666d04c71
Summary: Pull Request resolved: pytorch#2483 - Migrate backward templates in `codegen/` over to `codegen/training/backward` Reviewed By: spcyppt Differential Revision: D55825078 fbshipit-source-id: 5fdbce85b91cba2bb9b661154bbd52aa1d35a1fd
…`fbcode/deeplearning/fbgemm/fbgemm_gpu` (pytorch#2482) Summary: Pull Request resolved: pytorch#2482 Reviewed By: inseokhwang Differential Revision: D55817659 fbshipit-source-id: 5983e092d351d46b37deddc03f61a3fa3bb20633
Summary: Pull Request resolved: pytorch#2484 - Migrate other code in `codegen/` over to `codegen/training` Reviewed By: spcyppt Differential Revision: D55828101 fbshipit-source-id: 477acabc75687769374ecebd882ac659fa161100
Summary: Pull Request resolved: pytorch#2486 - Migrate headers in `codegen/` over to `include/` Reviewed By: spcyppt Differential Revision: D55896420 fbshipit-source-id: 5b374808bfc9df4b3bce6a30b96dcb47e20f8494
Summary: Pull Request resolved: pytorch#2488 SymInt-ify scheme of batch_index_select as it can be used via torchrec VB path with SymInt Reviewed By: ezyang Differential Revision: D55923134 fbshipit-source-id: b7b69d307c866cf39fad3e6e44a50e24295a0109