FP32 Autovec Optimization #2535
Open
NathanielIskandar wants to merge 53 commits into pytorch:main from codebase-berkeley:FP32-autovec
Conversation
NathanielIskandar commented on Apr 24, 2024
- Verified that the FP32 implementation passes unit tests
- Optimized the FP32 autovec function (`EmbeddingSpMDM_autovec`):
  - `#pragma omp tile` with a tile size of 4 (`tile_size = 4`)
  - cache prefetching (`max_initial_prefetch_row = 8`)
  - `#pragma omp simd`
- Implemented switching logic that selects the autovec path via an environment variable (see the sketch after this list)
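Since the bullet points above are terse, here is a minimal, self-contained C++ sketch of how these pieces could fit together: an environment-variable switch, initial row prefetching, and OpenMP hints on the hot loop. The function signature and the `FBGEMM_AUTOVEC` variable name are illustrative assumptions, not FBGEMM's actual API; the real kernel is the `EmbeddingSpMDM_autovec` function this PR modifies, and `#pragma omp tile` requires an OpenMP 5.1 compiler (older compilers ignore the pragma).

```cpp
// Hedged sketch only: names below (FBGEMM_AUTOVEC, embedding_sum_autovec)
// are hypothetical and do not match FBGEMM's real API.
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Switching logic: read the environment variable once and cache the result.
static bool autovec_enabled() {
  static const bool enabled = [] {
    const char* v = std::getenv("FBGEMM_AUTOVEC"); // hypothetical variable name
    return v != nullptr && std::strcmp(v, "0") != 0;
  }();
  return enabled;
}

// Simplified FP32 pooled-embedding sum in the spirit of EmbeddingSpMDM_autovec.
static void embedding_sum_autovec(
    const float* weights,   // [num_rows, block_size] embedding table
    const int64_t* indices, // [num_indices] rows to pool
    int64_t num_indices,
    int64_t block_size,
    float* out) {           // [block_size] pooled output
  const int64_t max_initial_prefetch_row = 8; // tuning constant from the PR
  const int64_t pf = num_indices < max_initial_prefetch_row
      ? num_indices : max_initial_prefetch_row;

  // Warm the cache with the first few rows before entering the main loop.
  for (int64_t i = 0; i < pf; ++i)
    __builtin_prefetch(weights + indices[i] * block_size, /*rw=*/0, /*locality=*/3);

  for (int64_t j = 0; j < block_size; ++j)
    out[j] = 0.0f;

  // OpenMP 5.1 tile construct, per the PR description (tile_size = 4);
  // compilers without 5.1 support will simply ignore the pragma.
#pragma omp tile sizes(4)
  for (int64_t i = 0; i < num_indices; ++i) {
    // Keep prefetching a row that is `pf` iterations ahead of its use.
    if (i + pf < num_indices)
      __builtin_prefetch(weights + indices[i + pf] * block_size, 0, 3);
    const float* row = weights + indices[i] * block_size;
    // Ask the compiler to vectorize the per-row accumulation.
#pragma omp simd
    for (int64_t j = 0; j < block_size; ++j)
      out[j] += row[j];
  }
}
```

A caller would then branch on `autovec_enabled()` to pick between this path and the reference `EmbeddingSpMDM_ref` implementation.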
Commit titles from the PR timeline (truncated by GitHub):
- …dingSpMDMAutovec.h
- …eSparse_autovec()
- …SPMDM_BASE from 'EmbeddingSpMDM_ref' to 'EmbeddingSpMDMRowWiseSparse_autovec'
- …ng blocks in the autovec implementation because of input_stride error for now
✅ Deploy Preview for pytorch-fbgemm-docs ready!
@sryap has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary: Pull Request resolved: pytorch#2472 - Break out forward code generation from `codegen/genscript/generate_embedding_optimizer.py` Reviewed By: sryap Differential Revision: D55603000 fbshipit-source-id: 36bf8ae2e419500d32cdb8b941d5b645256c04b6
Summary: Pull Request resolved: pytorch#2473 As title Reviewed By: q10 Differential Revision: D55608855 fbshipit-source-id: d88aa44a122c38bc0440a6f6a33e195569abc04d
Summary: Pull Request resolved: pytorch#2474 - Break out forward code generation from `codegen/genscript/generate_embedding_optimizer.py` Reviewed By: spcyppt Differential Revision: D55617766 fbshipit-source-id: b6d0804ba0be950b45891bf0c409f2e83bedb710
Summary: Pull Request resolved: pytorch#2475 - Break out forward code generation from `codegen/embedding_backward_code_generator.py` Reviewed By: sryap, spcyppt Differential Revision: D55654970 fbshipit-source-id: 31cb7c79b6086cead95919bbacb9f12aadf46857
Summary: Pull Request resolved: pytorch#2476 - Remove `codegen/embedding_backward_code_generator.py` and `codegen/embedding_common_code_generator.py`, which are now deprecated in favor of `codegen/genscript/` Reviewed By: spcyppt Differential Revision: D55674855 fbshipit-source-id: 4bf6fd233c2a70b7b4aa3c6a391defbffc53aa19
Summary: Pull Request resolved: pytorch#2471 Previously, `ssd_cache_actions_insert_kernel` uses `atomicAdd` to count the number of SSD lookups (i.e., `actions_count`). However, since the cache sets are sorted in `sorted_cache_sets`, we can compute `actions_count` by finding the position of the last cache set that is not the sentinel value (i.e., the number of cache sets). Reviewed By: jianyuh Differential Revision: D55599052 fbshipit-source-id: b73b730ea00f1f1f214d57f3cb41a0d2ad6e843b
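This commit replaces per-element atomic increments with a single boundary search: because `sorted_cache_sets` is sorted and missing entries carry a sentinel that sorts last, `actions_count` is simply the position of the first sentinel. A hedged, host-side C++ illustration of the idea (the sentinel choice and the data are made up; the real logic runs on GPU tensors inside `ssd_cache_actions_insert_kernel`):

```cpp
// Host-side illustration only; the actual logic lives in a CUDA kernel.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
  // Assumed sentinel: a value guaranteed to sort after every real cache set.
  const int32_t kSentinel = INT32_MAX;
  // Sorted cache sets; lookups that missed are padded with the sentinel.
  std::vector<int32_t> sorted_cache_sets = {0, 1, 1, 3, 7,
                                            kSentinel, kSentinel, kSentinel};
  // The first sentinel position equals the number of real actions,
  // found with one O(log n) search instead of one atomicAdd per element.
  const auto first_sentinel = std::lower_bound(
      sorted_cache_sets.begin(), sorted_cache_sets.end(), kSentinel);
  const int64_t actions_count = first_sentinel - sorted_cache_sets.begin();
  std::cout << "actions_count = " << actions_count << "\n"; // prints 5
  return 0;
}
```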
Summary: Pull Request resolved: pytorch#2478 - Migrate python files in `codegen/` over to `codegen/training/python` Reviewed By: spcyppt Differential Revision: D55721983 fbshipit-source-id: c6fed96a60090e52e6bfe7e53861db84dcb22471
Summary: Pull Request resolved: pytorch#2479 - Migrate optimizer templates in `codegen/` over to `codegen/training/optimizer` Reviewed By: spcyppt Differential Revision: D55762970 fbshipit-source-id: 14544d8e928594a453a6f569ac6f3dfcb508a9f1
Summary: Pull Request resolved: pytorch#2481 As title Reviewed By: SherlockNoMad Differential Revision: D55801981 fbshipit-source-id: c91f5fecefd0317cd785ee4b49c643952b86ac16
Summary: Pull Request resolved: pytorch#2480 - Migrate forward templates in `codegen/` over to `codegen/training/forward` Reviewed By: spcyppt Differential Revision: D55778877 fbshipit-source-id: 4424d4115e074f8d59da0b3ede45dd3666d04c71
Summary: Pull Request resolved: pytorch#2483 - Migrate backward templates in `codegen/` over to `codegen/training/backward` Reviewed By: spcyppt Differential Revision: D55825078 fbshipit-source-id: 5fdbce85b91cba2bb9b661154bbd52aa1d35a1fd
…`fbcode/deeplearning/fbgemm/fbgemm_gpu` (pytorch#2482) Summary: Pull Request resolved: pytorch#2482 Reviewed By: inseokhwang Differential Revision: D55817659 fbshipit-source-id: 5983e092d351d46b37deddc03f61a3fa3bb20633
Summary: Pull Request resolved: pytorch#2484 - Migrate other code in `codegen/` over to `codegen/training` Reviewed By: spcyppt Differential Revision: D55828101 fbshipit-source-id: 477acabc75687769374ecebd882ac659fa161100
Summary: Pull Request resolved: pytorch#2486 - Migrate headers in `codegen/` over to `include/` Reviewed By: spcyppt Differential Revision: D55896420 fbshipit-source-id: 5b374808bfc9df4b3bce6a30b96dcb47e20f8494
Summary: Pull Request resolved: pytorch#2488 SymInt-ify scheme of batch_index_select as it can be used via torchrec VB path with SymInt Reviewed By: ezyang Differential Revision: D55923134 fbshipit-source-id: b7b69d307c866cf39fad3e6e44a50e24295a0109