
FP32 Autovec Optimization #2535

Open

wants to merge 53 commits into main
Conversation

NathanielIskandar

  1. Verified that the FP32 implementation passes the unit tests
  2. Optimized the FP32 autovec function (`EmbeddingSpMDM_autovec`); see the kernel sketch after this list:
    • loop tiling with `#pragma omp tile` (tile_size = 4)
    • cache prefetching (max_initial_prefetch_row = 8)
    • vectorization with `#pragma omp simd`
  3. Implemented switching logic that selects the autovec path via an environment variable (see the dispatch sketch below)
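For reference, here is a minimal sketch of the optimization pattern in item 2, assuming a simplified unweighted sum-pooling kernel. The real `EmbeddingSpMDM_autovec` in FBGEMM has a different signature and additional cases (per-sample weights, normalization, bounds checks); the names, layout, and constants below are illustrative only.

```cpp
#include <cstdint>
#include <cstring>

// Simplified FP32 embedding sum-pooling kernel illustrating the
// prefetching and simd hints described in the PR summary.
void EmbeddingSpMDM_fp32_sketch(
    std::int64_t block_size,     // embedding dimension
    std::int64_t output_size,    // number of pooled output rows
    const float* table,          // [num_rows x block_size] embedding table
    const std::int64_t* indices, // flattened lookup indices
    const std::int64_t* offsets, // offsets[m]..offsets[m+1] range into indices
    float* out) {
  // Warm the cache before the main loop (cf. max_initial_prefetch_row = 8).
  constexpr std::int64_t max_initial_prefetch_row = 8;
  const std::int64_t total = offsets[output_size];
  for (std::int64_t i = 0; i < total && i < max_initial_prefetch_row; ++i) {
    __builtin_prefetch(table + indices[i] * block_size); // GCC/Clang builtin
  }
  for (std::int64_t m = 0; m < output_size; ++m) {
    float* dst = out + m * block_size;
    std::memset(dst, 0, block_size * sizeof(float));
    for (std::int64_t j = offsets[m]; j < offsets[m + 1]; ++j) {
      const float* src = table + indices[j] * block_size;
      // Prefetch the next row while the current one is being accumulated.
      if (j + 1 < total) {
        __builtin_prefetch(table + indices[j + 1] * block_size);
      }
      // Vectorization hint; the PR additionally tiles this loop nest with
      // `#pragma omp tile` (tile_size = 4), omitted here for portability.
#pragma omp simd
      for (std::int64_t k = 0; k < block_size; ++k) {
        dst[k] += src[k];
      }
    }
  }
}
```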
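And a minimal sketch of the switching logic in item 3, assuming an environment variable named `FBGEMM_ENABLE_AUTOVEC`; the actual variable name and dispatch site in the PR may differ.

```cpp
#include <cstdlib>
#include <cstring>

// Read the environment once and cache the result; calling getenv on
// every kernel invocation would add measurable overhead.
static bool is_autovec_enabled() {
  static const bool enabled = [] {
    const char* v = std::getenv("FBGEMM_ENABLE_AUTOVEC"); // assumed name
    return v != nullptr && std::strcmp(v, "0") != 0;
  }();
  return enabled;
}

// At the dispatch site, the autovec kernel would be chosen over the
// asmjit/reference paths when the flag is set, e.g.:
//   if (is_autovec_enabled()) { return EmbeddingSpMDM_autovec(/* ... */); }
```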

crystalrchen and others added 30 commits March 17, 2024 17:18
…SPMDM_BASE from 'EmbeddingSpMDM_ref' to 'EmbeddingSpMDMRowWiseSparse_autovec'
…ng blocks in the autovec implementation because of input_stride error for now

netlify bot commented Apr 24, 2024

Deploy Preview for pytorch-fbgemm-docs ready!

🔨 Latest commit: 57b1793
🔍 Latest deploy log: https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/662a999cd11f710008d00e3d
😎 Deploy Preview: https://deploy-preview-2535--pytorch-fbgemm-docs.netlify.app

@facebook-github-bot
Contributor

@sryap has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

q10 and others added 15 commits April 25, 2024 10:57
Summary:
Pull Request resolved: pytorch#2472

- Break out forward code generation from `codegen/genscript/generate_embedding_optimizer.py`

Reviewed By: sryap

Differential Revision: D55603000

fbshipit-source-id: 36bf8ae2e419500d32cdb8b941d5b645256c04b6
Summary:
Pull Request resolved: pytorch#2473

As title

Reviewed By: q10

Differential Revision: D55608855

fbshipit-source-id: d88aa44a122c38bc0440a6f6a33e195569abc04d
Summary:
Pull Request resolved: pytorch#2474

- Break out forward code generation from `codegen/genscript/generate_embedding_optimizer.py`

Reviewed By: spcyppt

Differential Revision: D55617766

fbshipit-source-id: b6d0804ba0be950b45891bf0c409f2e83bedb710
Summary:
Pull Request resolved: pytorch#2475

- Break out forward code generation from `codegen/embedding_backward_code_generator.py`

Reviewed By: sryap, spcyppt

Differential Revision: D55654970

fbshipit-source-id: 31cb7c79b6086cead95919bbacb9f12aadf46857
Summary:
Pull Request resolved: pytorch#2476

- Remove `codegen/embedding_backward_code_generator.py` and `codegen/embedding_common_code_generator.py`,
which are now deprecated in favor of `codegen/genscript/`

Reviewed By: spcyppt

Differential Revision: D55674855

fbshipit-source-id: 4bf6fd233c2a70b7b4aa3c6a391defbffc53aa19
Summary:
Pull Request resolved: pytorch#2471

Previously, `ssd_cache_actions_insert_kernel` used `atomicAdd` to
count the number of SSD lookups (i.e., `actions_count`). However,
since the cache sets are sorted in `sorted_cache_sets`, we can compute
`actions_count` by finding the position of the last cache set that is
not the sentinel value, which equals the number of valid cache sets
(a sketch of this idea follows this commit message).

Reviewed By: jianyuh

Differential Revision: D55599052

fbshipit-source-id: b73b730ea00f1f1f214d57f3cb41a0d2ad6e843b
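A host-side sketch of the counting idea in the commit above, under the assumption that the sentinel sorts after every real cache-set value; the actual change lives inside a CUDA kernel, and `kSentinel` and the element types here are illustrative.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

// With sorted_cache_sets in ascending order and the sentinel being the
// largest value, all sentinel entries sit contiguously at the tail, so
// the number of real lookups is simply the index of the first sentinel.
std::int64_t compute_actions_count(
    const std::vector<std::int32_t>& sorted_cache_sets,
    std::int32_t kSentinel) {
  auto first_sentinel = std::lower_bound(
      sorted_cache_sets.begin(), sorted_cache_sets.end(), kSentinel);
  return std::distance(sorted_cache_sets.begin(), first_sentinel);
}
```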
Summary:
Pull Request resolved: pytorch#2478

- Migrate Python files in `codegen/` over to `codegen/training/python`

Reviewed By: spcyppt

Differential Revision: D55721983

fbshipit-source-id: c6fed96a60090e52e6bfe7e53861db84dcb22471
Summary:
Pull Request resolved: pytorch#2479

- Migrate optimizer templates in `codegen/` over to `codegen/training/optimizer`

Reviewed By: spcyppt

Differential Revision: D55762970

fbshipit-source-id: 14544d8e928594a453a6f569ac6f3dfcb508a9f1
Summary:
Pull Request resolved: pytorch#2481

As title

Reviewed By: SherlockNoMad

Differential Revision: D55801981

fbshipit-source-id: c91f5fecefd0317cd785ee4b49c643952b86ac16
Summary:
Pull Request resolved: pytorch#2480

- Migrate forward templates in `codegen/` over to `codegen/training/forward`

Reviewed By: spcyppt

Differential Revision: D55778877

fbshipit-source-id: 4424d4115e074f8d59da0b3ede45dd3666d04c71
Summary:
Pull Request resolved: pytorch#2483

- Migrate backward templates in `codegen/` over to `codegen/training/backward`

Reviewed By: spcyppt

Differential Revision: D55825078

fbshipit-source-id: 5fdbce85b91cba2bb9b661154bbd52aa1d35a1fd
…`fbcode/deeplearning/fbgemm/fbgemm_gpu` (pytorch#2482)

Summary: Pull Request resolved: pytorch#2482

Reviewed By: inseokhwang

Differential Revision: D55817659

fbshipit-source-id: 5983e092d351d46b37deddc03f61a3fa3bb20633
Summary:
Pull Request resolved: pytorch#2484

- Migrate other code in `codegen/` over to `codegen/training`

Reviewed By: spcyppt

Differential Revision: D55828101

fbshipit-source-id: 477acabc75687769374ecebd882ac659fa161100
Summary:
Pull Request resolved: pytorch#2486

- Migrate headers in `codegen/` over to `include/`

Reviewed By: spcyppt

Differential Revision: D55896420

fbshipit-source-id: 5b374808bfc9df4b3bce6a30b96dcb47e20f8494
Summary:
Pull Request resolved: pytorch#2488

SymInt-ify the schema of `batch_index_select`, since it can be used via the torchrec VB path with SymInt.

Reviewed By: ezyang

Differential Revision: D55923134

fbshipit-source-id: b7b69d307c866cf39fad3e6e44a50e24295a0109