feat: Implement FP8 functionality #2763
base: release/2.3
Conversation
Squashed commits (author: Dheeraj Peri &lt;peri.dheeraj@gmail.com&gt;):
- chore: updates to trt api
- chore: trt 10 fixes
- chore: more fixes
- chore: minor updates
- chore: Fix save failures
- chore: minor fixes
- chore: remove duplicate bert test case
- chore: remove comments
- chore: add load api
- chore: more updates
@peri044 I remember we've already removed
This causes an error when I import torch-trt:
Can you take a look?
Fixed it now.
Cool, thanks! And did you implement the unit test for
@peri044 Thanks for the comments. I have refactored based on your suggestions.
Overall this looks good! I added a few comments.
Description
This PR adds FP8 and BF16 datatype support, and implements converters for FP8 quantized ops.
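For context, FP8 in PyTorch/TensorRT typically means the OCP E4M3 format: 1 sign bit, 4 exponent bits, 3 mantissa bits, exponent bias 7, and a maximum finite value of 448. The dependency-free sketch below (a hypothetical helper, not code from this PR) shows what rounding a value to the nearest E4M3-representable number looks like, which is the numeric behavior the quantized-op converters must reproduce:

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest value representable in FP8 E4M3
    (4 exponent bits, 3 mantissa bits, bias 7, max finite 448).
    Illustrative sketch only; real kernels do this in hardware."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = abs(x)
    if a > 448.0:
        return sign * 448.0          # saturate at the E4M3 max finite value
    m, e = math.frexp(a)             # a = m * 2**e, with m in [0.5, 1)
    exp = e - 1                      # rewrite as (2*m) * 2**exp, 2*m in [1, 2)
    if exp < -6:
        step = 2.0 ** (-6 - 3)       # subnormal range: fixed step 2**-9
    else:
        step = 2.0 ** (exp - 3)      # 3 mantissa bits -> ulp = 2**(exp-3)
    q = round(a / step) * step       # round to the nearest representable grid point
    return sign * min(q, 448.0)

print(quantize_e4m3(0.3))   # 0.3125, the nearest E4M3 value
print(quantize_e4m3(500.0)) # 448.0, saturated
```

Values that survive the round trip unchanged (e.g. 1.0, 0.3125) are exactly representable in E4M3; everything else incurs quantization error, which is why FP8 paths are paired with per-tensor scaling factors.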
Type of change
Please delete options that are not relevant and/or add your own.
Checklist: