
Add int8 support in fused_multi_transformer_pass and fuse_multi_transformer_layer_pass #48209

Merged
merged 20 commits into PaddlePaddle:develop on Nov 30, 2022

Conversation

RichardWooSJTU
Contributor

PR types

Others

PR changes

Others

Describe

PR #45907 and PR #47541 finished fusing the fp16/32 GPT3 model into a single fused_multi_transformer op, which can speed up GPT3 inference. In this PR, we add int8 support; the operations added are as follows:

  1. Add delete_weight_quant_dequant_linear_op_encoder/decoder_pass: for models in the new quantization format, quant/dequant_linear ops are used to fake-quantize the weights; this pass deletes those nodes and moves their quantization info onto the nodes that follow them:
    [image: graph pattern before the pass] => [image: graph pattern after the pass]
  2. Add an enable_int8 attribute to the graph to distinguish the int8 op pattern from the fp16/32 pattern in the fused_multi_transformer_encoder/decoder pass.
  3. Add the calculation of the input scale and output scale, plus a transpose of the int8 weight (required by the cublasLt IMMA layout); a rough sketch of the scale math follows this list.
  4. Add some optimizations to the fused_multi_transformer_int8 op.
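The scale handling in item 3 follows standard symmetric int8 quantization. Below is a minimal, self-contained sketch of that math, not the code added by this PR (all function and variable names are invented for illustration): the per-tensor scale maps the largest absolute value to 127, the int8 GEMM accumulates in int32, and the int32 result is rescaled back to float with the product of the input and weight scales.

```cpp
// Illustrative sketch of symmetric int8 scale math; not PaddlePaddle code.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Scale for symmetric int8 quantization: map the largest magnitude to 127.
float ComputeScale(const std::vector<float>& x) {
  float max_abs = 0.f;
  for (float v : x) max_abs = std::max(max_abs, std::fabs(v));
  return max_abs / 127.f;
}

// Quantize a float tensor to int8 with a single per-tensor scale.
std::vector<int8_t> QuantizeInt8(const std::vector<float>& x, float scale) {
  std::vector<int8_t> q(x.size());
  for (size_t i = 0; i < x.size(); ++i) {
    float r = std::round(x[i] / scale);
    q[i] = static_cast<int8_t>(std::max(-127.f, std::min(127.f, r)));
  }
  return q;
}

int main() {
  // Toy activation and weight vectors standing in for real tensors.
  std::vector<float> input = {0.5f, -1.25f, 2.0f, -0.75f};
  std::vector<float> weight = {0.1f, 0.2f, -0.3f, 0.4f};

  float in_scale = ComputeScale(input);   // "input scale"
  float w_scale = ComputeScale(weight);   // "weight scale"

  std::vector<int8_t> q_in = QuantizeInt8(input, in_scale);
  std::vector<int8_t> q_w = QuantizeInt8(weight, w_scale);

  // An int8 GEMM accumulates in int32; the result is rescaled back to float
  // with the product of the two scales (a single dot product here).
  int32_t acc = 0;
  for (size_t i = 0; i < q_in.size(); ++i)
    acc += static_cast<int32_t>(q_in[i]) * q_w[i];
  float dequant = acc * in_scale * w_scale;

  float reference = 0.f;
  for (size_t i = 0; i < input.size(); ++i) reference += input[i] * weight[i];
  std::printf("int8 result %.4f vs fp32 reference %.4f\n", dequant, reference);
  return 0;
}
```

Per items 1 and 3, the real passes take their scales from the quantization info moved off the deleted quant/dequant_linear nodes and additionally transpose the int8 weight into the layout cublasLt expects; the sketch above only shows the arithmetic.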

@paddle-bot

paddle-bot bot commented Nov 21, 2022

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

#else
nullptr,
hshshsh
Contributor


Was this changed by mistake here?

Contributor Author


Got it. That was test code that I forgot to delete.

quant_last_in_scale /
dequant_out_scale[it][jt]) +
dequant_out_scale[it][jt]
// quant_last_in_scale /
Contributor


How about just deleting it?

Contributor Author


done~

Contributor

@chenwhql chenwhql left a comment


Please submit a follow-up PR to improve the error messages.

Contributor

@qingqing01 qingqing01 left a comment


But we need to unify the delete_weight_dequant_linear_op pass.

Contributor

@XiaoguangHu01 XiaoguangHu01 left a comment


LGTM

@heavengate heavengate merged commit 1248671 into PaddlePaddle:develop Nov 30, 2022
lxsbupt pushed a commit to lxsbupt/Paddle that referenced this pull request Dec 17, 2022
…former_layer_pass (PaddlePaddle#48209)

* delete unnecessary shape and slice op

Co-authored-by: Your Name <you@example.com>