
Add int8 support in fused_multi_transformer_pass and fuse_multi_transformer_layer_pass #48209

Merged
merged 20 commits into PaddlePaddle:develop on Nov 30, 2022

Conversation

RichardWooSJTU
Contributor

PR types

Others

PR changes

Others

Describe

PR #45907 and PR #47541 finished fusing the fp16/32 GPT3 model into a single fused_multi_transformer op, which can speed up GPT3 inference. In this PR, we add int8 support; the operations added are as follows:

  1. Add delete_weight_quant_dequant_linear_op_encoder/decoder_pass: for models in the new quantization format, quant/dequant_linear ops are used to fake-quantize the weights; this pass deletes those nodes and moves their quantization info onto the nodes that follow them:
    [image: graph pattern before the pass] => [image: graph pattern after the pass]
  2. Add an enable_int8 attribute to the graph to distinguish the int8 op pattern from the fp16/32 pattern in the fused_multi_transformer_encoder/decoder pass.
  3. Add the calculation of the input scale and output scale, plus a transpose of the int8 weight (required by the cublasLt IMMA layout); a rough sketch of the scale math follows this list.
  4. Add some optimizations to the fused_multi_transformer_int8 op.
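The scale handling in item 3 follows standard symmetric int8 quantization. Below is a minimal, self-contained sketch of that math, not the code added by this PR (all function and variable names are invented for illustration): the per-tensor scale maps the largest absolute value to 127, the int8 GEMM accumulates in int32, and the int32 result is rescaled back to float with the product of the input and weight scales.

```cpp
// Illustrative sketch of symmetric int8 scale math; not PaddlePaddle code.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Scale for symmetric int8 quantization: map the largest magnitude to 127.
float ComputeScale(const std::vector<float>& x) {
  float max_abs = 0.f;
  for (float v : x) max_abs = std::max(max_abs, std::fabs(v));
  return max_abs / 127.f;
}

// Quantize a float tensor to int8 with a single per-tensor scale.
std::vector<int8_t> QuantizeInt8(const std::vector<float>& x, float scale) {
  std::vector<int8_t> q(x.size());
  for (size_t i = 0; i < x.size(); ++i) {
    float r = std::round(x[i] / scale);
    q[i] = static_cast<int8_t>(std::max(-127.f, std::min(127.f, r)));
  }
  return q;
}

int main() {
  // Toy activation and weight vectors standing in for real tensors.
  std::vector<float> input = {0.5f, -1.25f, 2.0f, -0.75f};
  std::vector<float> weight = {0.1f, 0.2f, -0.3f, 0.4f};

  float in_scale = ComputeScale(input);   // "input scale"
  float w_scale = ComputeScale(weight);   // "weight scale"

  std::vector<int8_t> q_in = QuantizeInt8(input, in_scale);
  std::vector<int8_t> q_w = QuantizeInt8(weight, w_scale);

  // An int8 GEMM accumulates in int32; the result is rescaled back to float
  // with the product of the two scales (a single dot product here).
  int32_t acc = 0;
  for (size_t i = 0; i < q_in.size(); ++i)
    acc += static_cast<int32_t>(q_in[i]) * q_w[i];
  float dequant = acc * in_scale * w_scale;

  float reference = 0.f;
  for (size_t i = 0; i < input.size(); ++i) reference += input[i] * weight[i];
  std::printf("int8 result %.4f vs fp32 reference %.4f\n", dequant, reference);
  return 0;
}
```

Per items 1 and 3, the real passes take their scales from the quantization info moved off the deleted quant/dequant_linear nodes and additionally transpose the int8 weight into the layout cublasLt expects; the sketch above only shows the arithmetic.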

@paddle-bot

paddle-bot bot commented Nov 21, 2022

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

#else
nullptr,
hshshsh
Contributor


Was this changed by mistake here?

Contributor Author


Got it. That was test code that I forgot to delete.

quant_last_in_scale /
dequant_out_scale[it][jt]) +
dequant_out_scale[it][jt]
// quant_last_in_scale /
Contributor


How about just deleting it?

Contributor Author


done~

Contributor

@chenwhql chenwhql left a comment


Please submit a follow-up PR to improve the error messages.

Contributor

@qingqing01 qingqing01 left a comment


But we need to unify the delete_weight_dequant_linear_op pass.

Contributor

@XiaoguangHu01 XiaoguangHu01 left a comment


LGTM

@heavengate heavengate merged commit 1248671 into PaddlePaddle:develop Nov 30, 2022
lxsbupt pushed a commit to lxsbupt/Paddle that referenced this pull request Dec 17, 2022
…former_layer_pass (PaddlePaddle#48209)

* delete unnecessary shape and slice op

Co-authored-by: Your Name <you@example.com>