[Hackathon No.28] implement logcumsumexp #42267

tiancaishaonvjituizi · 2022-04-26T06:38:26Z

PR types

New features

PR changes

APIs

Describe

Implement logcumsumexp.

Hackathon issue link: #40305
Hackathon PR link: PaddlePaddle/community#82
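For reference, logcumsumexp computes, along the chosen axis, out[j] = log(exp(x[0]) + ... + exp(x[j])). The Python sketch below is a minimal, deliberately naive 1-D reference (the name logcumsumexp_ref is hypothetical, not the Paddle API). It recomputes each prefix from scratch and subtracts the prefix maximum before exponentiating, so large inputs do not overflow.

```python
import math


def logcumsumexp_ref(xs):
    """Hypothetical naive O(n^2) reference: out[j] = log(sum(exp(v) for v in xs[:j + 1]))."""
    out = []
    for j in range(len(xs)):
        prefix = xs[: j + 1]
        m = max(prefix)
        if math.isinf(m):
            # m == -inf: every term is exp(-inf) = 0, so the result is log(0) = -inf.
            # m == +inf: the sum is infinite, so the result is +inf.
            out.append(m)
            continue
        # Shift by the prefix maximum so the largest exp() term is exactly 1.
        total = sum(math.exp(v - m) for v in prefix)
        out.append(m + math.log(total))
    return out


print(logcumsumexp_ref([0.0, 0.0, 0.0]))  # values 0, log(2), log(3)
print(logcumsumexp_ref([1000.0, 1000.0]))  # math.exp(1000.0) alone would overflow
```

The first output always equals the first input, consistent with the docstring's Note that the first element of the result is the same as the first element of the input.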

paddle-bot-old · 2022-04-26T06:38:29Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

tiancaishaonvjituizi · 2022-04-26T06:39:43Z

paddle/phi/kernels/cpu/cum_kernel.cc

+  auto reducer = Reducer();
+  ScanKernel<T, Context, Reducer>(
+      dev_ctx, x, axis, flatten, exclusive, reverse, reducer, out);
+}


Everything in this file above this line was moved over from cumsum_kernel.cc, with a Reducer parameter added.
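Threading a Reducer through the scan means cumsum and logcumsumexp become the same scan with a different associative operator. Below is a minimal 1-D Python sketch of that idea; the exclusive/reverse flags are meant to mirror the ScanKernel parameters, but scan_1d and log_add_exp are hypothetical names, not Paddle functions, and the real kernel scans along an arbitrary axis of an N-D tensor.

```python
import math
import operator


def log_add_exp(a, b):
    """Hypothetical helper: log(exp(a) + exp(b)) without overflow."""
    hi, lo = max(a, b), min(a, b)
    if math.isinf(hi):
        # hi == -inf: both operands are log(0); hi == +inf: the sum is infinite.
        return hi
    return hi + math.log1p(math.exp(lo - hi))


def scan_1d(xs, reducer, identity, exclusive=False, reverse=False):
    """Hypothetical prefix scan of xs under an associative reducer with neutral element identity."""
    seq = list(reversed(xs)) if reverse else list(xs)
    out, acc = [], identity
    for v in seq:
        if exclusive:
            out.append(acc)  # emit the running value before folding in v
            acc = reducer(acc, v)
        else:
            acc = reducer(acc, v)
            out.append(acc)  # emit the running value after folding in v
    return list(reversed(out)) if reverse else out


# cumsum and logcumsumexp differ only in the (reducer, identity) pair.
print(scan_1d([1, 2, 3], operator.add, 0))                  # [1, 3, 6]
print(scan_1d([1, 2, 3], operator.add, 0, exclusive=True))  # [0, 1, 3]
print(scan_1d([1, 2, 3], operator.add, 0, reverse=True))    # [6, 5, 3]
print(scan_1d([0.0, 0.0], log_add_exp, -math.inf))          # approximately [0.0, log(2)]
```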

tiancaishaonvjituizi · 2022-04-26T06:40:50Z

paddle/phi/kernels/cpu/cum_kernel.cc

+                         ma);
+    return cmp_lt(ma, Eigen::NumTraits<T>::lowest()) ? initialize() : logsumexp;
+  }
+};


LogSumExp and LogSumExpReducer are ported from TensorFlow: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/scan_ops.h#L83
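In scalar form, the quoted reducer evaluates log(e^a + e^b) as max + log1p(exp(min - max)), which never exponentiates a positive number, and returns initialize() (presumably the identity) when max is below the lowest finite value, i.e. when both inputs are -inf. A Python transliteration of that logic follows; log_sum_exp is a hypothetical name, and the +inf guard is an extra not shown in the quoted snippet.

```python
import math
import sys

LOWEST = -sys.float_info.max  # Python analogue of Eigen::NumTraits<double>::lowest()


def log_sum_exp(a, b):
    """Hypothetical transliteration of the reducer's max/min/log1p structure."""
    mi, ma = min(a, b), max(a, b)
    if ma < LOWEST:
        # Both operands are -inf ("log 0"); return the identity instead of
        # computing -inf - (-inf), which would be NaN.
        return -math.inf
    if ma == math.inf:
        # Extra guard (not in the quoted snippet): inf - inf would also be NaN.
        return math.inf
    # exp(mi - ma) lies in [0, 1], so nothing can overflow here.
    return ma + math.log1p(math.exp(mi - ma))


print(log_sum_exp(1000.0, 1000.0))  # about 1000.693; math.exp(1000.0) raises OverflowError
print(log_sum_exp(-math.inf, 3.0))  # 3.0
```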

tiancaishaonvjituizi · 2022-04-26T06:42:44Z

paddle/phi/kernels/gpu/cum_kernel.cu

+};
+
+template <typename T>
+struct Identity<T, LogAddExp> {


This is the identity element of the monoid formed by the binary operation LogAddExp.
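Concretely, for a ⊕ b = log(e^a + e^b) the identity is -inf, because log(e^a + e^(-inf)) = log(e^a + 0) = a, and ⊕ is associative because both groupings equal log(e^a + e^b + e^c). A parallel or exclusive scan generally relies on exactly these two properties. The Python check below is a sketch: log_add_exp is a hypothetical helper, and associativity is only tested up to floating-point rounding.

```python
import itertools
import math


def log_add_exp(a, b):
    """Hypothetical helper: a ⊕ b = log(exp(a) + exp(b)), computed stably."""
    hi, lo = max(a, b), min(a, b)
    if math.isinf(hi):
        return hi
    return hi + math.log1p(math.exp(lo - hi))


IDENTITY = -math.inf  # log(0): contributes nothing to the sum of exponentials

samples = [-3.0, 0.0, 0.5, 7.25]
for a in samples:
    # Left and right identity: log(e^a + 0) == a, exactly in floating point too.
    assert log_add_exp(a, IDENTITY) == a
    assert log_add_exp(IDENTITY, a) == a
for a, b, c in itertools.product(samples, repeat=3):
    # Associativity holds mathematically; in floating point only up to rounding.
    left = log_add_exp(log_add_exp(a, b), c)
    right = log_add_exp(a, log_add_exp(b, c))
    assert math.isclose(left, right, rel_tol=1e-12, abs_tol=1e-12)
print("identity and associativity checks passed")
```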

tiancaishaonvjituizi · 2022-04-26T06:43:48Z

paddle/phi/kernels/gpu/cum_kernel.cu

-    }
-    return;
-  }
-


I removed the thrust implementation, because cub's documentation indicates that its prefix scan is much faster than thrust's, and TensorFlow doesn't use thrust either.
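Swapping scan backends can affect accuracy as well as speed, because a parallel scan groups the operations differently from a left-to-right loop. The sketch below is a sequential Python simulation of a generic tiled (two-phase) parallel scan: each tile is scanned independently, then an exclusive scan over tile totals supplies each tile's incoming offset. It is a textbook decomposition for illustration, not the actual algorithm inside cub or thrust, and blocked_inclusive_scan and log_add_exp are hypothetical names.

```python
import math


def log_add_exp(a, b):
    """Hypothetical helper: log(exp(a) + exp(b)), computed stably."""
    hi, lo = max(a, b), min(a, b)
    if math.isinf(hi):
        return hi
    return hi + math.log1p(math.exp(lo - hi))


def blocked_inclusive_scan(xs, op, identity, tile=4):
    """Hypothetical two-phase tiled scan, simulated sequentially."""
    tiles = [xs[i:i + tile] for i in range(0, len(xs), tile)]
    # Phase 1: independent local scans (each tile could run on its own thread block).
    local = []
    for t in tiles:
        acc, scanned = identity, []
        for v in t:
            acc = op(acc, v)
            scanned.append(acc)
        local.append(scanned)
    # Phase 2: an exclusive scan over tile totals gives each tile's incoming offset.
    offsets, carry = [], identity
    for scanned in local:
        offsets.append(carry)
        carry = op(carry, scanned[-1])
    # Phase 3: combine each tile's offset with its local results.
    return [op(off, v) for off, scanned in zip(offsets, local) for v in scanned]


data = [0.1 * k - 0.3 for k in range(10)]
sequential, acc = [], -math.inf
for v in data:
    acc = log_add_exp(acc, v)
    sequential.append(acc)
tiled = blocked_inclusive_scan(data, log_add_exp, -math.inf, tile=3)
print(max(abs(p - q) for p, q in zip(sequential, tiled)))  # tiny, not necessarily 0.0
```

Because the grouping differs, the two results agree only up to rounding, which is one plausible source of the unknown precision differences a reviewer may worry about when an existing kernel's scan is replaced.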

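I suggest that this PR not modify the implementation of existing operators, since that may introduce unknown precision or performance issues. Also, the RFC states that this change has no impact (positive or negative) on other modules.

You can submit that change as a separate PR instead, and verify op performance and model accuracy with the CE regression tests.

@tiancaishaonvjituizi You could first submit a separate PR to speed up cumsum. You are also welcome to sign up for the [PFCC-Roadmap] operator performance optimization activity; see #42286 for details.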

tiancaishaonvjituizi · 2022-04-26T06:49:09Z

paddle/phi/infermeta/unary.cc

-                     bool exclusive,
-                     bool reverse,
-                     MetaTensor* out) {
+void CumInferMeta(const MetaTensor& x,


Cumsum and Logcumsumexp reuse the same infer meta function.
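Sharing one infer meta function works because the output shape does not depend on which reducer is applied. Assuming the shared rule keeps the existing cumsum behavior (a 1-D output with numel elements when flatten is set, otherwise the input shape, with exclusive and reverse having no effect on shape), it amounts to the hypothetical Python helper below; it is an illustration, not the actual C++ CumInferMeta.

```python
from math import prod


def cum_infer_shape(x_shape, flatten):
    """Hypothetical shape rule assumed to be shared by cumsum and logcumsumexp."""
    if flatten:
        # The scan runs over all elements as a single 1-D sequence.
        return [prod(x_shape)]
    # Scanning along one axis never changes the shape.
    return list(x_shape)


print(cum_infer_shape([2, 3, 4], flatten=True))   # [24]
print(cum_infer_shape([2, 3, 4], flatten=False))  # [2, 3, 4]
```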

tiancaishaonvjituizi · 2022-04-26T06:50:58Z

python/paddle/fluid/tests/unittests/test_logcumsumexp_op.py

+    return x
+
+
+class TestLogcumsumexpOp(unittest.TestCase):


Gradients are not checked yet, because importing OpTest fails:

Traceback (most recent call last):
  File "test_logcumsumexp_op.py", line 24, in <module>
    from op_test import OpTest
  File "/home/dev/files/repos/Paddle2/python/paddle/fluid/tests/unittests/op_test.py", line 40, in <module>
    from paddle.fluid.tests.unittests.testsuite import (
ModuleNotFoundError: No module named 'paddle.fluid.tests'

Am I using it incorrectly? I couldn't find any relevant documentation either.

How are you running the unit test? From the build directory with ctest -R test_logcumsumexp_op? https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/dev_guides/api_contributing_guides/new_python_api_cn.html#yunxingdanyuanceshi

Great, I hadn't found this document before.

How are you running the unit test? From the build directory with ctest -R test_logcumsumexp_op? https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/dev_guides/api_contributing_guides/new_python_api_cn.html#yunxingdanyuanceshi

@luotao1 FYI, several key links in this document are broken 😂; Paddle may need to pay a bit more attention to its documentation. I've submitted a fix in PaddlePaddle/docs#4742. I'll continue updating this PR next.

paddle-bot-old · 2022-05-06T02:49:06Z

The format inspection passed. Your PR will be reviewed by experts of Paddle and developers from the open-source community. Stay tuned.

luotao1

Review not finished yet; I'll continue tomorrow.

paddle/utils/variant.h

paddle/fluid/operators/cum_op.cc

luotao1 · 2022-05-06T10:35:23Z

python/paddle/fluid/tests/unittests/test_logcumsumexp_op.py

+    return x
+
+
+class TestLogcumsumexpOp(unittest.TestCase):


How are you running the unit test? From the build directory with ctest -R test_logcumsumexp_op? https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/dev_guides/api_contributing_guides/new_python_api_cn.html#yunxingdanyuanceshi

paddle/phi/kernels/cpu/cum_kernel.cc

luotao1 · 2022-05-06T14:04:21Z

paddle/phi/kernels/gpu/cum_kernel.cu

-    }
-    return;
-  }
-


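I suggest that this PR not modify the implementation of existing operators, since that may introduce unknown precision or performance issues. Also, the RFC states that this change has no impact (positive or negative) on other modules.

You can submit that change as a separate PR instead, and verify op performance and model accuracy with the CE regression tests.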

python/paddle/fluid/tests/unittests/test_logcumsumexp_op.py

paddle/phi/kernels/gpu/cum_kernel.cu

python/paddle/tensor/math.py

tiancaishaonvjituizi · 2022-05-09T08:22:10Z

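@luotao1 The unit tests have been added, but the gradient check fails for precision reasons. Is there a way to raise the tolerance slightly? The max_relative_error argument of check_grad is forcibly overwritten inside the function, so setting it has no effect.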
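For intuition about what the gradient check compares: since out_j = log(sum_{k<=j} exp(x_k)), the derivative of out_j with respect to x_i is exp(x_i - out_j) for i <= j and 0 otherwise, so dL/dx_i = sum_{j>=i} grad_out[j] * exp(x_i - out_j). The pure-Python sketch below (1-D, hypothetical function names, not OpTest) compares that analytic gradient with central finite differences using delta = 1e-5, the numeric_grad_delta that op_test.py forces for float64, and prints the relative error per element.

```python
import math


def logcumsumexp(xs):
    """Hypothetical 1-D helper: running log(sum(exp(x))) via a stable log-add-exp."""
    out, acc = [], -math.inf
    for v in xs:
        hi, lo = max(acc, v), min(acc, v)
        acc = hi if math.isinf(hi) else hi + math.log1p(math.exp(lo - hi))
        out.append(acc)
    return out


def logcumsumexp_grad(xs, grad_out):
    """Hypothetical analytic backward: dL/dx_i = sum_{j >= i} grad_out[j] * exp(x_i - out_j)."""
    out = logcumsumexp(xs)
    n = len(xs)
    return [sum(grad_out[j] * math.exp(xs[i] - out[j]) for j in range(i, n))
            for i in range(n)]


def numeric_grad(xs, grad_out, delta=1e-5):
    """Hypothetical central finite differences of L(x) = sum_j grad_out[j] * out_j."""
    def loss(values):
        return sum(g * o for g, o in zip(grad_out, logcumsumexp(values)))

    grads = []
    for i in range(len(xs)):
        plus, minus = list(xs), list(xs)
        plus[i] += delta
        minus[i] -= delta
        grads.append((loss(plus) - loss(minus)) / (2 * delta))
    return grads


x = [0.3, -1.2, 2.0, 0.7]
g = [1.0, 0.5, -2.0, 1.5]
for a, n in zip(logcumsumexp_grad(x, g), numeric_grad(x, g)):
    print(f"analytic={a:.10f} numeric={n:.10f} rel_err={abs(a - n) / max(abs(a), 1e-12):.2e}")
```

Since x_i <= out_j whenever i <= j, every exp term lies in (0, 1], so the analytic side is well conditioned; the finite-difference side is where truncation and rounding error enter.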

paddle/fluid/operators/cum_op.cc

python/paddle/tensor/math.py

luotao1 · 2022-05-10T08:43:09Z

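The unit tests have been added, but the gradient check fails for precision reasons. Is there a way to raise the tolerance slightly? The max_relative_error argument of check_grad is forcibly overwritten inside the function, so setting it has no effect.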

Paddle/python/paddle/fluid/tests/unittests/test_row_conv_op.py

Lines 100 to 105 in 69d8027

def test_check_grad_normal(self):
    self.check_grad(
        ['X', 'Filter'],
        'Out',
        max_relative_error=0.06,
        check_dygraph=False)

Many of our internal tests use max_relative_error; you can search for examples.

tiancaishaonvjituizi · 2022-05-12T12:25:50Z

Many of our internal tests use max_relative_error; you can search for examples.

@luotao1

So do I need to add logcumsumexp to NEED_FIX_FP64_CHECK_GRAD_THRESHOLD_OP_LIST? From the name, I assumed it was a temporary patch (NEED_FIX).

Paddle/python/paddle/fluid/tests/unittests/op_test.py

Lines 1867 to 1870 in 91cf770

    
if self.dtype == np.float64 and \
        self.op_type not in op_threshold_white_list.NEED_FIX_FP64_CHECK_GRAD_THRESHOLD_OP_LIST:
    numeric_grad_delta = 1e-5
    max_relative_error = 1e-7

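In addition, this code silently overwrites numeric_grad_delta and max_relative_error, overriding the user's intent without the user knowing. I don't think this should be encouraged. If it's acceptable, I'll open another PR that changes this behavior to raise an exception.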

luotao1 · 2022-05-12T12:57:16Z

NEED_FIX_FP64_CHECK_GRAD_THRESHOLD_OP_LIST

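It is indeed a temporary patch (the existing entries in it will be fixed).

This code silently overwrites numeric_grad_delta and max_relative_error, overriding the user's intent without the user knowing. I don't think this should be encouraged.

This is required by the op precision specification: "Unit-test precision: atol, rtol, eps, and max_relative_error thresholds must not be enlarged on your own."

When checking the precision of forward outputs and backward gradients in op unit tests, thresholds were being enlarged just to make tests pass. To better guarantee op quality, this rule was introduced, and a corresponding check was added to CI.

Why numeric_grad_delta = 1e-5 and max_relative_error = 1e-7?

See this specification: Upgrading op unit-test precision to float64.

The overall op development and unit-test specifications are being consolidated into the API testing and acceptance specification. @DDDivano

To summarize: to guarantee op precision, enlarging thresholds in unit tests to make them pass is in principle not allowed (i.e., the override is intentionally transparent to the user). If a test really cannot pass, CI will block the PR once you use max_relative_error, and a dedicated reviewer will audit the precision.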

tiancaishaonvjituizi · 2022-05-13T05:34:41Z

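To summarize: to guarantee op precision, enlarging thresholds in unit tests to make them pass is in principle not allowed (i.e., the override is intentionally transparent to the user). If a test really cannot pass, CI will block the PR once you use max_relative_error, and a dedicated reviewer will audit the precision.

What I find inappropriate is that it ignores the user's intent without any notice. The correct behavior would be: if the user specifies max_relative_error but the op is not in that list, raise an error or a warning telling the user "You should not modify max_relative_error; we will override it to 1e-7. If you really must change it, please follow these steps ....", rather than overriding it silently. Such careless behavior causes unnecessary debugging costs downstream and is not a practice that should be encouraged in engineering (it directly violates the principle of least astonishment).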
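A minimal sketch of the warning-based behavior proposed here, written as a hypothetical helper resolve_grad_thresholds (not an existing op_test.py function). The 1e-5 and 1e-7 values come from the op_test.py excerpt in this thread; the fallback defaults for the non-overridden path are hypothetical placeholders, not Paddle's real check_grad defaults. Using None as "not specified" means the warning fires only when the caller explicitly asked for a value that is about to be ignored.

```python
import warnings

FP64_NUMERIC_GRAD_DELTA = 1e-5  # values forced by the op_test.py excerpt
FP64_MAX_RELATIVE_ERROR = 1e-7
# Hypothetical fallbacks used when the caller passes nothing (not Paddle's real defaults).
FALLBACK_NUMERIC_GRAD_DELTA = 5e-3
FALLBACK_MAX_RELATIVE_ERROR = 5e-3


def resolve_grad_thresholds(op_type, is_fp64, exempt_ops,
                            numeric_grad_delta=None, max_relative_error=None):
    """Hypothetical: return (numeric_grad_delta, max_relative_error), warning on overrides."""
    if is_fp64 and op_type not in exempt_ops:
        requested = {
            name: value
            for name, value in (("numeric_grad_delta", numeric_grad_delta),
                                ("max_relative_error", max_relative_error))
            if value is not None
        }
        if requested:
            warnings.warn(
                f"{op_type}: ignoring {requested}; float64 gradient checks use "
                f"numeric_grad_delta={FP64_NUMERIC_GRAD_DELTA} and "
                f"max_relative_error={FP64_MAX_RELATIVE_ERROR}. Relaxing these "
                "requires the op precision review process.",
                UserWarning,
                stacklevel=2,
            )
        return FP64_NUMERIC_GRAD_DELTA, FP64_MAX_RELATIVE_ERROR
    return (
        FALLBACK_NUMERIC_GRAD_DELTA if numeric_grad_delta is None else numeric_grad_delta,
        FALLBACK_MAX_RELATIVE_ERROR if max_relative_error is None else max_relative_error,
    )
```

Replacing warnings.warn with raise ValueError(...) gives the stricter exception-based variant suggested earlier in the thread; the warning form keeps existing tests running while still making the override visible.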

Ligoml · 2022-05-25T02:44:53Z

python/paddle/tensor/math.py

+        logcumsumexp(x)_{ij} = log \sum_{i=0}^{j}exp(x_{ij})
+
+    Note:
+    The first element of the result is the same of the first element of the input. 


The preview doesn't render the Note style here; I suggest adding an indent.

OK, fixed.

Ligoml · 2022-05-25T02:45:17Z

python/paddle/tensor/math.py

+
+    .. math::
+
+        logcumsumexp(x)_{ij} = log \sum_{i=0}^{j}exp(x_{ij})


Why does the English documentation include the formula but the Chinese documentation doesn't?

Fixed in PaddlePaddle/docs#4807.

Ligoml · 2022-05-26T02:31:09Z

python/paddle/tensor/math.py

@@ -2908,7 +2908,7 @@ def cumsum(x, axis=None, dtype=None, name=None):
    The cumulative sum of the elements along a given axis. 

    **Note**:
-    The first element of the result is the same of the first element of the input. 
+    The first element of the result is the same as the first element of the input. 


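What I meant is that adding an indent is enough; for reference:

Note: The first element of the result is the same as the first element of the input.

Note doesn't need to be bold.

Also, for documentation-only changes, you can append ;test=document_fix to the commit message to skip the code-check CI.

Fixed, please take another look @Ligoml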
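Combining both review comments, a docstring laid out as the reviewer describes (a plain Note: line with its text indented one level beneath it, plus a .. math:: directive) might look like the sketch below. The function is a hypothetical stand-in rather than the body of paddle.logcumsumexp, and the formula uses a separate summation index k so that i stays the row index instead of doubling as the summation variable.

```python
import math


# Hypothetical stand-in for the documented API; the docstring layout is the point here.
def logcumsumexp_1d(xs):
    r"""
    Compute the logarithm of the cumulative sum of exponentials of ``xs``.

    .. math::

        logcumsumexp(x)_{ij} = \log \sum_{k=0}^{j} \exp(x_{ik})

    Note:
        The first element of the result is the same as the first element of the input.

    Args:
        xs (list[float]): A 1-D sequence of floats.

    Returns:
        list[float]: The running log-sum-exp of ``xs``.
    """
    out, acc = [], -math.inf
    for v in xs:
        hi, lo = max(acc, v), min(acc, v)
        # Stable log(exp(acc) + exp(v)); an infinite hi already is the answer.
        acc = hi if math.isinf(hi) else hi + math.log1p(math.exp(lo - hi))
        out.append(acc)
    return out
```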

luotao1 · 2022-05-27T09:47:15Z

2022-05-27 17:10:12 + git merge --no-edit develop
2022-05-27 17:10:12 fatal: refusing to merge unrelated histories

Please merge the latest develop branch and resubmit.

tiancaishaonvjituizi · 2022-05-27T10:32:59Z

Please merge the latest develop branch and resubmit.

OK, merged. @luotao1

Ligoml

LGTM for docs

luotao1 · 2022-05-30T04:07:41Z

python/paddle/fluid/tests/unittests/test_logcumsumexp_op.py

+        with fluid.program_guard(fluid.Program()):
+
+            with self.assertRaises(TypeError):
+                data_np = np.random.random((100, 100), dtype=np.int32)


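The unit test still times out frequently. We retry each test three times per PR and report a failure only if all three attempts fail. If you can't reduce the test's running time, you can use the TIMEOUT property to raise its time limit (the default is 15 s):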

Paddle/python/paddle/fluid/tests/unittests/CMakeLists.txt

Line 504 in a1d8777

list(REMOVE_ITEM TEST_OPS test_warpctc_op)

Paddle/python/paddle/fluid/tests/unittests/CMakeLists.txt

Lines 617 to 618 in a1d8777

py_test_modules(test_warpctc_op MODULES test_warpctc_op)
set_tests_properties(test_warpctc_op PROPERTIES TIMEOUT 120)

luotao1 · 2022-06-09T03:16:34Z

Please use pre-commit to fix the code-format issues reported by the static-check pipeline.

tiancaishaonvjituizi · 2022-06-09T09:26:27Z

@luotao1 Fixed.

Ligoml

LGTM for docs

XieYunshen

LGTM for set_tests_properties(test_logcumsumexp_op PROPERTIES TIMEOUT 30)

tiancaishaonvjituizi · 2022-06-10T10:23:35Z

Great!

implement logcumsumexp

5c3b6bb

paddle-bot-old bot added contributor External developers status: proposed labels Apr 26, 2022

tiancaishaonvjituizi changed the title ~~implement logcumsumexp~~ [Hackathon No.28] implement logcumsumexp Apr 26, 2022

tiancaishaonvjituizi commented Apr 26, 2022


polish

4054f7c

tiancaishaonvjituizi force-pushed the logcumsumexp branch from ef038b7 to 4054f7c Compare April 26, 2022 06:48

tiancaishaonvjituizi commented Apr 26, 2022


tiancaishaonvjituizi mentioned this pull request Apr 26, 2022

[PaddlePaddle Hackathon Round 2] Task overview #40234

Closed

tiancaishaonvjituizi added 4 commits April 30, 2022 16:43

Merge remote-tracking branch 'origin/develop' into logcumsumexp

b8ade29

fix ci

1f98cc7

reformat

518c75a

update

e94f42c

dingjiaweiww assigned luotao1 May 6, 2022

dingjiaweiww added status: open review and removed status: proposed labels May 6, 2022

luotao1 reviewed May 6, 2022


tiancaishaonvjituizi added 3 commits May 9, 2022 11:16

address reviews

8c680e6

Merge remote-tracking branch 'origin/develop' into logcumsumexp

442bc00

add OpTest

a3e50da

tiancaishaonvjituizi force-pushed the logcumsumexp branch from 5bb57f6 to a3e50da Compare May 9, 2022 08:21

luotao1 reviewed May 10, 2022


paddle/fluid/operators/cum_op.cc (outdated, resolved)

python/paddle/tensor/math.py (outdated, resolved)

refine docs

9797d10

tiancaishaonvjituizi dismissed stale reviews from DDDivano and luotao1 via 9797d10 May 18, 2022 04:03

Ligoml mentioned this pull request May 18, 2022

[Hackathon No.28] add logcumsumexp docs PaddlePaddle/docs#4807

Merged

Ligoml reviewed May 25, 2022


update docs

3b4b8fe

tiancaishaonvjituizi force-pushed the logcumsumexp branch from 24ff364 to 3b4b8fe Compare May 25, 2022 13:05

Ligoml reviewed May 26, 2022


fix docs;test=document_fix

57bb711

Merge remote-tracking branch 'origin/develop' into logcumsumexp

4601d17

Ligoml previously approved these changes May 30, 2022


luotao1 reviewed May 30, 2022


tiancaishaonvjituizi added 2 commits June 7, 2022 14:51

Merge remote-tracking branch 'origin/develop' into logcumsumexp

250998c

set test timeout to 30s

6a3647c

tiancaishaonvjituizi dismissed Ligoml’s stale review via 6a3647c June 7, 2022 06:55

Merge remote-tracking branch 'origin/develop' into logcumsumexp

d6c7aa7

reformat

13edc4f

luotao1 approved these changes Jun 10, 2022


Ligoml approved these changes Jun 10, 2022


XieYunshen approved these changes Jun 10, 2022


lanxianghit approved these changes Jun 10, 2022


luotao1 merged commit 19a7524 into PaddlePaddle:develop Jun 10, 2022

tiancaishaonvjituizi deleted the logcumsumexp branch June 10, 2022 10:23

