【Hackathon No.60】refactor unary sparse ops and add sparse sqrt, tanh, sin #41356

tiancaishaonvjituizi · 2022-04-02T16:09:30Z

PR types

New features

PR changes

APIs

Describe

Refactor sparse unary ops and add sparse relu for COO and CSR tensors

Hackathon issue: #40271
Hackathon RFC: PaddlePaddle/community#90
Chinese documentation: PaddlePaddle/docs#4684

tiancaishaonvjituizi · 2022-04-02T16:16:55Z

paddle/phi/kernels/sparse/utils.h

+  }                                                                      \
+                                                                         \
+  template <typename T, typename Context>                                \
+  void SparseCsr##dense_kernel_func(const Context& dev_ctx,              \


Both COO and CSR versions are implemented, and the Python API names are likewise distinguished by coo and csr.

tiancaishaonvjituizi · 2022-04-04T09:01:26Z

paddle/phi/kernels/activation_grad_kernel.h

@@ -187,6 +187,7 @@ DECLARE_ACTIVATION_GRAD_KERNEL_DEPX(Log1p);
 DECLARE_ACTIVATION_GRAD_KERNEL_DEPOUT(Relu);
 DECLARE_ACTIVATION_GRAD_KERNEL_DEPOUT(Tanh);
 DECLARE_ACTIVATION_GRAD_KERNEL_DEPOUT(Sigmoid);
+DECLARE_ACTIVATION_GRAD_KERNEL_DEPOUT(Sqrt);


Previously, the SqrtGrad kernel for dense tensors was not declared in the header file.

tiancaishaonvjituizi · 2022-04-04T09:02:12Z

python/paddle/utils/code_gen/sparse_api.yaml

@@ -7,13 +7,37 @@
  intermediate : rulebook
  backward : conv3d_grad

- api : relu
+- api : coo_relu


The existing relu now also has both a COO version and a CSR version.

tiancaishaonvjituizi · 2022-04-04T09:11:45Z

paddle/phi/kernels/sparse/activation_kernel.cc

-                   double,
-                   phi::dtype::float16) {
+PD_REGISTER_KERNEL(
+    sparse_coo_xx, CPU, ALL_LAYOUT, phi::sparse::SparseCooXX, float, double) {


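[Help needed]

I observed the following here: a file's kernel registrations only take effect if the string PD_REGISTER_KERNEL appears literally in that .cc file. As a result, I cannot use the DEFINE_AND_REGISTER_SPARSE_UNARY_KERNEL macro I wrote as intended. For now I work around this by explicitly calling PD_REGISTER_KERNEL to register an empty kernel so that this file takes effect, but that is clearly not a good solution.

On the other hand, processing source code by string matching rather than by parsing its syntax strikes me as rather amateurish. It very easily runs into unexpected problems, so that only insiders who know the background know how to write the code, which greatly raises the barrier for contributors (the problem I hit this time is one example). I wonder how the paddle team sees this.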

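Your concern is reasonable, but the fundamental reason we do it this way is also to reduce the amount of information developers need to keep track of:

First, kernels registered in .cc files are registered when import paddle starts up. All kernel registration has to complete automatically when paddle is imported; it should happen naturally, not manually.

Second, this is done by defining a static object instance in the .cc file and performing the kernel registration in that object's constructor.

The object used for registration is not actually used anywhere else in the framework; it exists only to register the kernel when the program is loaded. Its existence therefore has to be declared in a header file. Without the declaration, the build treats it as a redundant implementation and strips it. Each declaration statement corresponds to a registration statement, for example PD_DECLARE_KERNEL(full, CPU, ALL_LAYOUT);

However, we felt that having developers write these declarations as well would be tedious and not concise enough, so we generate them automatically and developers do not need to think about this concept. This is done by matching the registration statements in .cc or .cu files, parsing their fields, and automatically generating the corresponding PD_DECLARE_KERNEL statements. The output is paddle/phi/kernels/declarations.h under the build directory after compilation, which holds all the auto-generated DECLARE statements.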
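
The matching step described above can be sketched as follows. This is only a minimal illustration, assuming a regular expression over the literal macro name; `REGISTER_PATTERN`, `generate_declarations`, and the wrapper-macro arguments are hypothetical, and the actual generator in the Paddle build may work differently. It shows why a registration hidden behind a wrapper macro produces no declaration.

```python
import re

# Hypothetical pattern: the literal macro name followed by its first three fields
# (kernel name, backend, layout). Not taken from the Paddle build scripts.
REGISTER_PATTERN = re.compile(
    r"PD_REGISTER_KERNEL\(\s*(\w+)\s*,\s*(\w+)\s*,\s*(\w+)\s*,"
)


def generate_declarations(source_text):
    """Return one PD_DECLARE_KERNEL line per literal registration found."""
    return [
        f"PD_DECLARE_KERNEL({name}, {backend}, {layout});"
        for name, backend, layout in REGISTER_PATTERN.findall(source_text)
    ]


# A registration written out literally, as in activation_kernel.cc.
direct = """
PD_REGISTER_KERNEL(
    sparse_coo_xx, CPU, ALL_LAYOUT, phi::sparse::SparseCooXX, float, double) {
"""

# The same registration behind a wrapper macro (arguments are made up here).
wrapped = "DEFINE_AND_REGISTER_SPARSE_UNARY_KERNEL(sqrt, SqrtKernel)\n"

print(generate_declarations(direct))   # one declaration
print(generate_declarations(wrapped))  # empty list: the text never matches
```

Because the matcher sees only source text before preprocessing, the wrapped form yields no PD_DECLARE_KERNEL line, which is consistent with the symptom reported in this thread.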

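This kind of problem exists in every framework, and some competing frameworks have declarations written by hand. From its early versions until now, paddle has used this approach to spare developers from having to deal with declaration statements.

In addition, paddle is a very complex system, and conventions and standardization matter a great deal. Beyond lowering the cost of understanding for developers, we also want kernel registration to have a standardized, uniform form. This reduces flexibility somewhat, but over the long run it makes the code as a whole easier to maintain and understand. We do want to restrict developers from wrapping registration in a variety of custom macros.

@tiancaishaonvjituizi So the suggestion here is to write the kernel registration code in the corresponding .cc and .cu files; the current approach of registering an empty kernel is not recommended.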

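@chenwhql Thanks for the answer. I have a few questions:

> Without the declaration, the build treats it as a redundant implementation and strips it

Have you considered using --whole-archive? If the concern is library size, all the kernel implementations could be built as a separate library, with --whole-archive enabled only for that library. Also, a large number of kernels in paddle/phi/kernels/cpu/activation_kernel.cc are registered through the PD_REGISTER_ACTIVATION_KERNEL macro, such as sin, cos, leaky_relu, sqrt, and so on. Does that mean these kernels fail to work properly in some situations? If so, why has CI not exposed such a serious problem?

> so we generate them automatically and developers do not need to think about this concept. This is done by matching the registration statements in .cc or .cu files, parsing their fields, and automatically generating the corresponding PD_DECLARE_KERNEL statements

There are many ways to match and parse. The current approach of matching a fixed string in the source text is odd and fragile; the common approach is to parse the AST with libclang. That said, I think the most reasonable approach is to require kernel developers to describe each kernel in a YAML file and use code generation to produce the related boilerplate code.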
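
As an illustration of the YAML-plus-codegen idea, a sketch might look like this. The spec fields, the kernel name `sparse_coo_sqrt`, and the function name `phi::sparse::SparseCooSqrtKernel` are hypothetical, and a Python list stands in for the parsed YAML file. The emitted text only mirrors the shape of the PD_REGISTER_KERNEL call in the diff hunk earlier in this thread.

```python
# Hypothetical kernel descriptions; in the proposal these would live in a YAML file.
KERNEL_SPECS = [
    {
        "name": "sparse_coo_sqrt",
        "backend": "CPU",
        "layout": "ALL_LAYOUT",
        "func": "phi::sparse::SparseCooSqrtKernel",
        "dtypes": ["float", "double"],
    },
]


def render_registration(spec):
    """Render one registration statement from a kernel description."""
    dtypes = ", ".join(spec["dtypes"])
    return (
        f"PD_REGISTER_KERNEL({spec['name']}, {spec['backend']}, "
        f"{spec['layout']}, {spec['func']}, {dtypes}) {{}}"
    )


def render_declaration(spec):
    """Render the matching declaration from the same description."""
    return f"PD_DECLARE_KERNEL({spec['name']}, {spec['backend']}, {spec['layout']});"


for spec in KERNEL_SPECS:
    print(render_registration(spec))
    print(render_declaration(spec))
```

With a single description as the source of truth, the registration and its declaration cannot drift apart, and no scanning of C++ source text is needed.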

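> paddle is a very complex system, and conventions and standardization matter a great deal. Beyond lowering the cost of understanding for developers, we also want kernel registration to have a standardized, uniform form. This reduces flexibility somewhat, but over the long run it makes the code as a whole easier to maintain and understand. We do want to restrict developers from wrapping registration in a variety of custom macros

Frankly, I don't think this holds up. Over the long run the code will not become easier to maintain. Software engineering has the "Don't Repeat Yourself" principle, which says precisely that when the same logic is repeated in multiple places, later changes cost more and are more likely to introduce errors.

> @tiancaishaonvjituizi So the suggestion here is to write the kernel registration code in the corresponding .cc and .cu files; the current approach of registering an empty kernel is not recommended.

OK, I will do it that way for now.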

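@tiancaishaonvjituizi Thank you very much for the suggestions. Here is where things currently stand:

We do plan to build all kernels into a standalone library, and phi will later be compiled independently, but we have not reached that point yet. The current focus is on making the new-style kernels compatible with the existing execution system, along with quite a few other high-priority tasks. Thanks for the suggestion; we will consider it later.

For matching and parsing, we will weigh the options later and see whether there is a better approach.

Kernel registration information is something developers look at often, for example to check a kernel's properties or which data types it supports, and the data types supported by kernels on CPU, GPU, XPU, and other devices frequently differ. At first we also wanted to provide a very concise, uniform way of writing registrations, but during development we found the benefit was small. First, people reading a kernel pay an extra cost in understanding, since they have to find where the special macro is defined. Second, there are actually few places where it can be used; for example, it only works for some kernels. Third, we also have to take into account some requirements of the inference execution system infrt, which at this stage need the registration to be written out more explicitly. In short, this involves trade-offs, and for now it is not a simple question of right or wrong. We hope you understand.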

tiancaishaonvjituizi · 2022-04-04T09:12:55Z

python/paddle/fluid/tests/unittests/test_sparse_activation_op.py

+            sparse_act_out = _C_ops.final_state_sparse_csr_sqrt(sparse_coo_x)
+            correct_result = [2, np.sqrt(2), 4]
+            actual_result = sparse_act_out.non_zero_elements().numpy()
+            assert np.allclose(correct_result, actual_result)


The tests now pass.

tiancaishaonvjituizi · 2022-04-04T09:14:38Z

paddle/phi/api/lib/api_gen_utils.cc

@@ -144,7 +144,7 @@ phi::TensorBase* SetSparseKernelOutput(Tensor* out, TensorType type) {
          std::make_shared<phi::SparseCsrTensor>(phi::DenseTensor(),
                                                 phi::DenseTensor(),
                                                 phi::DenseTensor(),
-                                                 phi::DDim{-1});
+                                                 phi::DDim{-1, -1});


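The original paddle code here is wrong; I'm not sure whether it was ever tested. The SparseCsrTensor constructor checks the length of dims and only allows 2D and 3D, so a 1D dim here makes the check fail.

The number of changed files now exceeds the maximum limit. How about submitting the two fixed files, api_gen_utils.cc and sparse_csr_tensor.cc, as a separate PR?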
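
To make the rank requirement concrete, here is a small sketch in plain Python. It is not Paddle's implementation: `check_csr_dims` and `dense_to_csr` are hypothetical helpers that only mimic the 2D/3D check described above and show the three arrays a 2D CSR layout stores.

```python
def check_csr_dims(dims):
    """Mimic the rank check described above: only 2D and 3D dims are accepted."""
    if len(dims) not in (2, 3):
        raise ValueError(f"CSR tensor expects 2D or 3D dims, got {len(dims)}D")


def dense_to_csr(matrix):
    """Convert a 2D list of numbers into (crows, cols, values)."""
    crows, cols, values = [0], [], []
    for row in matrix:
        for col_index, value in enumerate(row):
            if value != 0:
                cols.append(col_index)
                values.append(value)
        # crows[i + 1] is the number of non-zeros seen up to and including row i.
        crows.append(len(values))
    return crows, cols, values


check_csr_dims([-1, -1])  # accepted: the placeholder used after the fix
print(dense_to_csr([[0, 4, 0], [2, 0, 16]]))
# ([0, 1, 3], [1, 0, 2], [4, 2, 16])
```

CSR compresses along rows and indexes along columns, so a placeholder needs at least two dimensions; `check_csr_dims([-1])` raises in this sketch, analogous to the failing check in the constructor.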

python/paddle/fluid/tests/unittests/test_sparse_activation_op.py

paddle-bot-old · 2022-04-14T02:55:41Z

Sorry to inform you that a7f3410's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

Signed-off-by: tiancaishaonvjituizi <452565578@qq.com>

tiancaishaonvjituizi · 2022-04-21T07:02:24Z

paddle/phi/kernels/sparse/activation_kernel.cc

+// kernel registration mechanism. Do NOT refactor them unless you
+// know what you are doing.
+// If you want to implement any new kernel, please follow `sqrt` above
+// instead of `relu` following


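To get around the restriction described in #41356 (comment), the relu kernel is registered manually with PD_REGISTER_KERNEL instead of the DEFINE_AND_REGISTER_SPARSE_UNARY_KERNEL macro, and detailed comments have been added.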

tiancaishaonvjituizi · 2022-04-21T07:11:10Z

python/paddle/utils/code_gen/sparse_bw_api.yaml

-  args : (Tensor x, Tensor out_grad)
+- backward_api : sparse_coo_relu_grad
+  forward : sparse_coo_relu(Tensor x) -> Tensor(out@SparseCooTensor)
+  args : (Tensor out, Tensor out_grad)


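The backward of the original sparse relu now captures out instead of x, which saves a large amount of GPU memory in the common neural-network pattern (relu->conv). The dense relu grad works the same way.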
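
The equivalence behind this change can be checked with a small sketch in plain Python (the function names are illustrative, not Paddle APIs). Since out > 0 exactly where x > 0, the gradient computed from out matches the one computed from x. A plausible reading of the memory saving, stated here as an assumption, is that the following conv already keeps its input (relu's out) for its own backward, so x no longer needs to stay alive.

```python
def relu(x):
    """Element-wise relu on a list of floats."""
    return [max(v, 0.0) for v in x]


def relu_grad_from_x(x, out_grad):
    """Gradient using the forward input: pass out_grad through where x > 0."""
    return [g if v > 0 else 0.0 for v, g in zip(x, out_grad)]


def relu_grad_from_out(out, out_grad):
    """Gradient using the forward output: pass out_grad through where out > 0."""
    return [g if o > 0 else 0.0 for o, g in zip(out, out_grad)]


x = [-2.0, 0.0, 3.0, 0.5]
out = relu(x)
out_grad = [1.0, 1.0, 2.0, -4.0]

print(relu_grad_from_x(x, out_grad))      # [0.0, 0.0, 2.0, -4.0]
print(relu_grad_from_out(out, out_grad))  # [0.0, 0.0, 2.0, -4.0]
```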

python/paddle/fluid/tests/unittests/test_sparse_activation_op.py

tiancaishaonvjituizi · 2022-04-21T07:16:29Z

@chenwhql @zkh2016 I have made the changes requested in review. It is ready for another review, thanks.

python/paddle/sparse/functional/activation.py

paddle/phi/kernels/sparse/activation_kernel.h

zkh2016 · 2022-04-21T08:44:08Z

paddle/phi/kernels/sparse/utils.h

+#include "paddle/phi/kernels/empty_kernel.h"
+
+#define DEFINE_SPARSE_UNARY_KERNEL(dense_kernel_func)                    \
+  namespace phi {                                                        \


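Does the namespace need to be included in the macro as well?

PD_REGISTER_KERNEL also has to be written in the top-level namespace, so having this macro contain the namespace itself makes the usage experience more consistent.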

tiancaishaonvjituizi · 2022-05-07T01:30:21Z

@chenwhql @zkh2016 Ready for another review.

chenwhql

LGTM for code. Please check the CI issues again.

zkh2016 · 2022-05-07T02:25:28Z

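> @chenwhql @zkh2016 Ready for another review.

After the file rename, the related import paths need to be updated. You could run the relevant unit tests locally first.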

tiancaishaonvjituizi · 2022-05-09T10:00:59Z

@zkh2016 @chenwhql @TCChenlong @XieYunshen @DDDivano I fixed a formatting issue (calling once again for an upgrade of the ancient clang-format 3.8), so I need everyone to click approve once more.

XiaoguangHu01

LG API

refactor unary sparse ops and add relu

97ac270

tiancaishaonvjituizi force-pushed the sparse_relu branch from 6ac7cd6 to 9327121 Compare April 2, 2022 16:14

add test

26f4662

tiancaishaonvjituizi force-pushed the sparse_relu branch from 9327121 to 26f4662 Compare April 2, 2022 16:15

fix the bug in generated api code, tests are passed now

a7f3410

tiancaishaonvjituizi force-pushed the sparse_relu branch from 1530a92 to a7f3410 Compare April 4, 2022 09:00

tiancaishaonvjituizi mentioned this pull request Apr 4, 2022

【PaddlePaddle Hackathon 第二期】任务总览 #40234

Closed

TCChenlong requested a review from zkh2016 April 12, 2022 11:20

TCChenlong added contributor External developers status: proposed labels Apr 12, 2022

zkh2016 reviewed Apr 13, 2022

python/paddle/fluid/tests/unittests/test_sparse_activation_op.py (outdated)

tiancaishaonvjituizi added 2 commits April 20, 2022 22:33

Merge branch 'develop' into sparse_relu

d4310af

Signed-off-by: tiancaishaonvjituizi <452565578@qq.com>

update relu for new sparse api

7e5f102

update test, implement api, fix sqrt grad

71864fd

tiancaishaonvjituizi force-pushed the sparse_relu branch from 1ea8a6d to 71864fd Compare April 21, 2022 05:40

tiancaishaonvjituizi added 3 commits April 21, 2022 14:23

manually register relu and relu_grad kernel to bypass the restriction

a99a5ba

polish sqrt docs

95aa0b3

reformat

f706dea

tiancaishaonvjituizi added 2 commits April 21, 2022 15:19

polish docs

d898df7

remove csr backward api

b770f41

zkh2016 reviewed Apr 21, 2022

python/paddle/sparse/functional/activation.py (outdated)

zkh2016 reviewed Apr 21, 2022

paddle/phi/kernels/sparse/activation_kernel.h (outdated)

zkh2016 reviewed Apr 21, 2022

tiancaishaonvjituizi dismissed stale reviews from TCChenlong, XieYunshen, and zkh2016 via a6d2cd0 May 6, 2022 07:31

tiancaishaonvjituizi added 2 commits May 6, 2022 17:05

remove unused files

67d14b4

rename python files

268ac34

chenwhql previously approved these changes May 7, 2022

fix import path

39c9750

tiancaishaonvjituizi dismissed chenwhql’s stale review via 39c9750 May 7, 2022 06:29

zkh2016 previously approved these changes May 9, 2022

chenwhql previously approved these changes May 9, 2022

TCChenlong previously approved these changes May 9, 2022

XieYunshen previously approved these changes May 9, 2022

DDDivano previously approved these changes May 9, 2022

reformat

06787c0

tiancaishaonvjituizi dismissed stale reviews from DDDivano, XieYunshen, TCChenlong, chenwhql, and zkh2016 via 06787c0 May 9, 2022 09:58

zkh2016 approved these changes May 10, 2022

TCChenlong approved these changes May 11, 2022

XieYunshen approved these changes May 11, 2022

chenwhql approved these changes May 11, 2022

DDDivano approved these changes May 11, 2022

XiaoguangHu01 approved these changes May 12, 2022

zkh2016 merged commit f1eda7d into PaddlePaddle:develop May 12, 2022

tiancaishaonvjituizi mentioned this pull request May 17, 2022

【Hackathon No.19】Implement ASGD optimizer #42431

Closed
