
【Hackathon No.60】refactor unary sparse ops and add sparse sqrt, tanh, sin #41356

Merged
36 commits merged on May 12, 2022
Commits (36)
97ac270
refactor unary sparse ops and add relu
tiancaishaonvjituizi Apr 2, 2022
26f4662
add test
tiancaishaonvjituizi Apr 2, 2022
a7f3410
fix the bug in generated api code, tests are passed now
tiancaishaonvjituizi Apr 4, 2022
d4310af
Merge branch 'develop' into sparse_relu
tiancaishaonvjituizi Apr 20, 2022
7e5f102
update relu for new sparse api
tiancaishaonvjituizi Apr 20, 2022
71864fd
update test, implement api, fix sqrt grad
tiancaishaonvjituizi Apr 21, 2022
a99a5ba
manually register relu and relu_grad kernel to bypass the restriction
tiancaishaonvjituizi Apr 21, 2022
95aa0b3
polish sqrt docs
tiancaishaonvjituizi Apr 21, 2022
f706dea
reformat
tiancaishaonvjituizi Apr 21, 2022
d898df7
polish docs
tiancaishaonvjituizi Apr 21, 2022
b770f41
remove csr backward api
tiancaishaonvjituizi Apr 21, 2022
f92e8cd
fix test compile error
tiancaishaonvjituizi Apr 21, 2022
394ce5e
use allclose instead of array_equal
tiancaishaonvjituizi Apr 21, 2022
c577f46
move sqrt to math_kernel.cc, implement sin and tanh
tiancaishaonvjituizi Apr 21, 2022
3ad6fba
polish header file
tiancaishaonvjituizi Apr 21, 2022
56fc5da
reformat
tiancaishaonvjituizi Apr 21, 2022
1f18c59
refine
tiancaishaonvjituizi Apr 21, 2022
c606825
fix typo
tiancaishaonvjituizi Apr 22, 2022
5dd4507
fix typo
tiancaishaonvjituizi Apr 22, 2022
f59fa26
add test about error, reformat
tiancaishaonvjituizi Apr 23, 2022
dea61c7
fix test error
tiancaishaonvjituizi Apr 23, 2022
60c7359
fix format
tiancaishaonvjituizi Apr 23, 2022
ad8ceda
fix false positive warning in gcc>=9
tiancaishaonvjituizi Apr 26, 2022
178dd27
use more aggressive way
tiancaishaonvjituizi Apr 26, 2022
1ace46f
Merge remote-tracking branch 'origin/develop' into sparse_relu
tiancaishaonvjituizi Apr 26, 2022
7bb41d7
add api in paddle.sparse namespace
tiancaishaonvjituizi Apr 26, 2022
790cb0d
Merge remote-tracking branch 'tiancaishaonv/variant_fix_gcc9_fp_warni…
tiancaishaonvjituizi Apr 26, 2022
c44ac74
address reviews
tiancaishaonvjituizi Apr 27, 2022
d35e923
Merge remote-tracking branch 'origin/develop' into sparse_relu
tiancaishaonvjituizi Apr 27, 2022
b04ab6c
fix ci error
tiancaishaonvjituizi Apr 29, 2022
fa93d7d
rename to unary_kernel, update name
tiancaishaonvjituizi May 6, 2022
a6d2cd0
Merge remote-tracking branch 'origin/develop' into sparse_relu
tiancaishaonvjituizi May 6, 2022
67d14b4
remove unused files
tiancaishaonvjituizi May 6, 2022
268ac34
rename python files
tiancaishaonvjituizi May 6, 2022
39c9750
fix import path
tiancaishaonvjituizi May 7, 2022
06787c0
reformat
tiancaishaonvjituizi May 9, 2022
2 changes: 1 addition & 1 deletion paddle/phi/api/lib/api_gen_utils.cc
@@ -144,7 +144,7 @@ phi::TensorBase* SetSparseKernelOutput(Tensor* out, TensorType type) {
std::make_shared<phi::SparseCsrTensor>(phi::DenseTensor(),
phi::DenseTensor(),
phi::DenseTensor(),
phi::DDim{-1});
phi::DDim{-1, -1});
Contributor Author:
Paddle's original code here was wrong; I wonder whether it was ever tested. The SparseCsrTensor constructor checks the length of dims and only allows 2-D and 3-D, so the 1-D dim here makes the check fail.

Contributor:
The number of changed files has exceeded the maximum limit. How about submitting the fixes to api_gen_utils.cc and sparse_csr_tensor.cc as a separate PR?

Contributor Author:
Sure~

out->set_impl(sparse_tensor);
return sparse_tensor.get();
} else {
8 changes: 5 additions & 3 deletions paddle/phi/core/sparse_csr_tensor.cc
@@ -27,9 +27,11 @@ SparseCsrTensor::SparseCsrTensor() {
inline void check_shape(const DDim& dims) {
bool valid = dims.size() == 2 || dims.size() == 3;

PADDLE_ENFORCE(valid,
phi::errors::InvalidArgument(
"the SparseCsrTensor only support 2-D Tensor."));
PADDLE_ENFORCE(
valid,
phi::errors::InvalidArgument("the SparseCsrTensor only support 2-D or "
"3-D Tensor, but get %d-D Tensor",
dims.size()));
}
#define Check(non_zero_crows, non_zero_cols, non_zero_elements, dims) \
{ \
1 change: 1 addition & 0 deletions paddle/phi/kernels/activation_grad_kernel.h
@@ -187,6 +187,7 @@ DECLARE_ACTIVATION_GRAD_KERNEL_DEPX(Log1p);
DECLARE_ACTIVATION_GRAD_KERNEL_DEPOUT(Relu);
DECLARE_ACTIVATION_GRAD_KERNEL_DEPOUT(Tanh);
DECLARE_ACTIVATION_GRAD_KERNEL_DEPOUT(Sigmoid);
DECLARE_ACTIVATION_GRAD_KERNEL_DEPOUT(Sqrt);
Contributor Author:
Previously, the dense tensor SqrtGrad kernel was not declared in the header file.


DECLARE_ACTIVATION_GRAD_KERNEL_NODEP(Round);
DECLARE_ACTIVATION_GRAD_KERNEL_NODEP(Floor);
51 changes: 9 additions & 42 deletions paddle/phi/kernels/sparse/activation_grad_kernel.cc
@@ -13,58 +13,25 @@ See the License for the specific language governing permissions and
limitations under the License. */

#include "paddle/phi/kernels/sparse/activation_grad_kernel.h"

#include "paddle/phi/kernels/activation_grad_kernel.h"
#include "paddle/phi/kernels/copy_kernel.h"
#include "paddle/phi/kernels/empty_kernel.h"
#include "paddle/phi/kernels/sparse/utils.h"

#include "paddle/phi/backends/cpu/cpu_context.h"
#include "paddle/phi/backends/gpu/gpu_context.h"
#include "paddle/phi/core/kernel_registry.h"
DEFINE_AND_REGISTER_SPARSE_UNARY_GRAD_KERNEL(relu_grad, ReluGradKernel)
DEFINE_AND_REGISTER_SPARSE_UNARY_GRAD_KERNEL(sqrt_grad, SqrtGradKernel)

namespace phi {
namespace sparse {

template <typename T, typename Context>
void SparseReluGradKernel(const Context& dev_ctx,
const SparseCooTensor& x,
const SparseCooTensor& out_grad,
SparseCooTensor* x_grad) {
DenseTensor non_zero_indices =
phi::EmptyLike<T, Context>(dev_ctx, x.non_zero_indices());
DenseTensor non_zero_elements =
phi::EmptyLike<T, Context>(dev_ctx, x.non_zero_elements());
phi::Copy(dev_ctx,
x.non_zero_indices(),
dev_ctx.GetPlace(),
false,
&non_zero_indices);
phi::ReluGradKernel<T, Context>(dev_ctx,
x.non_zero_elements(),
out_grad.non_zero_elements(),
&non_zero_elements);
x_grad->SetMember(non_zero_indices, non_zero_elements, x.dims(), true);
void SparseCooXXGrad(const Context& dev_ctx,
const SparseCooTensor& x,
SparseCooTensor* out) {
}

} // namespace sparse
} // namespace phi

PD_REGISTER_KERNEL(sparse_relu_grad,
CPU,
ALL_LAYOUT,
phi::sparse::SparseReluGradKernel,
float,
double) {
kernel->InputAt(0).SetDataLayout(phi::DataLayout::SPARSE_COO);
}

#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
PD_REGISTER_KERNEL(sparse_relu_grad,
GPU,
ALL_LAYOUT,
phi::sparse::SparseReluGradKernel,
float,
double,
phi::dtype::float16) {
PD_REGISTER_KERNEL(
sparse_coo_xx_grad, CPU, ALL_LAYOUT, phi::sparse::SparseCooXXGrad, float, double) {
kernel->InputAt(0).SetDataLayout(phi::DataLayout::SPARSE_COO);
}
#endif
23 changes: 18 additions & 5 deletions paddle/phi/kernels/sparse/activation_grad_kernel.h
@@ -15,15 +15,28 @@ limitations under the License. */
#pragma once

#include "paddle/phi/core/sparse_coo_tensor.h"
#include "paddle/phi/core/sparse_csr_tensor.h"

namespace phi {
namespace sparse {

template <typename T, typename Context>
void SparseReluGradKernel(const Context& dev_ctx,
const SparseCooTensor& x,
const SparseCooTensor& out_grad,
SparseCooTensor* x_grad);
#define DECLARE_SPARSE_ACTIVATION_GRAD_KERNEL(name) \
template <typename T, typename Context> \
void SparseCoo##name##GradKernel(const Context& dev_ctx, \
const SparseCooTensor& x, \
const SparseCooTensor& out_grad, \
SparseCooTensor* x_grad); \
\
template <typename T, typename Context> \
void SparseCsr##name##GradKernel(const Context& dev_ctx, \
const SparseCsrTensor& x, \
const SparseCsrTensor& out_grad, \
SparseCsrTensor* x_grad);

DECLARE_SPARSE_ACTIVATION_GRAD_KERNEL(Relu)
DECLARE_SPARSE_ACTIVATION_GRAD_KERNEL(Sqrt)

#undef DECLARE_SPARSE_ACTIVATION_GRAD_KERNEL

} // namespace sparse
} // namespace phi
48 changes: 9 additions & 39 deletions paddle/phi/kernels/sparse/activation_kernel.cc
@@ -13,54 +13,24 @@ See the License for the specific language governing permissions and
limitations under the License. */

#include "paddle/phi/kernels/sparse/activation_kernel.h"
#include "paddle/phi/kernels/copy_kernel.h"
#include "paddle/phi/kernels/empty_kernel.h"

#include "paddle/phi/backends/cpu/cpu_context.h"
#include "paddle/phi/backends/gpu/gpu_context.h"
#include "paddle/phi/core/kernel_registry.h"
#include "paddle/phi/kernels/sparse/utils.h"

DEFINE_AND_REGISTER_SPARSE_UNARY_KERNEL(relu, ReluKernel)
DEFINE_AND_REGISTER_SPARSE_UNARY_KERNEL(sqrt, SqrtKernel)

namespace phi {
namespace sparse {

template <typename T, typename Context>
void SparseReluKernel(const Context& dev_ctx,
const SparseCooTensor& x,
SparseCooTensor* out) {
DenseTensor non_zero_indices =
phi::EmptyLike<T, Context>(dev_ctx, x.non_zero_indices());
DenseTensor non_zero_elements =
phi::EmptyLike<T, Context>(dev_ctx, x.non_zero_elements());
phi::Copy(dev_ctx,
x.non_zero_indices(),
dev_ctx.GetPlace(),
false,
&non_zero_indices);
phi::ReluKernel<T, Context>(
dev_ctx, x.non_zero_elements(), &non_zero_elements);
out->SetMember(non_zero_indices, non_zero_elements, x.dims(), true);
void SparseCooXX(const Context& dev_ctx,
const SparseCooTensor& x,
SparseCooTensor* out) {
}

} // namespace sparse
} // namespace phi

PD_REGISTER_KERNEL(sparse_relu,
CPU,
ALL_LAYOUT,
phi::sparse::SparseReluKernel,
float,
double) {
kernel->InputAt(0).SetDataLayout(phi::DataLayout::SPARSE_COO);
}

#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
PD_REGISTER_KERNEL(sparse_relu,
GPU,
ALL_LAYOUT,
phi::sparse::SparseReluKernel,
float,
double,
phi::dtype::float16) {
PD_REGISTER_KERNEL(
sparse_coo_xx, CPU, ALL_LAYOUT, phi::sparse::SparseCooXX, float, double) {
Contributor Author:
[Help needed]

I have observed the following behavior: a kernel registration in a .cc file only actually takes effect if the string PD_REGISTER_KERNEL appears literally in that file, so I cannot use my DEFINE_AND_REGISTER_SPARSE_UNARY_KERNEL wrapper macro normally. For now I make this file take effect by explicitly calling PD_REGISTER_KERNEL to register an empty kernel, but that is obviously not a good solution.

Besides, processing source code by string matching rather than syntactic parsing strikes me as amateurish; it very easily runs into unexpected problems, and it means only insiders who know the internals can write code correctly, which greatly raises the bar for contributors (the problem I hit this time is one example). I wonder how the Paddle team views this.

Contributor:
Your concern is reasonable, but fundamentally we do it this way to reduce the amount of information developers need to pay attention to:

  1. Registering a kernel in a .cc file takes effect at `import paddle` startup; all kernel registrations need to complete automatically when paddle is imported, so this naturally cannot be manual;
  2. This is accomplished by defining a static object instance in the .cc file and performing the kernel registration in that object's constructor;
  3. The registration object is not used anywhere else in the framework; it exists only to register the kernel at program load time. Its existence therefore has to be declared in a header file: without the declaration, the build treats the translation unit as a redundant implementation and strips it out. The declaration statement corresponds to the registration statement, e.g. PD_DECLARE_KERNEL(full, CPU, ALL_LAYOUT);
  4. But we felt that making developers also write all these declaration statements would be tedious and not concise, so we generate them automatically and keep developers from having to know about the concept. The processing works by matching the registration statements in the .cc or .cu files, parsing their fields, and generating the PD_DECLARE_KERNEL statements for them; after the build they are placed in paddle/phi/kernels/declarations.h under the build directory, which holds all the auto-generated DECLARE statements.

[screenshot: the generated declarations.h]

This problem exists in every framework, and some competing products hand-write the declarations; from its early versions until now, Paddle has used this approach so developers do not have to pay extra attention to declaration statements.

Also, Paddle is a very complex system, and conventions and standardization matter. Beyond lowering the cost of understanding, we also want the kernel registration form to be standardized and uniform. That reduces flexibility somewhat, but evolving this way over the long term keeps the code easier to maintain and understand overall, so we do want to discourage developers from wrapping assorted custom macros of their own.

Contributor:
@tiancaishaonvjituizi So the suggestion here is still to write the kernel registration code in the corresponding .cc and .cu files; the current approach of registering an empty kernel is not recommended.

Contributor Author:
@chenwhql Thanks for the reply. Some questions:

“Without the declaration, the build treats it as a redundant implementation and strips it out”

Have you considered using --whole-archive? If the concern is library size, all the kernel implementations could be built as a separate library, with --whole-archive enabled only for that library. Also, paddle/phi/kernels/cpu/activation_kernel.cc registers a large number of kernels through the PD_REGISTER_ACTIVATION_KERNEL macro, such as sin, cos, leaky_relu, sqrt, and so on. Does that mean those kernels cannot work correctly in some situations? If so, why has such a serious problem not been exposed by CI?

“Therefore we adopted automatic generation and keep developers from having to know about the concept. The processing works by matching the registration statements in the cc or cu files, parsing their fields, and generating the PD_DECLARE_KERNEL statements”

There are many ways to do matching and parsing; the current approach of matching fixed strings in the source text is strange and fragile. The common approach is to parse the AST with libclang. That said, I think the most reasonable approach is to have kernel developers put the kernel's information in a yaml file and use code generation to produce the related boilerplate code.

“Paddle is a very complex system, and conventions and standardization matter ... we do want to discourage developers from wrapping assorted custom macros of their own”

Frankly, I don't think this holds up. Evolving this way over the long term will not make the code easier to maintain. Software engineering has the "Don't Repeat Yourself" principle, which says exactly this: when the same logic is duplicated in multiple places, the cost of later changes goes up, as does the likelihood of errors after a change.

Contributor Author:
“@tiancaishaonvjituizi So the suggestion here is still to write the kernel registration code in the corresponding .cc and .cu files; the current approach of registering an empty kernel is not recommended.”

OK, I'll do it that way for now.

Contributor (chenwhql, Apr 21, 2022):
@tiancaishaonvjituizi Many thanks for the suggestions. The current situation is:

  1. We do plan to build all kernels into a standalone library, and phi will eventually be compiled independently, but we have not reached that step yet; the current focus is making the new-style kernels compatible with the original execution system, along with quite a bit of other high-priority work. Thanks for the suggestion; we will consider it later.
  2. For the matching-based parsing, we will weigh it later and see whether there is a better way.
  3. Kernel registration information is something developers look at often, for example to confirm a kernel's attributes and which data types it supports, and the data types supported by CPU, GPU, XPU, etc. kernels often differ. We too initially wanted a concise, uniform registration style, but during development found the benefit limited: first, readers of a kernel pay an extra comprehension cost to find where the special macro is defined; second, it can actually be used in few places, e.g. only in some kernels; third, we also have to consider some requirements of the inference execution system infrt, which at this stage needs things written out concretely. In short, there are trade-offs here and it is not a simple right-or-wrong question; we hope you understand.

kernel->InputAt(0).SetDataLayout(phi::DataLayout::SPARSE_COO);
}
#endif
28 changes: 23 additions & 5 deletions paddle/phi/kernels/sparse/activation_kernel.h
@@ -16,22 +16,40 @@ limitations under the License. */

#include "paddle/phi/core/dense_tensor.h"
#include "paddle/phi/core/sparse_coo_tensor.h"
#include "paddle/phi/core/sparse_csr_tensor.h"
#include "paddle/phi/kernels/activation_kernel.h"
#include "paddle/phi/kernels/empty_kernel.h"

namespace phi {
namespace sparse {

template <typename T, typename Context>
void SparseReluKernel(const Context& dev_ctx,
const SparseCooTensor& x,
SparseCooTensor* out);
#define DECLARE_SPARSE_ACTIVATION_KERNEL(name) \
template <typename T, typename Context> \
void SparseCoo##name##Kernel( \
const Context& dev_ctx, const SparseCooTensor& x, SparseCooTensor* out); \
\
template <typename T, typename Context> \
void SparseCsr##name##Kernel( \
const Context& dev_ctx, const SparseCsrTensor& x, SparseCsrTensor* out);

DECLARE_SPARSE_ACTIVATION_KERNEL(Relu)
DECLARE_SPARSE_ACTIVATION_KERNEL(Sqrt)

#undef DECLARE_SPARSE_ACTIVATION_KERNEL

template <typename T, typename Context>
SparseCooTensor SparseRelu(const Context& dev_ctx, const SparseCooTensor& x) {
DenseTensor indices, values;
SparseCooTensor coo(indices, values, x.dims());
SparseReluKernel<T, Context>(dev_ctx, x, &coo);
SparseCooReluKernel<T, Context>(dev_ctx, x, &coo);
return coo;
}

template <typename T, typename Context>
SparseCooTensor SparseSqrt(const Context& dev_ctx, const SparseCooTensor& x) {
DenseTensor indices, values;
SparseCooTensor coo(indices, values, x.dims());
SparseCooSqrtKernel<T, Context>(dev_ctx, x, &coo);
return coo;
}
