【PaddlePaddle Hackathon 2】15 Add new API Nanmedian #42385

thunder95 · 2022-04-28T15:02:40Z

PR types

New features

PR changes

APIs

Describe

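Completes development task No. 15 of the second Hackathon: #40318
paddle.nanmedian extends the functionality of the paddle.median API. If the input Tensor contains NaN values, paddle.median takes them into account when computing the median (so the result may be NaN), whereas paddle.nanmedian computes the median over the non-NaN values only.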

RFC design document: PaddlePaddle/community#89
Chinese documentation: PaddlePaddle/docs#4820
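The behavioral difference described above can be illustrated with a small pure-Python reference. This is a sketch, not Paddle code: `median_ref` and `nanmedian_ref` are hypothetical helpers working on a flat list rather than a Tensor. It assumes an even count yields the mean of the two middle values and an all-NaN input yields NaN, and, for simplicity, it models the NaN-aware median as NumPy-style propagation (any NaN gives NaN), whereas the description above only says paddle.median's result *may* be NaN.

```python
import math
import statistics


def median_ref(values):
    """Assumed NaN-propagating median: any NaN in the input gives NaN."""
    if any(math.isnan(v) for v in values):
        return math.nan
    return float(statistics.median(values))


def nanmedian_ref(values):
    """Median over the non-NaN values only; NaN if no value is left."""
    valid = [v for v in values if not math.isnan(v)]
    if not valid:
        return math.nan
    return float(statistics.median(valid))


data = [1.0, math.nan, 3.0, 2.0]
print(median_ref(data))     # nan
print(nanmedian_ref(data))  # 2.0
```

Filtering NaNs before calling `statistics.median` matters: sorting a list that contains NaN does not produce a meaningful order, so the NaN has to be handled explicitly either way.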

paddle-bot-old · 2022-04-28T15:02:43Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

TCChenlong · 2022-05-01T01:49:30Z

Judging from the CI message, the API's example code seems to be wrong; please take a look.

thunder95 · 2022-05-01T14:21:40Z

@TCChenlong Thanks a lot, brother Long. It's fixed now; only the "approved" check error remains.

paddle-bot-old · 2022-05-05T02:27:59Z

The format inspection passed. Your PR will be reviewed by experts of Paddle and developers from the open-source community. Stay tuned.

jeff41404 · 2022-05-05T03:00:58Z

python/paddle/tensor/stat.py

@@ -253,6 +253,140 @@ def numel(x, name=None):
    return out


+def nanmedian(x, axis=None, ignore_nan=True, keepdim=True, name=None):


There is no ignore_nan parameter in the nanmedian RFC. If we want to add this parameter, the RFC should be changed first.

The RFC doc has been updated in a new PR: PaddlePaddle/community#124

I discussed this case with some Paddle experts today.
In Paddle v2.3 (coming this month), the C++ API will be officially introduced. A program (such as model networking) can then be composed from basic C++ APIs instead of Python APIs, which is very useful in performance-critical scenarios.
To lower the barrier to using the C++ API, the experience of using it must be the same as in Python. That is, the parameters of the two must be identical (mismatched ones are being fixed according to plan), so that the Python API documentation can be reused without writing separate C++ API documentation.
Therefore, in this case the RFC doc does not need to be modified. Instead, the ignore_nan parameter should be moved into a kernel function (e.g. BaseMedianKernel), and both nanmediankernel and mediankernel call this function with different values of ignore_nan. That way the parameters of the Python API, C++ API and kernel are the same, and paddle.median and paddle.nanmedian are used in the same way.

Your proposal is quite reasonable. I will move the logic into the C++ implementation and adopt this structure: BaseMedianKernel (with parameters axis and keepdim) first checks whether NaN values are present and then decides which kernel to use (nanmediankernel or mediankernel). The ignore_nan parameter won't be used anymore. If I've misunderstood, please let me know, thanks!
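The structure proposed in this thread, one shared routine parameterized by whether NaNs are ignored plus two public entry points with identical signatures, can be sketched in pure Python. All names here (`_base_median_rows`, `median_rows`, `nanmedian_rows`) are hypothetical; the sketch reduces each row of a 2-D nested list instead of an arbitrary axis, and reuses the `num_nan` / `pos = (stride - num_nan - 1) / 2` bookkeeping visible in the CPU kernel excerpt under review. As a simplifying assumption, the non-ignoring path returns NaN for any row that contains a NaN.

```python
import math


def _base_median_rows(rows, ignore_nan):
    """Shared routine: median of each row of a 2-D list.

    ignore_nan=True  -> NaNs are dropped before selecting (nanmedian path).
    ignore_nan=False -> any NaN in a row makes that row's result NaN (assumed).
    """
    out = []
    for row in rows:
        stride = len(row)
        num_nan = sum(1 for v in row if math.isnan(v))
        if num_nan == stride or (num_nan > 0 and not ignore_nan):
            out.append(math.nan)
            continue
        valid = sorted(v for v in row if not math.isnan(v))
        n = stride - num_nan
        pos = (n - 1) // 2  # lower middle index, like the kernel's pos
        if n % 2 == 1:
            out.append(valid[pos])
        else:
            out.append((valid[pos] + valid[pos + 1]) / 2)
    return out


# Two thin public wrappers with the same signature; only the flag differs.
def median_rows(rows):
    return _base_median_rows(rows, ignore_nan=False)


def nanmedian_rows(rows):
    return _base_median_rows(rows, ignore_nan=True)


rows = [[3.0, 1.0, 2.0], [4.0, math.nan, 1.0, 2.0]]
print(median_rows(rows))     # [2.0, nan]
print(nanmedian_rows(rows))  # [2.0, 2.0]
```

Keeping the flag out of the public signatures is what lets both functions share one documented parameter list across Python and C++.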

jeff41404 · 2022-05-05T03:02:09Z

python/paddle/tensor/stat.py

+            If ``axis`` is less than 0, it works the same way as :math:`axis + D`.
+            If ``axis`` is None, median is calculated over all elements of ``x``. Default is None.
+        ignore_nan (bool, optional): Whether to ignore nan values when median was calculated.
+            If `ignore_nan` is True, the calculation process is the same as `median` operator.


Shouldn't this be: if ignore_nan is False, the calculation process is the same as the median operator?

Please add a link to the RFC (API design document) in the description.

Added.

jeff41404 · 2022-05-05T09:00:46Z

python/paddle/tensor/stat.py

+    res_shape = [x for x in out_shape if x > 0]
+
+    if _in_legacy_dygraph():
+        medians, out = _C_ops.nanmedian(x, 'ignore_nan', ignore_nan)


According to the comments above, the parameters of the Python API and the C++ API should be the same, so the Python preprocessing logic above should also be moved into C++.

A question: can the C++ API support Python data types like tuple and None?
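The Python preprocessing the reviewer wants moved into C++ mostly amounts to normalizing `axis` (None, a negative int, or a tuple/list of ints) into a sorted list of non-negative dimensions and computing the output shape under `keepdim`. The sketch below follows the docstring's rules (`axis < 0` behaves like `axis + D`; `None` means all elements). The helper names are hypothetical, and encoding `None` as an empty list, which is one way a C++ signature could avoid Python-only types, is an assumption rather than something this PR confirms. Duplicate axes are silently merged here, which is also a choice of the sketch.

```python
def to_axis_list(axis):
    """Encode axis as a plain list of ints; None becomes [] ("all axes").
    This encoding is an assumption for illustration."""
    if axis is None:
        return []
    if isinstance(axis, int):
        return [axis]
    return list(axis)


def normalize_axes(axis_list, ndim):
    """Map negative axes to axis + ndim, validate, dedupe and sort."""
    if not axis_list:
        return list(range(ndim))
    out = set()
    for a in axis_list:
        if not -ndim <= a < ndim:
            raise ValueError(f"axis {a} out of range for {ndim}-D input")
        out.add(a + ndim if a < 0 else a)
    return sorted(out)


def reduced_shape(shape, axis=None, keepdim=False):
    """Output shape of a median-style reduction over `axis`."""
    axes = normalize_axes(to_axis_list(axis), len(shape))
    if keepdim:
        return [1 if i in axes else d for i, d in enumerate(shape)]
    return [d for i, d in enumerate(shape) if i not in axes]


print(reduced_shape([2, 3, 4], axis=-1))                # [2, 3]
print(reduced_shape([2, 3, 4], axis=-1, keepdim=True))  # [2, 3, 1]
print(reduced_shape([2, 3, 4], axis=(0, -1)))           # [3]
```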

jeff41404 · 2022-05-05T09:04:13Z

paddle/phi/kernels/cpu/nanmedian_kernel.cc

+
+    int64_t pos = (stride - num_nan - 1) / 2;
+    std::nth_element(col_vec.begin(),
+                     col_vec.begin() + pos,


Would col_vec.begin() + pos + 1 be better? Then there would be no need to call std::nth_element again in the if statement below.

According to https://en.cppreference.com/w/cpp/algorithm/nth_element, the elements before the nth element are not in any guaranteed order, so I fetch the n-1th element again. If element indices were needed, std::nth_element would not be a good choice.
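The point under discussion is that `std::nth_element` only guarantees the element at the chosen position plus a partition around it: everything before it is `<=` it and everything after it is `>=` it, in unspecified order. One way to get both middle values of an even-length set from a single selection is therefore to select `pos` and then take the minimum of the elements after it, because that minimum is the next order statistic. The Python sketch below illustrates this with a hand-written quickselect that mimics the `nth_element` contract; it is not the PR's kernel, and `nth_element` / `nanmedian_1d` are hypothetical names.

```python
import math
import random


def nth_element(a, n):
    """In-place quickselect with std::nth_element's contract: afterwards a[n]
    is the value that would be at index n if `a` were sorted, a[i] <= a[n]
    for i < n, and a[i] >= a[n] for i > n. Order within each side is arbitrary."""
    lo, hi = 0, len(a) - 1
    while lo < hi:
        p = random.randint(lo, hi)         # random pivot, moved to the end
        a[p], a[hi] = a[hi], a[p]
        pivot, store = a[hi], lo
        for i in range(lo, hi):            # Lomuto partition
            if a[i] < pivot:
                a[store], a[i] = a[i], a[store]
                store += 1
        a[store], a[hi] = a[hi], a[store]  # pivot lands at its final index
        if store == n:
            return
        if n < store:
            hi = store - 1
        else:
            lo = store + 1


def nanmedian_1d(values):
    """Median of the non-NaN values using a single selection pass."""
    a = [v for v in values if not math.isnan(v)]
    if not a:
        return math.nan
    pos = (len(a) - 1) // 2
    nth_element(a, pos)
    if len(a) % 2 == 1:
        return a[pos]
    # Everything right of pos is >= a[pos]; its minimum is the next order
    # statistic, so no second nth_element call is needed.
    return (a[pos] + min(a[pos + 1:])) / 2


print(nanmedian_1d([3.0, math.nan, 1.0, 4.0, 2.0]))  # 2.5
```

The trade-off is a linear scan of the upper part instead of a second partial selection; both are O(n), so the choice is mostly about clarity.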

jeff41404

LGTM

TCChenlong · 2022-05-20T02:15:31Z

Please add the Chinese documentation and include its link in the Describe section.

thunder95 · 2022-05-21T00:43:22Z

@TCChenlong Done.

Ligoml · 2022-05-25T02:40:24Z

python/paddle/tensor/stat.py

@@ -241,6 +241,103 @@ def numel(x, name=None):
    return out


+def nanmedian(x, axis=None, keepdim=True, name=None):
+    """
+    Compute the median along the specified axis, while ignoring NaNs.


Please add an r prefix before the docstring; currently the English documentation cannot be parsed correctly.

@Ligoml Added.
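Why the `r` prefix matters: in a normal string literal, backslash sequences used by reStructuredText/LaTeX markup, such as `\t` in `\text` or `\f` in `\frac`, are turned into control characters before any doc tooling sees them. A minimal illustration with made-up docstrings (not the PR's):

```python
def plain():
    """Returns :math:`\text{median}(x)`."""


def raw():
    r"""Returns :math:`\text{median}(x)`."""


# Without r, "\t" became a TAB character, so the LaTeX command is mangled.
print("\t" in plain.__doc__)    # True
# With r, the backslash survives and the markup reaches the doc builder intact.
print("\\text" in raw.__doc__)  # True
```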

Ligoml · 2022-05-26T04:43:23Z

python/paddle/__init__.py

@@ -331,6 +331,7 @@
 from .tensor.stat import var  # noqa: F401
 from .tensor.stat import numel  # noqa: F401
 from .tensor.stat import median  # noqa: F401
+from .tensor.stat import nanmedian  # noqa: F401


This API needs to be added to __all__; otherwise it won't be displayed properly in the API list.

@Ligoml Fixed.
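Why `__all__` matters: `from package import *`, and tools that enumerate a package's public API, often go by `__all__` when it is defined, so a function that is imported into the module but not listed there is effectively hidden. A self-contained sketch using a throwaway module; the module name `fake_paddle` and the helpers `listed_apis` / `star_import_names` are hypothetical, and how Paddle's own doc tooling collects APIs is assumed rather than shown here.

```python
import sys
import types

# A throwaway module standing in for a package __init__.py (hypothetical).
mod = types.ModuleType("fake_paddle")
exec(
    "def median(x):\n    return x\n"
    "def nanmedian(x):\n    return x\n"
    "__all__ = ['median']  # nanmedian defined but not listed\n",
    mod.__dict__,
)
sys.modules["fake_paddle"] = mod


def listed_apis(module):
    """What a doc generator that trusts __all__ would enumerate."""
    return sorted(getattr(module, "__all__", []))


def star_import_names():
    """Names that `from fake_paddle import *` actually brings in."""
    ns = {}
    exec("from fake_paddle import *", ns)
    return sorted(k for k in ns if not k.startswith("__"))


print(listed_apis(mod), star_import_names())  # ['median'] ['median']
mod.__all__.append("nanmedian")               # the fix requested in review
print(listed_apis(mod), star_import_names())  # ['median', 'nanmedian'] ['median', 'nanmedian']
```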

Ligoml

LGTM for docs

chenwhql · 2022-05-26T06:54:55Z

paddle/fluid/operators/nanmedian_op.cc

@@ -0,0 +1,121 @@
+/*Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.


The line breaks and indentation in the license header are a bit off; please fix them by following other files.

Hi, it's been fixed.

chenwhql · 2022-05-26T06:56:11Z

paddle/phi/infermeta/unary.cc

+  out->set_dims(make_ddim(out_dim));
+}
+
+void NanmedianGradInferMeta(const MetaTensor& x,


Please move the backward (grad) InferMeta functions into backward.h/cc.

Hi, it's been fixed.

chenwhql · 2022-05-26T06:58:37Z

paddle/phi/kernels/gpu/nanmedian_kernel.cu

+  bool should_ignore_nan = ignore_nan;
+  auto stream = dev_ctx.stream();
+  auto* ctx =
+      reinterpret_cast<const paddle::platform::CUDADeviceContext*>(&dev_ctx);


The cast doesn't seem necessary here? Just use dev_ctx directly.

Hi, it's been fixed.

chenwhql · 2022-05-26T06:59:52Z

paddle/phi/kernels/nanmedian_kernel.h

+  for (int i = 0; i < ndims; i++) {
+    trans_dim[i] = input_dim[perm[i]];
+  }
+  x->mutable_data<T>(trans_dim, dev_ctx.GetPlace());


Could you replace this with dev_ctx.template Alloc()?

Hi, it's been fixed. @chenwhql
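The excerpt above builds the transposed shape with `trans_dim[i] = input_dim[perm[i]]`. A common reason for such a transpose in a reduction kernel is to move the reduced axes to the end, so that each group of `stride` consecutive elements is one slice whose median is taken. The Python sketch below shows that shape bookkeeping. That the PR's `perm` is built this way is an assumption, `reduction_layout` is a hypothetical name, and the meanings given to `pre_dim` and `stride` (names that do appear in this PR) are assumed. Axes are taken as already normalized (non-negative, unique).

```python
from math import prod


def reduction_layout(input_dim, axes):
    """Return (perm, trans_dim, pre_dim, stride) for reducing over `axes`.

    perm puts the kept axes first (in order) and the reduced axes last, so a
    row-major buffer of shape trans_dim is pre_dim rows of `stride` values.
    """
    kept = [i for i in range(len(input_dim)) if i not in axes]
    perm = kept + sorted(axes)
    # Same rule as the kernel line: trans_dim[i] = input_dim[perm[i]]
    trans_dim = [input_dim[p] for p in perm]
    pre_dim = prod(input_dim[i] for i in kept)  # number of medians to compute
    stride = prod(input_dim[i] for i in axes)   # values feeding each median
    return perm, trans_dim, pre_dim, stride


print(reduction_layout([2, 3, 4], [0]))  # ([1, 2, 0], [3, 4, 2], 12, 2)
```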

chenwhql

LGTM

zhwesky2010

LGTM for the parallel_UT_rule change

XiaoguangHu01

LGTM

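* nanmedian op * Fix a bug in the CUDA kernel * Fix count_if incompatibility on other hardware platforms * Fix incompatibility on some CPU hardware * Fix incompatibility on some CPU hardware * Fix the isnan check * Handle older NumPy versions that do not support all-NaN input * Handle older NumPy versions that do not support all-NaN input * fix code example * fix api comment error * Revise the backward-propagation logic and the C++ processing logic * Address review suggestions * typo pre_dim * update en docs, test=document_fix * remove numpy in en doc, test=document_fix * add r,test=document_fix * Add the API to __all__ * follow advice from chenwhql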

thunder95 added 2 commits April 28, 2022 14:55

nanmedian op

9fe3ee7

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

00f901c

… nanmedian

paddle-bot-old bot added contributor External developers status: proposed labels Apr 28, 2022

thunder95 mentioned this pull request Apr 28, 2022

【PaddlePaddle Hackathon Phase 2】Task overview #40234

Closed

thunder95 changed the title ~~Nanmedian~~ 【PaddlePaddle Hackathon 2】15 Add new API Nanmedian Apr 28, 2022

thunder95 added 10 commits April 29, 2022 05:13

Fix a bug in the CUDA kernel

7eae9c2

Fix count_if incompatibility on other hardware platforms

1f2a6e6

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

7fb02ab

… nanmedian

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

1cce8df

… nanmedian

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

adaa2a1

… nanmedian

Fix incompatibility on some CPU hardware

d5c35d8

Fix incompatibility on some CPU hardware

4ec331b

Fix the isnan check

24424a7

Handle older NumPy versions that do not support all-NaN input

a0e6c3c

Handle older NumPy versions that do not support all-NaN input

0dac2bd

fix code example

2a944f6

dingjiaweiww added status: open review and removed status: proposed labels May 5, 2022

dingjiaweiww assigned jeff41404 May 5, 2022

jeff41404 reviewed May 5, 2022

thunder95 added 2 commits May 5, 2022 05:12

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

d7bdc21

… nanmedian

fix api comment error

06af183

jeff41404 reviewed May 5, 2022

jeff41404 previously approved these changes May 16, 2022

thunder95 added 3 commits May 20, 2022 17:10

update en docs, test=document_fix

8c158b5

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

b0c9471

… nanmedian

Merge branch 'nanmedian' of https://github.com/thunder95/Paddle into …

5f14183

…nanmedian

thunder95 dismissed jeff41404’s stale review via 5f14183 May 20, 2022 17:14

thunder95 mentioned this pull request May 21, 2022

【PaddlePaddle Hackathon 2】15、doc for Nanmedian PaddlePaddle/docs#4820

Merged

thunder95 added 2 commits May 23, 2022 03:06

remove numpy in en doc, test=document_fix

46dc918

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

21e131e

… nanmedian

Ligoml reviewed May 25, 2022

thunder95 added 3 commits May 25, 2022 03:35

add r,test=document_fix

117e102

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

dc6b654

… nanmedian

Merge branch 'nanmedian' of https://github.com/thunder95/Paddle into …

4e8cbc1

…nanmedian

Ligoml reviewed May 26, 2022

thunder95 added 2 commits May 26, 2022 05:41

Add the API to __all__

a3b23f6

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

be473ea

… nanmedian

Ligoml reviewed May 26, 2022

chenwhql reviewed May 26, 2022

thunder95 added 2 commits May 26, 2022 11:21

follow advice from chenwhql

6744d0a

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

8021de0

… nanmedian

chenwhql approved these changes May 27, 2022

zhwesky2010 approved these changes May 30, 2022

Ligoml approved these changes May 30, 2022

jeff41404 approved these changes May 30, 2022

XiaoguangHu01 approved these changes May 30, 2022

jeff41404 merged commit f87fa3c into PaddlePaddle:develop May 30, 2022

jeff41404 mentioned this pull request Jun 24, 2022

【Hackathon No.6】implement nan_to_num #42469

Merged
