[PaddlePaddle Hackathon 2] 3. Add a corrcoef (Pearson product-moment correlation coefficient) API to Paddle #40690

Merged
merged 57 commits into from May 9, 2022
Changes from 2 commits
001a5f1
corrcoef commit
liqitong-a Mar 17, 2022
45a53eb
corrcoef commit
liqitong-a Mar 17, 2022
d2aa0fb
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Mar 22, 2022
bb5c04d
Update test_corr.py
liqitong-a Mar 22, 2022
6d88d9a
Update linalg.py
liqitong-a Mar 22, 2022
2c1cfe8
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Mar 25, 2022
1cf83e8
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Mar 25, 2022
2d654fd
Update test_corr.py
liqitong-a Mar 25, 2022
e5b66e9
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Mar 25, 2022
6efaca6
Update test_corr.py
liqitong-a Mar 25, 2022
53ee671
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Mar 26, 2022
9a74568
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Mar 26, 2022
fe98fcd
Update test_corr.py
liqitong-a Mar 26, 2022
f39ad07
Update test_corr.py
liqitong-a Mar 26, 2022
4ee155e
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Mar 26, 2022
794e368
Update test_corr.py
liqitong-a Mar 26, 2022
02c9ef9
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Mar 27, 2022
af6b514
Update test_corr.py
liqitong-a Mar 27, 2022
f90d599
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Mar 27, 2022
3d8e0b0
Update test_corr.py
liqitong-a Mar 27, 2022
1991f92
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Mar 28, 2022
760b858
Update test_corr.py
liqitong-a Mar 28, 2022
694b895
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Mar 28, 2022
510c6e7
Update test_corr.py
liqitong-a Mar 28, 2022
74969a4
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Mar 28, 2022
677ba6f
Update test_corr.py
liqitong-a Mar 28, 2022
6b8e3d3
Update linalg.py
liqitong-a Apr 7, 2022
87ba181
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 7, 2022
49227fb
Update linalg.py
liqitong-a Apr 7, 2022
84c65a7
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 7, 2022
a80e7a5
Update linalg.py
liqitong-a Apr 8, 2022
d63e2dc
Update test_corr.py
liqitong-a Apr 12, 2022
3c1dd13
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 12, 2022
9b66604
Update test_corr.py
liqitong-a Apr 12, 2022
46b9021
Update test_corr.py
liqitong-a Apr 13, 2022
299d4c0
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 13, 2022
937a4fe
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 13, 2022
f631db9
Update test_corr.py
liqitong-a Apr 14, 2022
dde2566
Update test_corr.py
liqitong-a Apr 14, 2022
f59b0ef
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 15, 2022
7eddd43
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 15, 2022
67be88e
Update test_corr.py
liqitong-a Apr 18, 2022
607eb71
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 18, 2022
552c07b
Update test_corr.py
liqitong-a Apr 18, 2022
ae572c4
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 18, 2022
3652bbd
Update test_corr.py
liqitong-a Apr 18, 2022
18339f5
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 18, 2022
decb986
Update test_corr.py
liqitong-a Apr 24, 2022
189a29f
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 24, 2022
69064d2
Update test_corr.py
liqitong-a Apr 27, 2022
7c3b09d
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 27, 2022
7c5efcf
Update test_corr.py
liqitong-a Apr 27, 2022
e97d91c
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 27, 2022
4fca073
Update test_corr.py
liqitong-a Apr 29, 2022
51da5d6
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 29, 2022
2255acf
Update test_corr.py
liqitong-a May 5, 2022
9cbac57
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a May 5, 2022
134 changes: 134 additions & 0 deletions python/paddle/fluid/tests/unittests/test_corr.py
@@ -0,0 +1,134 @@
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Contributor:

2019 -> 2022

#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import paddle.fluid as fluid
import unittest
import numpy as np
import six
import paddle
import warnings


def numpy_corr(np_arr, rowvar=True, ddof=0):
    return np.corrcoef(np_arr, rowvar=rowvar, ddof=int(ddof))


class Corr_Test(unittest.TestCase):
    def setUp(self):
        self.shape = [20, 10]

    def test_tensor_corr_default(self):
        typelist = ['float64']

Contributor:

If complex numbers are supported, please add tests for complex dtypes.

Contributor Author:

"If complex numbers are supported, please add tests for complex dtypes."

A question: NumPy's cov supports complex numbers, but Paddle's cov does not, and corrcoef is built on top of cov. Does Paddle's corrcoef need to support complex numbers? If so, does cov need to be changed to support them as well?

Contributor:

I took a look. Complex support in the basic operations is still incomplete, so the complex tests can be left out for now.

@shjNT shjNT Apr 11, 2022

Should typelist also include fp32?

        places = [fluid.CPUPlace()]
        if fluid.core.is_compiled_with_cuda():
            places.append(fluid.CUDAPlace(0))
        for idx, p in enumerate(places):
            if idx == 0:
                paddle.set_device('cpu')
            else:
                paddle.set_device('gpu')

            for dtype in typelist:
                np_arr = np.random.rand(*self.shape).astype(dtype)
                tensor = paddle.to_tensor(np_arr, place=p)
                corr = paddle.linalg.corrcoef(tensor, ddof=False)
                np_corr = numpy_corr(np_arr, rowvar=True, ddof=0)
                self.assertTrue(np.allclose(np_corr, corr.numpy()))

    def test_tensor_corr_rowvar(self):
        typelist = ['float64']
        places = [fluid.CPUPlace()]
        if fluid.core.is_compiled_with_cuda():
            places.append(fluid.CUDAPlace(0))

        for idx, p in enumerate(places):
            if idx == 0:
                paddle.set_device('cpu')
            else:
                paddle.set_device('gpu')

            for dtype in typelist:
                np_arr = np.random.rand(*self.shape).astype(dtype)
                tensor = paddle.to_tensor(np_arr, place=p)
                corr = paddle.linalg.corrcoef(tensor, rowvar=False, ddof=False)
                np_corr = numpy_corr(np_arr, rowvar=False, ddof=0)
                self.assertTrue(np.allclose(np_corr, corr.numpy()))

    def test_tensor_corr_ddof(self):
        typelist = ['float64']
        places = [fluid.CPUPlace()]
        if fluid.core.is_compiled_with_cuda():
            places.append(fluid.CUDAPlace(0))

        for idx, p in enumerate(places):
            if idx == 0:
                paddle.set_device('cpu')
            else:
                paddle.set_device('gpu')

            for dtype in typelist:
                np_arr = np.random.rand(*self.shape).astype(dtype)
                tensor = paddle.to_tensor(np_arr, place=p)
                corr = paddle.linalg.corrcoef(tensor, ddof=True)
                np_corr = numpy_corr(np_arr, rowvar=True, ddof=1)
                self.assertTrue(np.allclose(np_corr, corr.numpy()))


class Corr_Test2(Corr_Test):
    def setUp(self):
        self.shape = [10]


# Input(x) only support N-D (1<=N<=2) tensor
class Corr_Test3(unittest.TestCase):
    def setUp(self):
        self.shape = [2, 5, 10]

    def test_errors(self):
        def test_err():
            np_arr = np.random.rand(*self.shape).astype('float64')
            tensor = paddle.to_tensor(np_arr)
            covrr = paddle.linalg.corrcoef(tensor, ddof=False)

        self.assertRaises(ValueError, test_err)


class Corr_Test4(unittest.TestCase):

Contributor:

Add a comment to each test class, and add test cases for unsupported data types.

Contributor Author:

"Add a comment to each test class, and add test cases for unsupported data types."

I have made the changes. Please take a look.

    def setUp(self):
        self.shape = [2, 2, 5, 10]

    def test_errors(self):
        def test_err():
            np_arr = np.random.rand(*self.shape).astype('float64')
            tensor = paddle.to_tensor(np_arr)
            corr = paddle.linalg.corrcoef(tensor, ddof=False)

        self.assertRaises(ValueError, test_err)


class Corr_Test5(unittest.TestCase):
    def setUp(self):
        self.shape = [2, 5, 10, 6, 7]

    def test_errors(self):
        def test_err():
            np_arr = np.random.rand(*self.shape).astype('float64')
            tensor = paddle.to_tensor(np_arr)
            corr = paddle.linalg.corrcoef(tensor, ddof=False)

        self.assertRaises(ValueError, test_err)


if __name__ == '__main__':
    unittest.main()
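
The ddof test in this file calls `paddle.linalg.corrcoef(tensor, ddof=True)` and compares it with `np.corrcoef(..., ddof=1)`, while the docstring in this PR says `ddof` has no effect. A minimal pure-Python sketch of why both statements are consistent: the `1 / (N - ddof)` factor appears in the covariance and in both variances, so it cancels in the ratio. The helper `pearson` and the sample values are hypothetical and exist only for illustration; they are not part of the PR.

```python
import math


def pearson(a, b, ddof):
    """Pearson correlation of two equal-length sequences (hypothetical helper)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    norm = n - ddof  # the only place ddof enters the computation
    cov_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / norm
    var_a = sum((x - mean_a) ** 2 for x in a) / norm
    var_b = sum((y - mean_b) ** 2 for y in b) / norm
    # norm cancels: (S_ab / norm) / sqrt((S_aa / norm) * (S_bb / norm))
    return cov_ab / math.sqrt(var_a * var_b)


a = [0.2, 0.9, 0.4, 0.7, 0.1]
b = [1.0, 0.3, 0.8, 0.5, 0.6]
r_ddof0 = pearson(a, b, ddof=0)
r_ddof1 = pearson(a, b, ddof=1)
print(math.isclose(r_ddof0, r_ddof1, rel_tol=1e-12))
```

Under this reading, the ddof test mainly checks that passing the argument does not change the result, which is the same behaviour NumPy documents for its deprecated `ddof` parameter.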
2 changes: 2 additions & 0 deletions python/paddle/linalg.py
@@ -16,6 +16,7 @@
from .tensor.linalg import norm # noqa: F401
from .tensor.linalg import eig # noqa: F401
from .tensor.linalg import cov # noqa: F401
from .tensor.linalg import corrcoef # noqa: F401
from .tensor.linalg import cond # noqa: F401
from .tensor.linalg import matrix_power # noqa: F401
from .tensor.linalg import solve # noqa: F401
@@ -41,6 +42,7 @@
'norm',
'cond',
'cov',
'corrcoef',
'inv',
'eig',
'eigvals',
2 changes: 2 additions & 0 deletions python/paddle/tensor/__init__.py
@@ -40,6 +40,7 @@
from .linalg import matmul # noqa: F401
from .linalg import dot # noqa: F401
from .linalg import cov # noqa: F401
from .linalg import corrcoef # noqa: F401
from .linalg import norm # noqa: F401
from .linalg import cond # noqa: F401
from .linalg import transpose # noqa: F401
@@ -275,6 +276,7 @@
'matmul',
'dot',
'cov',
'corrcoef',
'norm',
'cond',
'transpose',
69 changes: 69 additions & 0 deletions python/paddle/tensor/linalg.py
@@ -21,6 +21,7 @@
from ..fluid.layers import transpose, cast # noqa: F401
from ..fluid import layers
import paddle
import warnings
from paddle.common_ops_import import core
from paddle.common_ops_import import VarDesc
from paddle import _C_ops
@@ -2990,3 +2991,71 @@ def lstsq(x, y, rcond=None, driver=None, name=None):
singular_values = paddle.static.data(name='singular_values', shape=[0])

return solution, residuals, rank, singular_values


def corrcoef(x, rowvar=True, ddof=False, name=None):
    """
    Return Pearson product-moment correlation coefficients.

Contributor:

Please do not copy the NumPy documentation verbatim. Try writing it yourself and see whether you can make it clearer.


    Please refer to the documentation for `cov` for more detail. The
    relationship between the correlation coefficient matrix, `R`, and the
    covariance matrix, `C`, is

    .. math:: R_{ij} = \\frac{ C_{ij} } { \\sqrt{ C_{ii} * C_{jj} } }

    The values of `R` are between -1 and 1, inclusive.

    Parameters:

        x(Tensor): A N-D(N<=2) Tensor containing multiple variables and observations. By default, each row of x represents a variable. Also see rowvar below.
        rowvar(Bool, optional): If rowvar is True (default), then each row represents a variable, with observations in the columns. Default: True
        ddof(Bool, optional): Has no effect, do not use.

Contributor:

If it is already deprecated, why keep it?

        name(str, optional): Name of the output. Default is None. It's used to print debug info for developers. Details: :ref:`api_guide_Name`

Contributor:

The Chinese and English docs list a different number of parameters. They need to match exactly, so please also compare the parameter descriptions yourself.

Contributor:

[image]

Please add punctuation.


    Returns:

        Tensor: The correlation coefficient matrix of the variables.

Contributor:

The Returns section does not need to say `Tensor:`.


    Examples:

Contributor:

[image]

This raises an error. Please check it again.


        .. code-block:: python

            import paddle

            xt = paddle.rand((3,4))
            paddle.linalg.corrcoef(xt)

            '''
            Tensor(shape=[3, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
            [[ 1.        , -0.73702252,  0.66228950],
            [-0.73702258,  1.        , -0.77104872],
            [ 0.66228974, -0.77104825,  1.        ]])
            '''

Contributor:

  • Run the example code yourself to confirm that it works, so that a broken example does not affect CI.
  • For the output, follow the other API docs and show it as comments prefixed with #.


    """

    if ddof is not False:

Contributor:

Same as above.

        warnings.warn('ddof have no effect and are deprecated',
                      DeprecationWarning)
    c = cov(x, rowvar)
    try:

Contributor:

It is better to handle the cases explicitly in the code rather than using try. Otherwise, other errors become hard to notice.

        d = paddle.diag(c)
    except ValueError:
        # scalar covariance
        # nan if incorrect value (nan, inf, 0), 1 otherwise
        return c / c

    if paddle.is_complex(d):
        d = d.real()
    stddev = paddle.sqrt(d)
    c /= stddev[:, None]
    c /= stddev[None, :]

    # Clip to [-1, 1]. This does not guarantee
    if paddle.is_complex(c):
        return paddle.complex(
            paddle.clip(c.real(), -1, 1), paddle.clip(c.imag(), -1, 1))
    else:
        c = paddle.clip(c, -1, 1)

    return c
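
The function above can be read as three steps: compute the covariance matrix, divide each entry by the product of the two standard deviations (the formula in the docstring), and clip the result to [-1, 1]. The following pure-Python sketch walks through those steps on nested lists and handles the 1-D case with an explicit branch instead of `try`/`except`, in the spirit of the reviewer's suggestion. The names `covariance_matrix` and `corrcoef_sketch` are hypothetical, real-valued input with at least two observations is assumed, and this is not the code that was merged.

```python
import math


def covariance_matrix(rows, ddof=1):
    """Covariance matrix for variables stored as rows of equal length."""
    n_obs = len(rows[0])  # assumes at least two observations per variable
    means = [sum(r) / n_obs for r in rows]
    centered = [[v - m for v in r] for r, m in zip(rows, means)]
    return [[sum(a * b for a, b in zip(ri, rj)) / (n_obs - ddof)
             for rj in centered] for ri in centered]


def corrcoef_sketch(x):
    """Pearson correlation of a 1-D or 2-D nested list (rows are variables)."""
    # Classify the input explicitly so unrelated errors are not swallowed.
    is_1d = not isinstance(x[0], (list, tuple))
    rows = [list(x)] if is_1d else [list(r) for r in x]
    if any(isinstance(v, (list, tuple)) for r in rows for v in r):
        raise ValueError("input must be 1-D or 2-D")

    c = covariance_matrix(rows)
    stddev = [math.sqrt(c[i][i]) for i in range(len(c))]

    def normalize(i, j):
        denom = stddev[i] * stddev[j]
        if denom == 0.0:
            return math.nan  # zero variance: the correlation is undefined
        # R_ij = C_ij / sqrt(C_ii * C_jj), clipped to absorb rounding error.
        return max(-1.0, min(1.0, c[i][j] / denom))

    r = [[normalize(i, j) for j in range(len(c))] for i in range(len(c))]
    # A single variable yields a scalar, mirroring the scalar-covariance case.
    return r[0][0] if is_1d else r


r = corrcoef_sketch([[1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0]])
print(r)
```

Branching on the input rank up front keeps the `ValueError` for unsupported shapes separate from anything raised inside the numeric steps, which is the concern the review raised about the `try` block.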