[geometric] Add paddle.geometric.send_ue_recv API #43174

Merged 58 commits on Aug 12, 2022
Changes from all commits
Commits
9312d16
add init file
DesmonDay May 11, 2022
7b0e641
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
DesmonDay May 12, 2022
dd22ac2
add op definition and infermeta
DesmonDay May 12, 2022
a0938cb
add kernel definition funcs
DesmonDay May 12, 2022
1bc283c
add broadcast infer shape
DesmonDay May 17, 2022
c1d51a9
add gpu forward kernel
DesmonDay May 19, 2022
6e19153
delete SUB and DIV
DesmonDay May 19, 2022
c14114b
add x_grad
DesmonDay May 25, 2022
4bf6480
add template
DesmonDay May 25, 2022
94702d4
add e_grad for min and max
DesmonDay May 25, 2022
0d82c54
fix small bug
DesmonDay May 25, 2022
40e3fc4
temp commit
DesmonDay May 25, 2022
c566dcc
temp commit
DesmonDay May 31, 2022
de96e57
add e_grad for sum and mean
DesmonDay Jun 1, 2022
7f6fb72
fix some compile bug
DesmonDay Jun 1, 2022
375be76
fix compile bugs
DesmonDay Jun 1, 2022
9dd7083
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
DesmonDay Jun 2, 2022
0ea8878
fix compile problem
DesmonDay Jun 2, 2022
f1ea92f
add sum forward unittest
DesmonDay Jun 2, 2022
f961f9b
fix broadcast error, add kernel sig, register e_grad, change unit test
DesmonDay Jun 6, 2022
1cbbb4d
fix grad
DesmonDay Jun 7, 2022
48230e2
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
DesmonDay Jun 7, 2022
e9d57fe
add temp grad fix
DesmonDay Jun 9, 2022
be98048
temp commit
DesmonDay Jun 27, 2022
18b5382
add min max unittest
DesmonDay Jul 8, 2022
81014e7
add max, min unittest, fix mul bug
DesmonDay Jul 11, 2022
a02e07a
add cpu forward sum and mean
DesmonDay Jul 12, 2022
bb5c366
add forward min max, fix mean unittest
DesmonDay Jul 12, 2022
6b82a27
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
DesmonDay Jul 12, 2022
fb10fb4
add cpu backward min max
DesmonDay Jul 12, 2022
e59e516
fix code-style
DesmonDay Jul 12, 2022
de7782e
add backward sum mean
DesmonDay Jul 13, 2022
cd10b9e
fix rocm ci
DesmonDay Jul 13, 2022
737da40
set uniitest timeout
DesmonDay Jul 13, 2022
5f6e0b5
fix bug of x broadcast to e, gpu grad
DesmonDay Jul 19, 2022
ba4a65a
fix bug of x broadcast to e, cpu grad
DesmonDay Jul 20, 2022
190695a
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
DesmonDay Jul 20, 2022
575ab03
rename BOOST_GET_CONST macro
DesmonDay Jul 20, 2022
10b5cc7
fix rocm ci
DesmonDay Jul 22, 2022
b6e2c27
mv graph_send_e_recv to graph_send_ue_recv
DesmonDay Jul 25, 2022
4c9b0fb
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
DesmonDay Jul 25, 2022
5f4e958
move out_size to IntArray
DesmonDay Jul 25, 2022
100f853
add eager op test
DesmonDay Jul 25, 2022
8b5aed9
fix max pool type bug, add unittest for api
DesmonDay Jul 26, 2022
8a5057d
revise api doc
DesmonDay Jul 26, 2022
dab0ccc
add fp16 for atomic min and max, add unittest
DesmonDay Jul 29, 2022
9e21001
add unittest
DesmonDay Jul 29, 2022
471b051
add fp16 support for graph_send_recv
DesmonDay Jul 29, 2022
4b6d6ab
fix unittest fp16 bug
DesmonDay Aug 2, 2022
677ea3a
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
DesmonDay Aug 9, 2022
917b029
change OutSizeTensor to Out_size
DesmonDay Aug 9, 2022
bb8517a
move E to Y
DesmonDay Aug 9, 2022
666bd68
add copyright, fix comment
DesmonDay Aug 10, 2022
a480092
review code
DesmonDay Aug 10, 2022
36d1eab
fix thread block size
DesmonDay Aug 11, 2022
50bf7da
fix thread block size
DesmonDay Aug 11, 2022
e7cbc9f
change api attribute name: pool_type to reduce_op, compute_type to me…
DesmonDay Aug 11, 2022
2b0bd9a
change api attribute name, move pool_type to reduce_op, move compute_…
DesmonDay Aug 11, 2022
12 changes: 6 additions & 6 deletions paddle/fluid/operators/graph_send_recv_op.cc
@@ -64,9 +64,9 @@ class GraphSendRecvOpMaker : public framework::OpProtoAndCheckerMaker {
.AsDispensable();
AddOutput("Out", "Output tensor of graph_send_recv op.");
AddOutput("Dst_count",
"Count tensor of Dst_index, mainly for MEAN pool_type.")
"Count tensor of Dst_index, mainly for MEAN reduce_op.")
.AsIntermediate();
AddAttr<std::string>("pool_type",
AddAttr<std::string>("reduce_op",
"(string, default 'SUM')"
"Define different pool types to receive the result "
"tensors of Dst_index.")
@@ -81,7 +81,7 @@ class GraphSendRecvOpMaker : public framework::OpProtoAndCheckerMaker {
AddComment(R"DOC(
Graph Learning Send_Recv combine operator.

$Out = Recv(Send(X, Src_index), Dst_index, pool_type)$
$Out = Recv(Send(X, Src_index), Dst_index, reduce_op)$

This operator is mainly used in Graph Learning domain, and the main purpose is to reduce
intermediate memory consumption in the process of message passing.
@@ -105,12 +105,12 @@ class GraphSendRecvGradOpMaker : public framework::SingleGradOpMaker<T> {
op->SetInput("Dst_index", this->Input("Dst_index"));
op->SetInput("X", this->Input("X"));

if (PADDLE_GET_CONST(std::string, this->GetAttr("pool_type")) == "MEAN") {
if (PADDLE_GET_CONST(std::string, this->GetAttr("reduce_op")) == "MEAN") {
op->SetInput("Dst_count", this->Output("Dst_count"));
}

if (PADDLE_GET_CONST(std::string, this->GetAttr("pool_type")) == "MIN" ||
PADDLE_GET_CONST(std::string, this->GetAttr("pool_type")) == "MAX") {
if (PADDLE_GET_CONST(std::string, this->GetAttr("reduce_op")) == "MIN" ||
PADDLE_GET_CONST(std::string, this->GetAttr("reduce_op")) == "MAX") {
op->SetInput("Out", this->Output("Out"));
}

150 changes: 150 additions & 0 deletions paddle/fluid/operators/graph_send_ue_recv_op.cc
@@ -0,0 +1,150 @@
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include "paddle/fluid/framework/infershape_utils.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/phi/core/infermeta_utils.h"
#include "paddle/phi/infermeta/multiary.h"

namespace paddle {
namespace operators {

class GraphSendUERecvOP : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;

protected:
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override {
return framework::OpKernelType(
OperatorWithKernel::IndicateVarDataType(ctx, "X"),
ctx.device_context());
}
};

class GraphSendUERecvGradOp : public framework::OperatorWithKernel {
public:
using framework::OperatorWithKernel::OperatorWithKernel;

void InferShape(framework::InferShapeContext* ctx) const override {
auto in_dims = ctx->GetInputDim("X");
ctx->SetOutputDim(framework::GradVarName("X"), in_dims);
auto y_dims = ctx->GetInputDim("Y");
ctx->SetOutputDim(framework::GradVarName("Y"), y_dims);
}

protected:
framework::OpKernelType GetExpectedKernelType(
const framework::ExecutionContext& ctx) const override {
return framework::OpKernelType(OperatorWithKernel::IndicateVarDataType(
ctx, framework::GradVarName("Out")),
ctx.device_context());
}
};

class GraphSendUERecvOpMaker : public framework::OpProtoAndCheckerMaker {
public:
void Make() override {
AddInput("X",
"The input tensor with data type float32, float64, int32, int64.");
AddInput("Y",
"The input edge weight tensor, data type should be same with X");
AddInput("Src_index", "The source index tensor.");
AddInput("Dst_index", "The destination index tensor.");
AddInput("Out_size",
"(Tensor<int>, optional). The 0th dimension of the output."
"It has a higher priority than Attr(out_size).")
.AsDispensable();
AddOutput("Out", "Output tensor of graph_send_ue_recv op.");
AddOutput("Dst_count",
"Count tensor of Dst_index, mainly for MEAN reduce_op.")
.AsIntermediate();
AddAttr<std::string>("message_op",
"(string, default 'ADD')"
"Define differenct computation types between X and E.")
.SetDefault("ADD")
.InEnum({"ADD", "MUL"});
AddAttr<std::string>("reduce_op",
"(string, default 'SUM')"
"Define different pool types to receive the result "
"tensors of Dst_index.")
.SetDefault("SUM")
.InEnum({"SUM", "MEAN", "MIN", "MAX"});
AddAttr<std::vector<int64_t>>(
"out_size",
"(vector<int64_t>, default {0})"
"Define the first dimension of Output tensor."
"If set default {0}, then the shape of Out is the same with X.")
.SetDefault({0});
AddComment(R"DOC(
Graph Learning Send_UE_Recv combine operator.

$Out = Recv(Compute(Send(X, Src_index), Y, message_op), Dst_index, reduce_op)$

This operator is mainly used in Graph Learning domain, and the main purpose is to reduce
intermediate memory consumption in the process of message passing.

Taking `X` as the input tensor, we first use `src_index` to gather the corresponding data.
The gathered data is then combined with `Y` according to the chosen `message_op` (add or mul)
to produce the messages. Finally, `dst_index` is used to update the corresponding positions of
the output tensor with the chosen reduce type: sum, mean, max, or min.

)DOC");
}
};

template <typename T>
class GraphSendUERecvGradOpMaker : public framework::SingleGradOpMaker<T> {
public:
using framework::SingleGradOpMaker<T>::SingleGradOpMaker;

protected:
void Apply(GradOpPtr<T> op) const override {
op->SetType("graph_send_ue_recv_grad");
op->SetInput("X", this->Input("X"));
op->SetInput("Y", this->Input("Y"));
op->SetInput("Src_index", this->Input("Src_index"));
op->SetInput("Dst_index", this->Input("Dst_index"));

if (PADDLE_GET_CONST(std::string, this->GetAttr("reduce_op")) == "MEAN") {
op->SetInput("Dst_count", this->Output("Dst_count"));
}

if (PADDLE_GET_CONST(std::string, this->GetAttr("reduce_op")) == "MIN" ||
PADDLE_GET_CONST(std::string, this->GetAttr("reduce_op")) == "MAX") {
op->SetInput("Out", this->Output("Out"));
}

op->SetInput(framework::GradVarName("Out"), this->OutputGrad("Out"));
op->SetOutput(framework::GradVarName("X"), this->InputGrad("X"));
op->SetOutput(framework::GradVarName("Y"), this->InputGrad("Y"));
op->SetAttrMap(this->Attrs());
}
};

} // namespace operators
} // namespace paddle

namespace ops = paddle::operators;

DECLARE_INFER_SHAPE_FUNCTOR(graph_send_ue_recv,
GraphSendUERecvInferShapeFunctor,
PD_INFER_META(phi::GraphSendUERecvInferMeta));
REGISTER_OPERATOR(graph_send_ue_recv,
ops::GraphSendUERecvOP,
ops::GraphSendUERecvOpMaker,
ops::GraphSendUERecvGradOpMaker<paddle::framework::OpDesc>,
ops::GraphSendUERecvGradOpMaker<paddle::imperative::OpBase>,
GraphSendUERecvInferShapeFunctor);
REGISTER_OPERATOR(graph_send_ue_recv_grad, ops::GraphSendUERecvGradOp);
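
To make the semantics in the DOC block above concrete, here is a minimal dygraph sketch of the corresponding Python API. It assumes the public entry point is `paddle.geometric.send_ue_recv(x, y, src_index, dst_index, message_op, reduce_op, out_size)`, mirroring the inputs and attributes registered in this file; the tensor values are purely illustrative.

```python
import paddle

# Node features: 3 nodes, 3 features each.
x = paddle.to_tensor([[0., 2., 3.], [1., 4., 5.], [2., 6., 7.]], dtype="float32")
# One edge weight per edge; it is broadcast against the gathered node features.
y = paddle.to_tensor([1., 1., 1., 1.], dtype="float32")
# Edge list: src_index[i] -> dst_index[i].
src_index = paddle.to_tensor([0, 1, 2, 0], dtype="int32")
dst_index = paddle.to_tensor([1, 2, 1, 0], dtype="int32")

# Gather x along src_index, combine with y via message_op ("add"),
# then scatter-reduce the messages into dst_index slots via reduce_op ("sum").
out = paddle.geometric.send_ue_recv(x, y, src_index, dst_index,
                                    message_op="add", reduce_op="sum")
print(out.shape)  # [3, 3]: one reduced message vector per destination node
```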
98 changes: 98 additions & 0 deletions paddle/fluid/platform/device/gpu/gpu_primitives.h
@@ -419,6 +419,55 @@ CUDA_ATOMIC_WRAPPER(Max, double) {
return __longlong_as_double(old);
}

#ifdef PADDLE_CUDA_FP16
inline static __device__ uint32_t max_to_low_half(uint32_t val, float x) {
float16 low_half;
// The float16 in lower 16bits
low_half.x = static_cast<uint16_t>(val & 0xFFFFu);
low_half = static_cast<float16>(max(static_cast<float>(low_half), x));
return (val & 0xFFFF0000u) | low_half.x;
}

inline static __device__ uint32_t max_to_high_half(uint32_t val, float x) {
float16 high_half;
// The float16 in higher 16bits
high_half.x = static_cast<uint16_t>(val >> 16);
high_half = static_cast<float16>(max(static_cast<float>(high_half), x));
return (val & 0xFFFFu) | (static_cast<uint32_t>(high_half.x) << 16);
}

CUDA_ATOMIC_WRAPPER(Max, float16) {
if (*address >= val) {
return *address;
}
uint32_t *address_as_ui = reinterpret_cast<uint32_t *>(
reinterpret_cast<char *>(address) -
(reinterpret_cast<uintptr_t>(address) & 0x02));
float val_f = static_cast<float>(val);
uint32_t old = *address_as_ui;
uint32_t assumed;
if (((uintptr_t)address & 0x02) == 0) {
    // The float16 value stays at the lower 16 bits of the address.
do {
assumed = old;
old = atomicCAS(address_as_ui, assumed, max_to_low_half(assumed, val_f));
} while (old != assumed);
float16 ret;
ret.x = old & 0xFFFFu;
return ret;
} else {
    // The float16 value stays at the higher 16 bits of the address.
do {
assumed = old;
old = atomicCAS(address_as_ui, assumed, max_to_high_half(assumed, val_f));
} while (old != assumed);
float16 ret;
ret.x = old >> 16;
return ret;
}
}
#endif

// For atomicMin
USE_CUDA_ATOMIC(Min, int);
USE_CUDA_ATOMIC(Min, unsigned int);
@@ -503,5 +552,54 @@ CUDA_ATOMIC_WRAPPER(Min, double) {
return __longlong_as_double(old);
}

#ifdef PADDLE_CUDA_FP16
inline static __device__ uint32_t min_to_low_half(uint32_t val, float x) {
float16 low_half;
// The float16 in lower 16bits
low_half.x = static_cast<uint16_t>(val & 0xFFFFu);
low_half = static_cast<float16>(min(static_cast<float>(low_half), x));
return (val & 0xFFFF0000u) | low_half.x;
}

inline static __device__ uint32_t min_to_high_half(uint32_t val, float x) {
float16 high_half;
// The float16 in higher 16bits
high_half.x = static_cast<uint16_t>(val >> 16);
high_half = static_cast<float16>(min(static_cast<float>(high_half), x));
return (val & 0xFFFFu) | (static_cast<uint32_t>(high_half.x) << 16);
}

CUDA_ATOMIC_WRAPPER(Min, float16) {
if (*address <= val) {
return *address;
}
uint32_t *address_as_ui = reinterpret_cast<uint32_t *>(
reinterpret_cast<char *>(address) -
(reinterpret_cast<uintptr_t>(address) & 0x02));
float val_f = static_cast<float>(val);
uint32_t old = *address_as_ui;
uint32_t assumed;
if (((uintptr_t)address & 0x02) == 0) {
    // The float16 value stays at the lower 16 bits of the address.
do {
assumed = old;
old = atomicCAS(address_as_ui, assumed, min_to_low_half(assumed, val_f));
} while (old != assumed);
float16 ret;
ret.x = old & 0xFFFFu;
return ret;
} else {
    // The float16 value stays at the higher 16 bits of the address.
do {
assumed = old;
old = atomicCAS(address_as_ui, assumed, min_to_high_half(assumed, val_f));
} while (old != assumed);
float16 ret;
ret.x = old >> 16;
return ret;
}
}
#endif

} // namespace platform
} // namespace paddle
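
The float16 wrappers above rely on a half-word trick: CUDA's `atomicCAS` operates on 32-bit words, so each wrapper aligns the address down to the containing 32-bit word, recomputes only the 16-bit lane that holds the target float16, and retries the CAS until no other thread has raced in between. The following host-side Python sketch (an illustration only, not part of the diff) mimics `max_to_low_half` / `max_to_high_half` to show the bit packing:

```python
import struct

def fp16_bits_to_float(bits: int) -> float:
    # Reinterpret a 16-bit pattern as an IEEE half-precision value.
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def float_to_fp16_bits(x: float) -> int:
    # Round a Python float to half precision and return its bit pattern.
    return struct.unpack("<H", struct.pack("<e", x))[0]

def max_to_low_half(word: int, x: float) -> int:
    # Replace only the low 16 bits of the 32-bit word with max(low_half, x).
    low = fp16_bits_to_float(word & 0xFFFF)
    return (word & 0xFFFF0000) | float_to_fp16_bits(max(low, x))

def max_to_high_half(word: int, x: float) -> int:
    # Replace only the high 16 bits of the 32-bit word with max(high_half, x).
    high = fp16_bits_to_float(word >> 16)
    return (word & 0x0000FFFF) | (float_to_fp16_bits(max(high, x)) << 16)
```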
1 change: 1 addition & 0 deletions paddle/fluid/pybind/op_function_generator.h
@@ -226,6 +226,7 @@ std::map<std::string, std::set<std::string>> op_ins_map = {
"Mean3",
"Var3"}},
{"graph_send_recv", {"X", "Src_index", "Dst_index", "Out_size"}},
{"graph_send_ue_recv", {"X", "Y", "Src_index", "Dst_index", "Out_size"}},
};

// NOTE(zhiqiu): Like op_ins_map.
13 changes: 12 additions & 1 deletion paddle/phi/api/yaml/legacy_api.yaml
@@ -1060,7 +1060,7 @@
func : generate_proposals_v2

- api : graph_send_recv
args : (Tensor x, Tensor src_index, Tensor dst_index, str pool_type = "SUM", IntArray out_size = {0})
args : (Tensor x, Tensor src_index, Tensor dst_index, str reduce_op = "SUM", IntArray out_size = {0})
output : Tensor(out), Tensor(dst_count)
infer_meta :
func : GraphSendRecvInferMeta
@@ -1070,6 +1070,17 @@
intermediate : dst_count
backward : graph_send_recv_grad

- api : graph_send_ue_recv
args : (Tensor x, Tensor y, Tensor src_index, Tensor dst_index, str message_op, str reduce_op, IntArray out_size)
output : Tensor(out), Tensor(dst_count)
infer_meta :
func : GraphSendUERecvInferMeta
kernel :
func : graph_send_ue_recv
data_type : x
intermediate : dst_count
backward : graph_send_ue_recv_grad

- api : greater_equal
args : (Tensor x, Tensor y, int axis = -1)
output : Tensor
16 changes: 14 additions & 2 deletions paddle/phi/api/yaml/legacy_backward.yaml
@@ -941,8 +941,8 @@
func : gelu_grad

- backward_api : graph_send_recv_grad
forward : graph_send_recv (Tensor x, Tensor src_index, Tensor dst_index, str pool_type = "SUM", IntArray out_size = {0}) -> Tensor(out), Tensor(dst_count)
args : (Tensor x, Tensor src_index, Tensor dst_index, Tensor out, Tensor dst_count, Tensor out_grad, str pool_type = "SUM")
forward : graph_send_recv (Tensor x, Tensor src_index, Tensor dst_index, str reduce_op = "SUM", IntArray out_size = {0}) -> Tensor(out), Tensor(dst_count)
args : (Tensor x, Tensor src_index, Tensor dst_index, Tensor out, Tensor dst_count, Tensor out_grad, str reduce_op = "SUM")
output : Tensor(x_grad)
infer_meta :
func : GeneralUnaryGradInferMeta
@@ -952,6 +952,18 @@
data_type : out_grad
optional: out, dst_count

- backward_api : graph_send_ue_recv_grad
forward : graph_send_ue_recv (Tensor x, Tensor y, Tensor src_index, Tensor dst_index, str message_op, str reduce_op, IntArray out_size) -> Tensor(out), Tensor(dst_count)
args : (Tensor x, Tensor y, Tensor src_index, Tensor dst_index, Tensor out, Tensor dst_count, Tensor out_grad, str message_op, str reduce_op)
output : Tensor(x_grad), Tensor(y_grad)
infer_meta :
func : GeneralBinaryGradInferMeta
param : [x, y]
kernel :
func : graph_send_ue_recv_grad
data_type : out_grad
optional: out, dst_count

# grid sample
- backward_api : grid_sample_grad
forward : grid_sample (Tensor x, Tensor grid, str mode, str padding_mode, bool align_corners) -> Tensor(out)
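
As a quick end-to-end check of the forward/backward pair registered in these yaml entries, a dygraph sketch like the one below exercises both `x_grad` and `y_grad`. It again assumes the `paddle.geometric.send_ue_recv` signature used earlier and makes no claim about the exact gradient values.

```python
import paddle

x = paddle.to_tensor([[0., 2., 3.], [1., 4., 5.], [2., 6., 7.]], stop_gradient=False)
y = paddle.to_tensor([1., 2., 3., 4.], stop_gradient=False)
src_index = paddle.to_tensor([0, 1, 2, 0], dtype="int32")
dst_index = paddle.to_tensor([1, 2, 1, 0], dtype="int32")

out = paddle.geometric.send_ue_recv(x, y, src_index, dst_index,
                                    message_op="mul", reduce_op="mean")
out.sum().backward()

# GraphSendUERecvGradOp::InferShape keeps the gradients at the input shapes.
print(x.grad.shape)  # [3, 3]
print(y.grad.shape)  # [4]
```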