Implement GELU as function op #5277
Conversation
Signed-off-by: pranshupant <pranshupant@gmail.com>
The Windows CI tests are failing for ORT. Based on the test logs, it seems that onnxruntime doesn't have the ops registered for opset 20 (which is expected). Is there a way to disable these tests for new ops?
I referenced the following PR for a recent op submission, #5010, and noticed a similar issue with its ORT tests. Subsequently, I added
There are some lint issues with the C++. You can use clang-format to fix them.
Thank you for updating the documents accordingly; they now all pass the CI checks! One last thing -- please fix the DCO (sign off every commit) and then this PR is ready to go.
@pranshupant, thank you for contributing to ONNX. As I am preparing the ONNX 1.15.0 release and integrating it with ORT, I wonder if you have a plan to implement the runtime kernel for this new op in ORT? If so, I will let you know once ORT has ONNX 1.15.0 integrated. I understand that Gelu is a function op, so its implementation in ORT is optional. Thank you in advance.
@liqunfu I wasn't planning to, but I can consider it depending on the effort required. Could you elaborate on that or point me to resources for making such a submission?
Thank you @pranshupant! It generally requires adding compute code for the CPU and other execution providers. Because you have made Gelu a function op, a runtime can execute the op without needing a compute kernel for it. I will add an optional work item for the ORT team to see if it needs to be implemented by the team. Thanks again! Liqun
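To illustrate the point above that a function op can run without a dedicated kernel, here is a minimal sketch (assuming onnx >= 1.15, so that opset 20 and the Gelu schema are available) that builds a one-node Gelu model and executes it with ONNX's reference evaluator, which expands the function body into primitive ops instead of requiring a Gelu kernel:

```python
import numpy as np
import onnx
from onnx import TensorProto, helper
from onnx.reference import ReferenceEvaluator

# A one-node graph using the Gelu function op from opset 20.
node = helper.make_node("Gelu", ["x"], ["y"], approximate="tanh")
graph = helper.make_graph(
    [node],
    "gelu_demo",
    [helper.make_tensor_value_info("x", TensorProto.FLOAT, [None])],
    [helper.make_tensor_value_info("y", TensorProto.FLOAT, [None])],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 20)])
onnx.checker.check_model(model)

# The reference evaluator executes Gelu via its function body,
# so no dedicated compute kernel is needed.
sess = ReferenceEvaluator(model)
(y,) = sess.run(None, {"x": np.array([-1.0, 0.0, 1.0], dtype=np.float32)})
print(y)
```

A runtime like ORT can take the same approach when no kernel is registered: inline the function body and execute it with kernels it already has, which is why a dedicated Gelu kernel is optional.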
Description

These changes have been made to support the GELU operator as a function op.

Motivation and Context

Support for the [GELU: Gaussian Error Linear Unit](https://paperswithcode.com/method/gelu) activation function, which was requested in #4933. #4423 also mentions this under the new ops section of "Contributions Welcome".

As per the discussion in #4933, I have added GELU as a context-dependent function op that uses the attribute `approximate` to return one of the two possible function-body definitions.

The first function definition is the regular GELU:
`GELU(x) = x * Φ(x) = 0.5 * x * (1 + erf(x / sqrt(2)))`

The second is the fast approximation based on `tanh`:
`GELU(x) = 0.5 * x * (1 + tanh(sqrt(2/π) * (x + 0.044715 * x^3)))`

This implementation uses the [PyTorch docs for GELU](https://pytorch.org/docs/stable/generated/torch.nn.GELU.html?highlight=gelu#torch.nn.GELU) as a reference.

PS: I also refactored `onnx/defs/math/defs.cc` to bring the operator implementation of `mish` right next to its doc string.
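To make the two definitions concrete, here is a minimal NumPy sketch of both function bodies. The `gelu` helper and the `"none"`/`"tanh"` values mirror the `approximate` attribute described above; SciPy is assumed to be available for a vectorized `erf`:

```python
import numpy as np
from scipy.special import erf  # assumed available for a vectorized erf


def gelu(x: np.ndarray, approximate: str = "none") -> np.ndarray:
    if approximate == "none":
        # Regular GELU: x * Phi(x), where Phi is the standard normal CDF.
        return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))
    # Fast tanh-based approximation.
    return 0.5 * x * (
        1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * np.power(x, 3)))
    )


x = np.linspace(-3.0, 3.0, 7).astype(np.float32)
print(gelu(x))                      # exact definition
print(gelu(x, approximate="tanh"))  # close to the exact values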