Implement GELU as function op #5277
Conversation
Signed-off-by: pranshupant <pranshupant@gmail.com>
The Windows CI tests are failing for ORT. Based on the test logs, it seems that onnxruntime doesn't have the ops registered for opset 20 (which is expected). Is there a way to disable these tests for new ops?
I referenced the following PR for a recent op submission, #5010, and noticed a similar issue with its ORT tests. Subsequently, I added
There are some lint issues with the C++. You can use clang-format to fix them.
Thank you for updating the documents accordingly; they now all pass the CI checks! One last thing -- please fix the DCO (sign off every commit) and then this PR is ready to go.
@pranshupant, thank you for contributing to ONNX. As I am preparing the ONNX 1.15.0 release and integrating it with ORT, I wonder if you have a plan to implement the runtime kernel for this new op in ORT? If so, I will let you know once ORT has ONNX 1.15.0 integrated. I understand that Gelu is a function op, so its implementation in ORT is optional. Thank you in advance.
@liqunfu I wasn't planning to, but I can consider it depending on the effort required. Could you elaborate on that or point me to resources for making such a submission?
Thank you @pranshupant! It generally requires adding compute code for the CPU and other execution providers. Because you have made Gelu a function op, a runtime can execute the op without needing a compute kernel for it. I will add an optional work item for the ORT team to see if it needs to be implemented by the team. Thanks again! Liqun
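To illustrate the point above that a function op can run without a dedicated kernel, here is a minimal sketch (assuming onnx >= 1.15, so that opset 20 and the Gelu schema are available) that builds a one-node Gelu model and executes it with ONNX's reference evaluator, which expands the function body into primitive ops instead of requiring a Gelu kernel:

```python
import numpy as np
import onnx
from onnx import TensorProto, helper
from onnx.reference import ReferenceEvaluator

# A one-node graph using the Gelu function op from opset 20.
node = helper.make_node("Gelu", ["x"], ["y"], approximate="tanh")
graph = helper.make_graph(
    [node],
    "gelu_demo",
    [helper.make_tensor_value_info("x", TensorProto.FLOAT, [None])],
    [helper.make_tensor_value_info("y", TensorProto.FLOAT, [None])],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 20)])
onnx.checker.check_model(model)

# The reference evaluator executes Gelu via its function body,
# so no dedicated compute kernel is needed.
sess = ReferenceEvaluator(model)
(y,) = sess.run(None, {"x": np.array([-1.0, 0.0, 1.0], dtype=np.float32)})
print(y)
```

A runtime like ORT can take the same approach when no kernel is registered: inline the function body and execute it with kernels it already has, which is why a dedicated Gelu kernel is optional.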
Description

These changes have been made to support the GELU operator as a function op.

Motivation and Context

Support for the [GELU: Gaussian Error Linear Unit](https://paperswithcode.com/method/gelu) activation function, which was requested in #4933. #4423 also mentions this under the new ops section of "Contributions Welcome".

As per the discussion in #4933, I have added GELU as a context-dependent function op that uses the attribute `approximate` to return one of the two possible function-body definitions.

The first function definition is the regular GELU:
`GELU(x) = x * Φ(x) = 0.5 * x * (1 + erf(x / sqrt(2)))`

The second is the fast approximation based on `tanh`:
`GELU(x) = 0.5 * x * (1 + tanh(sqrt(2/π) * (x + 0.044715 * x^3)))`

This implementation uses the [PyTorch docs for GELU](https://pytorch.org/docs/stable/generated/torch.nn.GELU.html?highlight=gelu#torch.nn.GELU) as a reference.

PS: I also refactored `onnx/defs/math/defs.cc` to bring the operator implementation of `mish` right next to its doc string.
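To make the two definitions concrete, here is a minimal NumPy sketch of both function bodies. The `gelu` helper and the `"none"`/`"tanh"` values mirror the `approximate` attribute described above; SciPy is assumed to be available for a vectorized `erf`:

```python
import numpy as np
from scipy.special import erf  # assumed available for a vectorized erf


def gelu(x: np.ndarray, approximate: str = "none") -> np.ndarray:
    if approximate == "none":
        # Regular GELU: x * Phi(x), where Phi is the standard normal CDF.
        return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))
    # Fast tanh-based approximation.
    return 0.5 * x * (
        1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * np.power(x, 3)))
    )


x = np.linspace(-3.0, 3.0, 7).astype(np.float32)
print(gelu(x))                      # exact definition
print(gelu(x, approximate="tanh"))  # close to the exact values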