Implement GELU as function op #5277

Merged
merged 26 commits into onnx:main on Jul 10, 2023

Conversation

pranshupant
Contributor

Description

These changes have been made to support the GELU operator as a function op.

Motivation and Context

Support for the [GELU: Gaussian Error Linear Unit](https://paperswithcode.com/method/gelu) activation function, which was requested in #4933.
#4423 also mentions this under the new ops section of Contributions Welcome.

As per the discussion in #4933, I have added GELU as a context-dependent function op that uses the attribute `approximate` to select one of two possible function-body definitions.

The first function definition is the regular GELU:
GELU(x) = x * Φ(x) = 0.5 * x * (1 + erf(x / sqrt(2)))

The second is the fast approximation based on tanh:
GELU(x) = 0.5 * x * (1 + tanh(sqrt(2/π) * (x + 0.044715 * x^3)))

This implementation uses the [PyTorch docs for GELU](https://pytorch.org/docs/stable/generated/torch.nn.GELU.html?highlight=gelu#torch.nn.GELU) as a reference.
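
For anyone who wants to sanity-check the two definitions numerically, here is a minimal NumPy/SciPy sketch (illustrative only; it is not the ONNX function body added in this PR):

```python
import numpy as np
from scipy.special import erf

def gelu_exact(x):
    # GELU(x) = x * Phi(x) = 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

def gelu_tanh(x):
    # Fast approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * np.power(x, 3.0))))

x = np.linspace(-3.0, 3.0, 7, dtype=np.float32)
print(gelu_exact(x))
print(gelu_tanh(x))  # close to, but not identical to, the exact values
```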

PS: I also refactored onnx/defs/math/defs.cc to bring the operator implementation of mish right next to its doc string.

@pranshupant pranshupant requested review from a team as code owners June 1, 2023 04:46
Review comments on onnx/defs/math/defs.cc, docs/Changelog.md, and docs/Operators.md (resolved)
@pranshupant
Contributor Author

The Windows CI tests are failing for ORT. Based on the test logs, it seems that onnxruntime doesn't have the new ops registered for opset 20 (which is expected). Is there a way to disable these tests for new ops?
Also, I feel the ORT tests should emit a warning when onnxruntime is not installed locally, instead of just being skipped silently.

@pranshupant
Contributor Author

I referenced #5010, a recent op-submission PR, and noticed a similar issue with the ORT tests. Subsequently, I added gelu to the backend test exclude list. Please let me know if this was the right thing to do.
All the tests pass now @gramalingam @justinchuby @xadupre
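
For context, excluding a backend test follows the pattern exposed by onnx's backend-test runner, roughly as sketched below (the DummyBackend and the regex are placeholders for illustration, not the exact change made in this PR):

```python
import onnx.backend.test
from onnx.backend.base import Backend

class DummyBackend(Backend):
    # Stand-in for a real backend such as onnxruntime, only so the
    # snippet is self-contained; prepare/run are never called here.
    @classmethod
    def supports_device(cls, device):
        return device == "CPU"

backend_test = onnx.backend.test.BackendTest(DummyBackend, __name__)
# Exclude node tests, by regex, for ops the runtime has not implemented yet.
backend_test.exclude(r"test_gelu_")
# Expose the remaining cases to the test framework (unittest/pytest).
globals().update(backend_test.enable_report().test_cases)
```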

@xadupre xadupre enabled auto-merge (squash) July 5, 2023 09:02
@xadupre
Contributor

xadupre commented Jul 5, 2023

There are some lint issues with C++. You can use clang-format to fix them.

Member

@jcwchen jcwchen left a comment


Thank you for updating the documents accordingly; they all pass the CI check now! One last thing -- please fix the DCO (sign off every commit) and then this PR is ready to go.

@gramalingam gramalingam added this pull request to the merge queue Jul 10, 2023
Merged via the queue into onnx:main with commit c0033eb Jul 10, 2023
37 checks passed
hamptonm1 pushed commits to hamptonm1/onnx that referenced this pull request Jul 10, 2023
adityagoel4512 pushed commits to adityagoel4512/onnx that referenced this pull request Jul 28, 2023
@liqunfu
Contributor

liqunfu commented Sep 18, 2023

@pranshupant, thank you for contributing to ONNX. I am now preparing the ONNX 1.15.0 release and integrating it with ORT, and I wonder whether you have a plan to implement the runtime kernel for this new op in ORT. If so, I will let you know once ORT has ONNX 1.15.0 integrated. I understand that Gelu is a function op, so its implementation in ORT is optional. Thank you in advance.

@pranshupant
Contributor Author


@liqunfu I wasn't planning to, but I can consider it depending on the effort required. Could you elaborate on that, or point me to resources for making such a submission?

@liqunfu
Contributor

liqunfu commented Sep 19, 2023

Thank you @pranshupant! It generally requires adding compute code for the CPU and other execution providers. Because you have made Gelu a function op, a runtime can execute the op without needing a dedicated compute kernel for it. I will add an optional work item for the ORT team to see whether it needs to be implemented by the team. Thanks again! Liqun
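
To make the "function op" point concrete, here is a hedged sketch (assuming onnx >= 1.15 and opset 20) of building a model that uses the new Gelu node; a runtime without a dedicated Gelu kernel can instead expand the node into the function body defined by this PR:

```python
import onnx
from onnx import TensorProto, helper

# Single-node graph: Y = Gelu(X), selecting the tanh approximation
# through the `approximate` attribute introduced by this PR.
node = helper.make_node("Gelu", inputs=["X"], outputs=["Y"], approximate="tanh")
graph = helper.make_graph(
    [node],
    "gelu_example",
    [helper.make_tensor_value_info("X", TensorProto.FLOAT, [None])],
    [helper.make_tensor_value_info("Y", TensorProto.FLOAT, [None])],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 20)])
onnx.checker.check_model(model)
```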

Labels: operator (Issues related to ONNX operators)
Project status: Done
Linked issues: none
6 participants