Add parsedatetime op #5417

Draft: wants to merge 2 commits into main

Conversation

cbourjau
Contributor

Proposal implementation of the operator described in #5409

@@ -148,6 +148,7 @@
from onnx.reference.ops.op_optional_has_element import OptionalHasElement
from onnx.reference.ops.op_or import Or
from onnx.reference.ops.op_pad import Pad_1, Pad_2, Pad_11, Pad_18
from onnx.reference.ops.op_parsedatetime import ParseDateTime

Check notice (Code scanning / CodeQL): Unused import. Import of 'ParseDateTime' is not used.
onnx/backend/test/case/node/parsedatetime.py (fixed)
See onnx#5309 for details.

Signed-off-by: Christian Bourjau <christian.bourjau@quantco.com>
@cbourjau cbourjau marked this pull request as ready for review July 14, 2023 17:19
@cbourjau cbourjau requested a review from a team as a code owner July 14, 2023 17:19
import numpy as np

import onnx
from onnx import numpy_helper

Check notice (Code scanning / CodeQL): Unused import. Import of 'numpy_helper' is not used.
@cbourjau cbourjau marked this pull request as draft July 25, 2023 11:39
expect(node, inputs=[x], outputs=[np.array(y)], name="test_parsedatetime")

@staticmethod
def export_int_default() -> None:
Contributor

Should we add a test with a 2D matrix? And one for a tensor with null dimensions?

@xadupre
Contributor

xadupre commented Jul 25, 2023

Should we add the operator converting an int/double into a string?

@cbourjau
Contributor Author

The PR is currently slightly out of sync with the API proposed in the issue (#5409). Do you have any comments on the general design there?



class ParseDateTime(OpRun):
    def _run(self, x, format, unit, default=None):  # type: ignore
Contributor

format, unit, and default are operator attributes; they should be declared as format=None, unit=None, default=None to distinguish them from the inputs. The OpRun class replaces them with the values stored in the node definition.
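As a rough illustration of the suggested signature, here is a stand-alone sketch in plain Python rather than an actual OpRun subclass. The function name parse_datetime, the _UNITS table, treating strings without an offset as UTC, and raising when no default is given are all assumptions made for this example; none of them are taken from the PR.

```python
from datetime import datetime, timezone

# Hypothetical unit table; the set of supported units is an assumption.
_UNITS = {"s": 1, "ms": 1_000, "us": 1_000_000}


def parse_datetime(x, format=None, unit=None, default=None):
    """Parse each string in x into a Unix timestamp expressed in `unit`.

    x is the only input. format, unit and default mirror the operator
    attributes, so they are keyword arguments that default to None.
    """
    if format is None or unit is None:
        raise ValueError("'format' and 'unit' must be set")
    scale = _UNITS[unit]
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    out = []
    for s in x:
        try:
            dt = datetime.strptime(s, format)
        except ValueError:
            if default is None:
                # Without a default the behaviour is implementation
                # defined; this sketch simply re-raises.
                raise
            out.append(default)
            continue
        if dt.tzinfo is None:
            # Assumption: strings without an offset are interpreted as UTC.
            dt = dt.replace(tzinfo=timezone.utc)
        delta = dt - epoch
        # Integer arithmetic avoids float rounding for sub-second units.
        micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
        out.append(micros * scale // 1_000_000)
    return out
```

For example, parse_datetime(["1970-01-02 00:00:00", "oops"], format="%Y-%m-%d %H:%M:%S", unit="s", default=-1) yields one day in seconds for the first element and the default for the second.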

AttributeProto::STRING)
.Attr(
"default",
"Default value to be used if the parsing fails. The tensor must be of rank 0 and either of type `tensor(int64)` or `tensor(double)`. The tensor type is the output type. If 'default' is specified, the output type is `tensor(int64)` and the behavior for failing to parse an input element is implementation defined.",
Contributor

If 'default' is not specified...

.Input(0, "X", "Tensor with datetime strings", "T1", OpSchema::Single, true, 1, OpSchema::NonDifferentiable)
.Output(0, "y", "Unix time stamps", "T2", OpSchema::Single, true, 1, OpSchema::NonDifferentiable)
.Attr("format", "Format description in the syntax of C's `strptime`.", AttributeProto::STRING)
.Attr(
Contributor

I would make this unit optional as well and choose one for the default.

AttributeProto::STRING)
.Attr(
"default",
"Default value to be used if the parsing fails. The tensor must be of rank 0 and either of type `tensor(int64)` or `tensor(double)`. The tensor type is the output type. If 'default' is specified, the output type is `tensor(int64)` and the behavior for failing to parse an input element is implementation defined.",
Contributor

Is there any combination of (unit, default) that is not possible? The documentation states the type of the default value but not the default value itself.

@xadupre
Contributor

xadupre commented Jul 27, 2023

It looks good to me. cc @gramalingam

@gramalingam
Contributor

It looks good to me. cc @gramalingam

@xadupre and @cbourjau: I added a bunch of comments to the issue #5409, FYI.

A short summary (of the comments there) is:

  • Is there any value/interest in sticking close to strptime, and returning a tuple of values (year, month, day, ...) by adding a dimension to the input shape? It seems to me that could be better in terms of feature-engineering (keeping the features separate, instead of combining them into one number).
  • Do we really need to support ns as a unit? strptime has only microseconds.
  • The Python documentation of strptime refers to the C89 standard and some extensions beyond it. It would be useful to clarify which of these we want.
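To make the first point in the summary concrete, the following is a minimal sketch of the alternative output layout, where each input string produces a row of separate fields instead of a single timestamp. The function name parse_fields and the choice of seven fields are assumptions for illustration only; the issue does not fix a particular field set.

```python
from datetime import datetime


def parse_fields(x, format):
    """Return one [year, month, day, hour, minute, second, microsecond]
    row per input string, so an input of shape (N,) becomes (N, 7)."""
    rows = []
    for s in x:
        dt = datetime.strptime(s, format)
        rows.append(
            [dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second, dt.microsecond]
        )
    return rows
```

With this layout a downstream model can consume the month or hour directly as a feature, at the cost of a different output shape and no single scalar time axis.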

OPTIONAL_VALUE)

.TypeConstraint("T1", {"tensor(string)"}, "UTF-8 datetime strings")
.TypeConstraint("T2", {"tensor(double)", "tensor(int64)"}, "Output type depends on 'default' attribute.")
Contributor

Do we need to support double here? In principle, users can encode it as Where(Equal(value, default), NaN, Cast(value, float)). Specifically, if we are not worried about nanoseconds (which strptime doesn't seem to support anyway), is the dynamic range of int64 not sufficient when sticking to microseconds?
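The two claims in this comment can be checked with a small plain-Python sketch. The helper name to_float_with_nan is hypothetical, and the list comprehension only imitates what the Where/Equal/Cast subgraph would compute element-wise; the year length used for the range estimate is the Julian year, which is an approximation.

```python
import math

INT64_MAX = 2**63 - 1
# Julian year in microseconds; an approximation used only for the estimate.
MICROS_PER_YEAR = 365.25 * 86_400 * 1_000_000

# Roughly how many years of microseconds fit on either side of the epoch.
years_representable = INT64_MAX / MICROS_PER_YEAR


def to_float_with_nan(values, default):
    """Element-wise analogue of Where(Equal(value, default), NaN, Cast(value, float))."""
    return [math.nan if v == default else float(v) for v in values]
```

The estimate comes out at a little over 292,000 years in each direction, so int64 microseconds cover any practical date range. One caveat of the sentinel encoding is that a legitimately parsed timestamp equal to the default is also mapped to NaN.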

@gramalingam gramalingam added the operator Issues related to ONNX operators label Aug 30, 2023
@cbourjau
Contributor Author

Thanks for the comments thus far! I have kept the related issue up-to-date. We are currently test-driving an operator of this kind as a custom operator. I will update this PR once we have gathered more experience.

@luizlf

luizlf commented Feb 28, 2024

Any updates on this PR?

@cbourjau
Contributor Author

cbourjau commented Mar 4, 2024

We have been using datetime parsing via custom operators in onnxruntime for a few months now. We use essentially the implementation found in this example, except that we only allow the directives mentioned in the related issue.

We have technically not yet gone live with a model using datetime parsing, but on a preliminary basis the outlined feature set appears to serve real-world use cases.
