New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add parsedatetime op #5417
base: main
Are you sure you want to change the base?
Add parsedatetime op #5417
Conversation
@@ -148,6 +148,7 @@ | |||
from onnx.reference.ops.op_optional_has_element import OptionalHasElement | |||
from onnx.reference.ops.op_or import Or | |||
from onnx.reference.ops.op_pad import Pad_1, Pad_2, Pad_11, Pad_18 | |||
from onnx.reference.ops.op_parsedatetime import ParseDateTime |
Check notice
Code scanning / CodeQL
Unused import Note
fc034e6
to
6be9a22
Compare
See onnx#5309 for details. Signed-off-by: Christian Bourjau <christian.bourjau@quantco.com>
6be9a22
to
a5139cc
Compare
expect(node, inputs=[x], outputs=[np.array(y)], name="test_parsedatetime") | ||
|
||
@staticmethod | ||
def export_int_default() -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a test with a 2D matrix? A tensor will null dimensions?
Should we add the operator converting an int/double into a string? |
The PR is currently slightly out of sync with the API proposed in the issue (#5409 ). Do you have any comments on the general design there? |
|
||
|
||
class ParseDateTime(OpRun): | ||
def _run(self, x, format, unit, default=None): # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
format, unit, default are operatros attribute, they should be added as format=None, unit=None, default=None to distinguish them from the inputs. Class OpRun replaces them by the value stored in the node definition.
AttributeProto::STRING) | ||
.Attr( | ||
"default", | ||
"Default value to be used if the parsing fails. The tensor must be of rank 0 and either of type `tensor(int64)` or `tensor(double)`. The tensor type is the output type. If 'default' is specified, the output type is `tensor(int64)` and the behavior for failing to parse an input element is implementation defined.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If 'default' is not specified...
.Input(0, "X", "Tensor with datetime strings", "T1", OpSchema::Single, true, 1, OpSchema::NonDifferentiable) | ||
.Output(0, "y", "Unix time stamps", "T2", OpSchema::Single, true, 1, OpSchema::NonDifferentiable) | ||
.Attr("format", "Format description in the syntax of C's `strptime`.", AttributeProto::STRING) | ||
.Attr( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make this unit optional as well and choose one for the default.
AttributeProto::STRING) | ||
.Attr( | ||
"default", | ||
"Default value to be used if the parsing fails. The tensor must be of rank 0 and either of type `tensor(int64)` or `tensor(double)`. The tensor type is the output type. If 'default' is specified, the output type is `tensor(int64)` and the behavior for failing to parse an input element is implementation defined.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any combination (unit, default) not possible? The documentation says what the default value type is but not the default value.
It looks good to me. cc @gramalingam |
@xadupre and @cbourjau : I added a bunch of comments to the issue #5409 , FYI. A short summary (of the comments there) is:
|
OPTIONAL_VALUE) | ||
|
||
.TypeConstraint("T1", {"tensor(string)"}, "UTF-8 datetime strings") | ||
.TypeConstraint("T2", {"tensor(double)", "tensor(int64)"}, "Output type depends on 'default' attribute.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to support double here? In principle, users can encode it as Where( Equal(value, default), NaN, Cast(value, float))
. Specifically, if we are not worried about nano-seconds (which strptime doesn't seem to support anyway), is the dynamic range of int64 not sufficient (sticking to microseconds)?
Thanks for the comments thus far! I have kept the related issue up-to-date. We are currently test-driving an operator of this kind as a custom operator. I will update this PR once we have gathered more experience. |
Any updates on this PR? |
We have been using datetime parsing via custom operators in the onnxruntime for a few months now. We use essentially the implementation found in this example, except that we only allow the directives mentioned in the related issue. We have technically not gone live with a model using datetime parsing, yet, but on a preliminary basis, the outlined feature set appears to serve real-world use cases. |
Proposal implementation of the operator described in #5409