Convert ISO 8601 datetime format to a numeric #999

Open
robingenz opened this issue Jun 13, 2023 · 2 comments
@robingenz

I am currently working on a model that takes as input, among other data, a string in ISO 8601 datetime format.
This string should be converted into a (numeric) timestamp using a converter.

Example:

  • Input: 2023-06-13T04:53:00.280Z
  • Output: 1686631980
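
For reference, the conversion in the example above can be reproduced with the Python standard library alone. This is only a minimal sketch: the function name `iso8601_to_epoch_seconds` is hypothetical, and treating a string without a UTC offset as UTC is an assumption, not something stated in this issue.

```python
from datetime import datetime, timedelta, timezone

# Reference point for Unix timestamps
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso8601_to_epoch_seconds(value: str) -> int:
    """Return whole seconds since the Unix epoch for an ISO 8601 string."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        # Assumption: strings without an offset are interpreted as UTC
        dt = dt.replace(tzinfo=timezone.utc)
    # Floor division of two timedeltas yields an int (fractional seconds dropped)
    return (dt - EPOCH) // timedelta(seconds=1)


print(iso8601_to_epoch_seconds("2023-06-13T04:53:00.280Z"))  # 1686631980
```

The milliseconds (`.280`) are dropped by the floor division, which matches the integer output shown in the example.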

The sklearn pipeline looks like this:

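import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

created_date_column_name = 'CreatedDate'
timestamp_column_indices = [
    created_date_column_name
]

class TimestampTransformer(BaseEstimator, TransformerMixin):
    def __init__(self):
        pass

    def fit(self, X, y=None):
        # Stateless transformer: nothing to learn from the data
        return self

    def transform(self, X):
        X = X.copy()
        # Parse the ISO 8601 strings into pandas datetimes
        X[created_date_column_name] = pd.to_datetime(X[created_date_column_name])
        # The int64 view is nanoseconds since the epoch (pandas' default
        # resolution); integer-divide to get whole seconds
        X[created_date_column_name] = X[created_date_column_name].astype(np.int64) // 10**9
        return X


column_transformer = ColumnTransformer(transformers=[
    ('timestamp', TimestampTransformer(), timestamp_column_indices)
], remainder='passthrough')
classifier = RandomForestClassifier()
clr_pipeline = Pipeline([
    ('column_transformer', column_transformer),
    ('classifier', classifier),
])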

(Unnecessary columns have been removed for clarity).

With the help of the TimestampTransformer, the string in ISO 8601 datetime format is converted into a timestamp. Unfortunately, I get the following error message when exporting the model to ONNX format:

Unable to find a shape calculator for type '<class '__main__.TimestampTransformer'>'.
It usually means the pipeline being converted contains a
transformer or a predictor with no corresponding converter
implemented in sklearn-onnx. If the converted is implemented
in another library, you need to register
the converted so that it can be used by sklearn-onnx (function
update_registered_converter). If the model is not yet covered
by sklearn-onnx, you may raise an issue to
https://github.com/onnx/sklearn-onnx/issues
to get the converter implemented or even contribute to the
project. If the model is a custom model, a new converter must
be implemented. Examples can be found in the gallery.

I understand the problem and have also read through the documentation on how to implement a new converter.
Unfortunately I have no idea what is the best way to start.
I am very new to the ONNX format and hope someone can give me a hint on how to solve this problem.

@xadupre
Collaborator

xadupre commented Jun 22, 2023

Unfortunately, there is no operator that takes a string and returns numerical information like you need, and there is no way to do that with the existing operators. So you would need to introduce a new operator to ONNX. It can be in the onnx repository, but it needs to be approved by the community. You may need to attend one of the SIG meetings: https://github.com/microsoft/onnxruntime-extensions/blob/main/docs/custom_ops.md. It can be a custom operator implemented in Python (see onnxruntime-extensions) or in C++, depending on where you need to deploy.

Once it is done, a new converter needs to be registered in sklearn-onnx to convert your custom transformer.

@xadupre
Collaborator

xadupre commented Jul 27, 2023

You should follow this PR: onnx/onnx#5417. Once it is merged, it will be part of the ONNX standard and onnxruntime will implement it.
