[BUG] InceptionTime does not support validation data with fewer classes than in the training data #5864
I just had a brief glance at the former sktime-dl classifiers migrated to the sktime repository. It seems like the issue is fixed, but there is a potential footgun. I would be interested in a maintainer's opinion on how to proceed. Or is my interpretation incorrect?

**Potential problem**

On the one hand, `convert_y_to_keras` refits the label and one-hot encoders on the data it is given (see sktime/sktime/classification/deep_learning/base.py, lines 106 to 123 in d86114f).
In turn, `convert_y_to_keras` is referenced by all classifiers in the `_fit` but not in the `_predict` functions (see sktime/sktime/classification/deep_learning/inceptiontime.py, lines 138 to 152 in d86114f).
Thus, predicting will not implicitly (and wrongly) re-fit the label encoder. On the other hand, the user currently has to do something like this:

```python
X_train, y_train = load_training_data(...)  # y is encoded as categorical class labels
X_test, y_test = load_test_data(...)        # y is encoded as categorical class labels

clf = InceptionTimeClassifier()  # or many other deep classifiers
clf.fit(X_train, y_train)
score = clf.score(X_test, clf.convert_y_to_keras(y_test))  # because clf.score(X_test, y_test) will fail
```

**Possible solutions**
[1] https://keras.io/api/layers/core_layers/dense/, see "Output shape" section:
PS: I initially observed this problem when training InceptionTime in https://github.com/cedricdonie/tsc-for-wrist-motion-pd-detection/blob/main/src/models/train_model.py and fixed it for myself via sktime/sktime-dl#132. The dataset is quite large, however, so it won't be trivial to reproduce.
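The footgun described above can be shown in miniature without keras. The following is a sketch only; `ToyLabelEncoder` is a hypothetical stand-in for a scikit-learn-style label encoder, not sktime code:

```python
import numpy as np

class ToyLabelEncoder:
    """Minimal stand-in for a scikit-learn-style LabelEncoder (illustration only)."""

    def fit(self, y):
        self.classes_ = np.unique(y)  # sorted array of unique labels
        return self

    def inverse_transform(self, indices):
        # map integer class indices back to labels
        return self.classes_[np.asarray(indices)]

# Fit on training labels with three classes.
enc = ToyLabelEncoder().fit(["cat", "dog", "fish"])
model_output = [0, 1]  # integer indices a model might predict
decoded_before = list(enc.inverse_transform(model_output))  # ['cat', 'dog']

# Footgun: re-fitting on labels that lack "dog" silently remaps the indices,
# so the very same model output now decodes to different labels.
enc.fit(["cat", "fish"])
decoded_after = list(enc.inverse_transform(model_output))  # ['cat', 'fish']
print(decoded_before, decoded_after)
```

This is why calling a method that refits the encoder after `fit` silently changes what the predicted indices mean.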
Welcome back, @cedricdonie! Sorry for the long silence; we migrated `sktime-dl` into the main `sktime` repository. We've pinged you here. If it is still relevant, it would be great if you could move your fix to `sktime`.
The tests are currently skipped in
Hi @fkiraly, after the migration to sktime proper, sktime/sktime-dl#132 is definitely not a valid fix anymore. Thus, I tried to understand whether this issue still remains in `sktime`.
Hm, interesting. I think your observation, that the output of `convert_y_to_keras` depends on the data it was last fitted on, is correct. I would prefer making the method private. Besides that, I would combine this with your suggestion to not refit the encoder.
@fkiraly I agree with your proposal. Let's simply make it private to prevent potentially causing bugs for users.
I think that this is already the case, as per my above comment. In fact, it could be beneficial to implement a test case that ensures that `predict` does not refit the encoder.
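Such a test case could look roughly like the sketch below. All names here (`check_predict_does_not_refit`, `DummyClassifier`) are hypothetical, not part of sktime's test suite:

```python
import numpy as np

def check_predict_does_not_refit(classifier, X_train, y_train, X_test):
    """Hypothetical test helper: predict must not mutate the fitted class list."""
    classifier.fit(X_train, y_train)
    classes_before = np.array(classifier.classes_, copy=True)
    classifier.predict(X_test)
    # the fitted classes must be unchanged after predict
    assert np.array_equal(classifier.classes_, classes_before)

class DummyClassifier:
    """Trivial classifier used only to exercise the check above."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        # always predicts the first class; crucially, does not touch classes_
        return np.array([self.classes_[0]] * len(X))

check_predict_does_not_refit(DummyClassifier(), [[0], [1]], ["a", "b"], [[0], [1]])
print("check passed")
```

Running the same check against the deep classifiers would guard against a future regression where `_predict` starts refitting the encoders.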
Excellent! Would you then like to make a PR? Regarding your comments:
Yes, I created PR #6373.
Great! I have no reason to believe that this is not working. And the present issue is a bug only if the user calls `convert_y_to_keras` manually. However, we might have an issue with
The method `convert_y_to_keras` refits the classifiers' one-hot and label encoders based on the input data. It is automatically called during `deep_classifier.fit`. If an unwitting user calls it again with data that has different classes, the behavior of the predict function will change. Making the method private prevents accidental incorrect usage and simplifies the interface.

## Other considerations

* A deprecation might be sensible rather than making the method private right away.
* We could consider providing an `encode_y` function that uses the label encoder if users want to convert the labels the same way.

Fixes #5864.

#### What does this implement/fix? Explain your changes.

This fixes a "bug" in the sense that a public method of the deep classifiers has a misleading name and should not be called by unknowing users. The method sounds like it will be side-effect free, but actually changes the state of the estimator substantially.

#### Any other comments?

* I don't fully understand how users of the classifiers will use the `predict` methods, as reversing the label/one-hot encoding would be required to get `y_pred` in a format that matches the training data input. Perhaps we should offer a side-effect-free `encode_y` method or simply reverse the encoding of labels and `y` in the `predict` method. Furthermore, I am not sure that the following works as expected: train with `X_train` and `y_train_multilabel`, predict on `X_test`, score on `X_test` and `y_test_multilabel`. I suspect that `classifier.predict(X_test)` will return one-hot encoded results, but `classifier.score(X_test, y_test_multilabel)` will get multilabel format.
* We may want to introduce a deprecation period first, instead of directly making the method private, to avoid breaking existing code. To that end, we could have `convert_y_to_keras` issue a `DeprecationWarning` and then call `_convert_y_to_keras`.
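The deprecation shim mentioned in the last bullet could look roughly like this. This is a sketch only; `DeepClassifierSketch` is hypothetical and the private method body is elided, not sktime's actual base class:

```python
import warnings

class DeepClassifierSketch:
    """Illustrative sketch only; not the actual sktime base class."""

    def _convert_y_to_keras(self, y):
        # the real method (in sktime's deep learning base class) refits the
        # label and one-hot encoders and transforms y; elided here
        return y

    def convert_y_to_keras(self, y):
        # deprecation shim: keep the public name working, warn, and delegate
        # to the private implementation
        warnings.warn(
            "convert_y_to_keras is deprecated and will become private; "
            "note that it refits the estimator's encoders as a side effect.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._convert_y_to_keras(y)
```

With this in place, existing user code keeps working for a release cycle while signalling that the public name will go away.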
Describe the bug
When I input a validation dataset that contains fewer categories than the training dataset, I get the following error.
To Reproduce
I'm having trouble producing an MWE. The following code works unexpectedly.
Expected behavior
The label encoder will pad with zeros, allowing training with validation data that has fewer classes.
Additional context
The onehot encoder is refitted to validation data. Maybe that is the issue?
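A sketch of why refitting the one-hot encoder on the validation split would break shapes: the encoded width must match the number of units in the network's final layer. The `one_hot` helper below is a toy assumption, not sktime or keras code:

```python
import numpy as np

def one_hot(y, classes):
    # one-hot encode y against a fixed, ordered class list (toy helper)
    index = {label: i for i, label in enumerate(classes)}
    out = np.zeros((len(y), len(classes)))
    for row, label in enumerate(y):
        out[row, index[label]] = 1.0
    return out

train_classes = ["a", "b", "c"]  # the final layer would have 3 output units
y_val = ["a", "c"]               # validation data missing class "b"

# Refitting the encoder on validation labels yields width 2 ...
refit_width = one_hot(y_val, sorted(set(y_val))).shape[1]
# ... while reusing the classes seen during fit keeps width 3, matching the network.
reuse_width = one_hot(y_val, train_classes).shape[1]
print(refit_width, reuse_width)  # 2 3
```

If the encoder is refitted on the validation split, the (2-wide) targets no longer match the (3-unit) output layer, which would explain the error.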
Versions