Replies: 1 comment
We do not really have great developer support for defining a custom [...]. Reference: scikit-learn/sklearn/utils/_set_output.py, lines 231 to 238 in 681e8e2.

Concretely, in your case:

    def transform(self, X, y=None):
        ...
        # Get the configuration for estimator
        config = getattr(self, "_sklearn_output_config", {})
        set_output_api_val = config.get("transform", None)
        df[df.isin(missing_value_placeholders)] = np.nan
        features = simple_imputer_obj.set_output(transform=set_output_api_val).transform(df)
        ...

    def set_output(self, transform=None):
        if transform is None:
            return self
        if not hasattr(self, "_sklearn_output_config"):
            self._sklearn_output_config = {}
        self._sklearn_output_config["transform"] = transform
        return self

Another way to do it is to define:

    def transform(...):
        # same as above

    def get_feature_names_out(self, input_features=None):
        return self.simple_imputer_obj.get_feature_names_out(input_features)

Edit: If |
I have an example Pipeline that works fine with the "set_output" API when I fit it on its own. However, when I pass this Pipeline object to "GridSearchCV", the output is no longer preserved as a pandas DataFrame, which causes bugs within the pipeline.
Here is some example code (note the custom Transformer that is causing the bug):
This is the print output (suggesting that my configuration from ".set_output()" was not stored by the time ".transform()" was called):
Here is a truncated traceback (confirming that the final output of ".transform()" was a NumPy array rather than a pandas DataFrame):
Interestingly, I don't get the error when I set the output to pandas using the global "set_config()" function.
My question: why does my code work fine with the standalone Pipeline, but not when I pass this Pipeline to a GridSearchCV object?