How to store a pipeline that contains a GridSearchCV object? #19504

Answered by NicolasHug
Erica-Ko asked this question in Q&A
Can you please provide the entire code? Can you also try with a scikit-learn estimator instead of the LightGBM one?

Also, there's a data leak:

Pipeline([('scaler', StandardScaler()), ('model', GridSearchCV(...))])

Should be

GridSearchCV(Pipeline([('scaler', StandardScaler()), ('model', lgbm_estimator)]))

In the first snippet, the scaler is fit on the entire dataset before the grid search splits it, so every cross-validation fold has already been normalized using statistics computed from the other folds, and the folds are no longer independent. Since you appear to be doing only model selection and not model evaluation, this may not matter much in practice, but in general it is incorrect.
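For reference, here is a minimal sketch of the second layout, including one possible way to store the fitted search object. Everything specific in it is an assumption for illustration: `LogisticRegression` stands in for the LightGBM estimator, the data is synthetic, the `C` values are arbitrary, and `joblib` is just one common persistence option, not necessarily what the original code used. Note that once the pipeline is inside the grid search, parameter names need the step prefix (`model__` here).

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only so the sketch runs on its own.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# The scaler lives inside the pipeline, so it is re-fit on the training
# part of every CV split and never sees the validation fold.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression()),  # stand-in for the LightGBM estimator
])

# Parameters of a pipeline step are addressed as "<step name>__<param>".
param_grid = {"model__C": [0.1, 1.0, 10.0]}  # arbitrary illustrative values

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)

# Store the whole fitted search object and load it back.
path = os.path.join(tempfile.mkdtemp(), "search.joblib")  # hypothetical file name
joblib.dump(search, path)
restored = joblib.load(path)

# The restored object should behave like the original one.
same_predictions = bool((restored.predict(X) == search.predict(X)).all())
```

If only the final model is needed, storing `search.best_estimator_` (the refit pipeline) instead of the whole search object should work the same way.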

Answer selected by lesteve