Question: What is the recommended way to serialize Pipelines with custom transformers? #17390
Comments
+1. Currently facing the same issue...
@avinashpancham I figured it out: you can use cloudpickle to serialize, then unpickle with regular pickle when you want to use it.
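A minimal sketch of that suggestion (the transformer here is hypothetical): dump the fitted Pipeline with cloudpickle, which serializes the custom class by value, then load it with plain pickle.

```python
import pickle

import cloudpickle
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class DoubleTransformer(BaseEstimator, TransformerMixin):
    """Toy custom transformer (hypothetical) that doubles every value."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [[2 * v for v in row] for row in X]


pipe = Pipeline([("double", DoubleTransformer())])
pipe.fit([[1, 2]])

# cloudpickle serializes the custom class by value, so the bytes
# are self-contained...
blob = cloudpickle.dumps(pipe)

# ...and plain pickle can load them back.
restored = pickle.loads(blob)
print(restored.transform([[1, 2]]))  # [[2, 4]]
```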
Thanks, that indeed works :)
I am currently facing an issue serializing a Pipeline with custom transformers. Could you share a code snippet or a link illustrating the aforementioned cloudpickle approach? Thanks!
@mariyamiteva if you share your code here, maybe I can help you out.
@mariyamiteva @avinashpancham @Ben-Epstein did you ever end up figuring this out? I'm also having this issue, even when I use cloudpickle. Say I have a pipeline like this one, defined in a module:
And then I serialize it as follows:
When I go to deserialize it in another application:
I get a stack trace:
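For anyone hitting the same wall: this failure mode can be reproduced without scikit-learn at all, because it comes from pickle storing custom classes by reference (module path plus class name). A hypothetical sketch, simulating the project module in-process (the module name is made up):

```python
import pickle
import sys
import types

# Simulate a project module "my_transformers" that defines the transformer.
mod = types.ModuleType("my_transformers")
exec("class MyTransformer:\n    pass", mod.__dict__)
sys.modules["my_transformers"] = mod

# Pickling succeeds: the stream records only "my_transformers.MyTransformer".
blob = pickle.dumps(mod.MyTransformer())

# In "another application" the module does not exist, so loading fails.
del sys.modules["my_transformers"]
try:
    pickle.loads(blob)
    err = None
except ModuleNotFoundError as e:
    err = e
print(err)  # No module named 'my_transformers'
```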
@cakemountain are you asking whether the entire external Python package can be pickled along with your object? That's a new feature of cloudpickle, I believe. It certainly makes sense that it isn't the default, otherwise all pickled objects would be enormous, and it's also quite hard to recursively figure out every package an object depends on. Nonetheless, it's an extremely useful feature when you need it (which can be often!). I believe what you're looking for is here: cloudpipe/cloudpickle#417
Thanks @Ben-Epstein, I appreciate the link. I spent a lot of time looking for exactly that and was unable to find it. Agreed, pickling an entire module by default obviously wouldn't make sense, but for the use case I described above it's nice not to have to copy-paste files across repositories (🤮) just to get a deserialized Pipeline to work.
@cakemountain I've found cloudpickle (>=2.0.0) addresses your use case. You just need the additional function cloudpickle.register_pickle_by_value().
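A sketch of that API (the module name and class are hypothetical; a throwaway module file stands in for a real project module): after `cloudpickle.register_pickle_by_value()`, the class definition travels inside the pickle bytes, so the loading side no longer needs the module installed.

```python
import pathlib
import pickle
import sys
import tempfile

import cloudpickle  # requires cloudpickle >= 2.0.0

# Create a throwaway importable module standing in for a real project module.
tmp = tempfile.mkdtemp()
pathlib.Path(tmp, "my_transformers.py").write_text(
    "class MyTransformer:\n"
    "    def transform(self, X):\n"
    "        return [x * 2 for x in X]\n"
)
sys.path.insert(0, tmp)
import my_transformers

# Without this call, cloudpickle would store MyTransformer by reference
# (module path + name); with it, the class definition is embedded by value.
cloudpickle.register_pickle_by_value(my_transformers)
blob = cloudpickle.dumps(my_transformers.MyTransformer())

# Simulate "another application" where the module is unavailable.
sys.path.remove(tmp)
del sys.modules["my_transformers"]
obj = pickle.loads(blob)
print(obj.transform([1, 2]))  # [2, 4]
```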
When building a Pipeline with custom transformers, what is the best way to serialize that for later use?
If you use pickle, the custom transformer classes must be importable in the deserializing environment, so that doesn't solve the problem for me. I ran into the same issue with dill and joblib.
What is the best practice here?
Thanks!
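To make the constraint in the question concrete: a pickle stream references a custom class by name, it does not embed its code. A minimal stdlib-only illustration (the class name is hypothetical; any user-defined class behaves the same):

```python
import pickle


class MyTransformer:
    """Hypothetical custom transformer."""

    def transform(self, X):
        return X


blob = pickle.dumps(MyTransformer())

# The stream stores only the import path of the class, not its source,
# so unpickling requires MyTransformer to be importable again.
print(b"MyTransformer" in blob)  # True
print(b"transform(" in blob)     # False: no method code in the bytes
```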