Support for pickling functions within a class defined outside of __main__? #175

Open
mlucas-NU opened this issue May 18, 2018 · 6 comments

Comments

@mlucas-NU

mlucas-NU commented May 18, 2018

Cloudpickle's ability to serialize instantiated classes and their member functions is a huge advantage over the alternatives. Unfortunately, this feature seems to be heavily limited by the check for instantiated_class.__module__ == '__main__'.

One hacky solution is to set __module__ in the class definition:

class SomeFunClass(object):
    __module__ = '__main__'

    # Rest of the class definition

But this could cause problems with any code that relies on a proper value for __module__. Is there another solution I'm not seeing?

@pierreglaser
Member

Functions and classes defined outside of __main__ are serializable, but they are serialized by module-attribute lookup. This means that the module the function or class comes from must be importable in the environment where it will be unpickled.

Functions and classes defined inside __main__ are also serializable, but at unpickling time the function or class is not imported; it is re-created from scratch.
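To make the two modes described above concrete, here is a minimal sketch that uses only the standard library. The first half shows by-reference pickling with plain `pickle`: the payload holds the module and attribute names, so unpickling imports the module and returns the very same object. The second half is a deliberately simplified stand-in for the "re-created from scratch" mode: it ships the compiled code with `marshal` and rebuilds a new function from it. This is not cloudpickle's actual implementation, and the function `double` is a hypothetical example.

```python
import json
import marshal
import pickle
import types

# By reference: the payload only records the module name and attribute name.
by_ref = pickle.dumps(json.dumps)
same_object = pickle.loads(by_ref)  # resolved by importing json and looking up "dumps"


def double(x):
    """Hypothetical user function used only for illustration."""
    return 2 * x


# By value (simplified): serialize the code object itself, then rebuild
# a brand-new function from it on the receiving side.
code_bytes = marshal.dumps(double.__code__)
rebuilt = types.FunctionType(marshal.loads(code_bytes), {}, "double")

print(same_object is json.dumps)  # the original object, found by lookup
print(rebuilt is double)          # a distinct object, reconstructed from its code
print(rebuilt(21))
```

The by-reference payload is small but useless on a machine where the module cannot be imported, which is the limitation this issue is about.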

If I understand your issue correctly, you would like to extend the serialization behavior of functions/classes defined in the __main__ module to functions/classes defined in regular modules?
If so, can you explain the rationale behind it?

@Ark-kun

Ark-kun commented Jun 14, 2019

I had to hack the same thing for our project.
We needed to pass a pickled user-provided function to a remote container for execution, and the container only has the installed packages, not the user's own modules. We had to be sure that the actual function code was being sent; otherwise the call fails at runtime.

We had to add a modules_to_capture parameter, which takes a list of modules that you want to capture during dependency traversal (by default it's [func.__module__]). The functions and classes from those modules are always captured in full rather than being linked to.

It would be nice to have the same functionality built-in.

@Ark-kun

Ark-kun commented Jun 14, 2019

The tiny hack I did was just removing the modules from sys.modules while calling cloudpickle.dumps: kubeflow/pipelines#1435 https://github.com/kubeflow/pipelines/pull/1435/files
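The `sys.modules` trick described above could be sketched as a small context manager. This is an illustration only, not the code from the linked pull request; the name `hidden_modules` is hypothetical. It temporarily removes the named modules from `sys.modules` and puts them back afterwards, even if an exception is raised.

```python
import contextlib
import sys


@contextlib.contextmanager
def hidden_modules(names):
    """Temporarily remove the named modules from sys.modules (hypothetical helper)."""
    # Pop only the modules that are actually loaded, remembering them for later.
    saved = {name: sys.modules.pop(name) for name in names if name in sys.modules}
    try:
        yield
    finally:
        # Restore the original module objects, even on error.
        sys.modules.update(saved)
```

Assuming cloudpickle is installed and `func` is the function to ship, the hack would then look like `with hidden_modules([func.__module__]): payload = cloudpickle.dumps(func)`. Per the comment above, cloudpickle then sends the function's code rather than a reference to its module.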

@hfwittmann

There are two more hacks that are also very simple:

  1. First trick: use a lambda function.

  2. Second trick: before using a (normal) function, change its module:

my_normal_function.__module__ = '__main__'

@hfwittmann

... still, it would be nice to have this functionality in cloudpickle itself, to be able to disable this check ...

@max-sixty

I think this was completed by #417?
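Assuming #417 is the change that added `cloudpickle.register_pickle_by_value` (available in cloudpickle 2.0 and later), the built-in route might look like the sketch below. The in-memory module `user_code` and its function `double` are hypothetical stand-ins for a real user module. cloudpickle is imported inside the helper so that defining it does not require the package.

```python
import pickle
import sys
import types


def roundtrip_by_value():
    """Pickle a function from a non-__main__ module by value and load it back."""
    import cloudpickle  # third-party; version 2.0 or later is assumed

    # Build a throwaway module in memory to stand in for a user's module.
    module = types.ModuleType("user_code")
    exec("def double(x):\n    return 2 * x\n", module.__dict__)
    sys.modules[module.__name__] = module

    # Ask cloudpickle to ship this module's functions and classes in full.
    cloudpickle.register_pickle_by_value(module)
    try:
        payload = cloudpickle.dumps(module.double)
    finally:
        cloudpickle.unregister_pickle_by_value(module)
        del sys.modules[module.__name__]

    # The module is gone now, so loading only works because the code was sent.
    return pickle.loads(payload)
```

With cloudpickle installed, `roundtrip_by_value()(21)` should return 42 even though `user_code` is no longer importable at load time. Unlike the `__module__` and `sys.modules` hacks earlier in the thread, this leaves the module's metadata untouched.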
