Incorrect deserialization of subclasses, module changed to types #468

Open
simon-mo opened this issue May 3, 2022 · 4 comments

Comments

@simon-mo

simon-mo commented May 3, 2022

This issue is similar to #440 but I have verified it still happens after the fix (on latest master).

Somehow the deserialized subclass has a __module__ of types instead of __main__. This also happens when the classes are moved to their own separate files.

See the following repro script:

import cloudpickle
import multiprocessing as mp

print(cloudpickle.__version__)

class Parent:
    pass

class Child(Parent):
    pass

def get_mro(klass):
    return [f"{base.__module__}.{base.__qualname__}" for base in klass.mro()]

def task(b: bytes):
    cls = cloudpickle.loads(b)
    return str(cls), get_mro(cls)

for klass in [Parent, Child]:
    with mp.Pool() as pool:
        cls_name, mros = pool.apply(task, (cloudpickle.dumps(klass),))
    print()
    print("local class name", str(klass))
    print("deserialized class name", cls_name)
    print()
    print("local mro", get_mro(klass))
    print("deserialized mro", mros)

My output on Python 3.7 (f758eb3):

2.1.0.dev0

local class name <class '__main__.Parent'>
deserialized class name <class '__main__.Parent'>

local mro ['__main__.Parent', 'builtins.object']
deserialized mro ['__main__.Parent', 'builtins.object']

local class name <class '__main__.Child'>
deserialized class name <class 'types.Child'>

local mro ['__main__.Child', '__main__.Parent', 'builtins.object']
deserialized mro ['types.Child', '__main__.Parent', 'builtins.object']
@ogrisel
Contributor

ogrisel commented May 23, 2022

Indeed, I confirm that the subclass's __module__ is wrongly set to types on Python 3.10 as well.

@ender-wieczorek

ender-wieczorek commented Jun 10, 2022

This happens because we don't save attributes that have the same value in the parent class (see the code here). In particular, this means we don't save __module__ when the child class is in the same module as its parent. When we then reconstruct the class with types.new_class(), the namespace has no __module__ entry, so __module__ gets set to types.

I currently patch cloudpickle with:

cloudpickle.cloudpickle_fast._extract_class_dict = lambda cls: dict(cls.__dict__)
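
For anyone who wants to see the mechanism without cloudpickle in the loop, here is a minimal stdlib-only sketch. It is not cloudpickle's actual code: `drop_inherited` and `rebuild` are hypothetical helpers that only mimic the two steps described above (filtering out entries identical to the parent's, then calling types.new_class()), assuming a class with a single base.

```python
import types


class Parent:
    pass


class Child(Parent):
    pass


def drop_inherited(cls):
    # Hypothetical stand-in for the filtering step: drop every entry whose
    # value is the very same object as the entry in the (single) base class.
    base_dict = cls.__bases__[0].__dict__
    return {
        name: value
        for name, value in cls.__dict__.items()
        if not (name in base_dict and base_dict[name] is value)
    }


def rebuild(cls, namespace):
    # Hypothetical stand-in for the reconstruction step. The per-class
    # __dict__/__weakref__ descriptors are skipped; they must not be copied.
    ns = {k: v for k, v in namespace.items() if k not in ("__dict__", "__weakref__")}
    return types.new_class(
        cls.__name__, cls.__bases__, exec_body=lambda body: body.update(ns)
    )


filtered = drop_inherited(Child)
print("__module__" in filtered)  # was __module__ kept by the filter?
print(rebuild(Child, filtered).__module__)  # rebuilt from the filtered dict
print(rebuild(Child, dict(Child.__dict__)).__module__)  # rebuilt from the full dict
```

With the filtered dict, the rebuilt class should report types as its module, since the class is created from inside types.new_class(). With the full dict, the original module name is preserved, which is what the one-line patch above relies on.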

@ogrisel
Contributor

ogrisel commented Jul 11, 2022

Would you mind submitting a PR with a non-regression test?

@pierreglaser
Member

Indeed, this issue was not fixed (knowingly) by #448, see #448 (comment). I'll get it done ASAP.
