Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect pickles for subclasses of generic classes #440

Closed
huzecong opened this issue Sep 9, 2021 · 2 comments · Fixed by #448
Closed

Incorrect pickles for subclasses of generic classes #440

huzecong opened this issue Sep 9, 2021 · 2 comments · Fixed by #448

Comments

@huzecong
Copy link
Contributor

huzecong commented Sep 9, 2021

Environment:

  • Python 3.8.7
  • cloudpickle==1.6.0

I've run into a very curious error that only manifests when a couple of conditions are satisfied:

  • There's a base class GBase that is a generic class (inherits from Generic[T]).
  • There's a derived class that inherits from the non-parameterized base class (inherits from GBase, not GBase[T] or GBase[int]).
  • The classes are defined in a notebook or IPython repl (so they're in __main__, and their code must be pickled).
  • I pickle the derived class, and unpickle it in another process where the class isn't defined.

What I observe is then:

  • The unpickled class's __module__ is types, instead of __main__.
  • The non-parameterized bases are missing from the MRO of the unpickled class.

Here's a script to reproduce the issue:

from typing import Generic, TypeVar
import multiprocessing as mp

import cloudpickle

T = TypeVar("T")

class Base: pass
class Derived(Base): pass
class Leaf(Derived): pass

class GBase(Generic[T]): pass
class GDerivedAny(GBase): pass
class GLeafAny(GDerivedAny): pass

class GDerivedInt(GBase[int]): pass
class GLeafInt(GDerivedInt): pass

class GDerivedT(GBase[T]): pass
class GLeafT(GDerivedT[T]): pass

klasses = [
    Base, Derived, Leaf, GBase, GDerivedAny, GLeafAny, GDerivedInt, GLeafInt, GDerivedT, GLeafT
]

def get_mro(klass):
    return [f"{base.__module__}.{base.__qualname__}" for base in klass.mro()]

def test_klass(klass_pickle):
    # This has to be run in a separate process, where the classes aren't defined.
    return get_mro(cloudpickle.loads(klass_pickle))

with mp.Pool(len(klasses)) as pool:
    mros = pool.map(test_klass, [cloudpickle.dumps(klass) for klass in klasses])
    for klass, mro in zip(klasses, mros):
        expected_mro = [base.__name__ for base in klass.mro()]
        mro_without_module = [name.split(".")[1] for name in mro]
        output = [klass, set(expected_mro) - set(mro_without_module), mro]
        print("".join(repr(x).ljust(35) for x in output))

The output is (each row is class, set of missing base classes, and the MRO of the unpickled class):

<class '__main__.Base'>            set()                              ['__main__.Base', 'builtins.object']
<class '__main__.Derived'>         set()                              ['types.Derived', '__main__.Base', 'builtins.object']
<class '__main__.Leaf'>            set()                              ['types.Leaf', 'types.Derived', '__main__.Base', 'builtins.object']
<class '__main__.GBase'>           set()                              ['__main__.GBase', 'typing.Generic', 'builtins.object']
<class '__main__.GDerivedAny'>     {'GBase'}                          ['types.GDerivedAny', 'typing.Generic', 'builtins.object']
<class '__main__.GLeafAny'>        {'GBase', 'GDerivedAny'}           ['types.GLeafAny', 'typing.Generic', 'builtins.object']
<class '__main__.GDerivedInt'>     set()                              ['types.GDerivedInt', '__main__.GBase', 'typing.Generic', 'builtins.object']
<class '__main__.GLeafInt'>        {'GDerivedInt'}                    ['types.GLeafInt', '__main__.GBase', 'typing.Generic', 'builtins.object']
<class '__main__.GDerivedT'>       set()                              ['types.GDerivedT', '__main__.GBase', 'typing.Generic', 'builtins.object']
<class '__main__.GLeafT'>          set()                              ['types.GLeafT', 'types.GDerivedT', '__main__.GBase', 'typing.Generic', 'builtins.object']

You can see that all non-parameterized bases are missing from the MRO, and the module for all but the base classes are types.

@pierreglaser
Copy link
Member

pierreglaser commented Sep 12, 2021

Hi @huzecong, thanks for the report.

Indeed, it seems like we've been a bit reckless in our handling of the base classes of "subclasses of generic classes".

The reason for cloudpickle omitting some elements as opposed to the output of mro() most likely comes from _get_bases, which is used when pickling interactively defined classes.

def _get_bases(typ):
if hasattr(typ, '__orig_bases__'):
# For generic types (see PEP 560)
bases_attr = '__orig_bases__'
else:
# For regular class objects
bases_attr = '__bases__'
return getattr(typ, bases_attr)

Not sure exactly of how to arbitrate between the two possible types of base classes in such hybrid cases of subclasses of generic classes, but feel free to submit a patch if you decide to do some additional digging on that topic.

For the record, here is some justification w.r.t the other conditions that need to be met in order for this bug to show up:

I pickle the derived class, and unpickle it in another process where the class isn't defined.

This comes from some automatic class memoization that cloudpickle implements. Admittedly we should expose a way to deactivate this behavior to make debugging of class-related pickling issues easier.

The classes are defined in a notebook or IPython repl (so they're in main, and their code must be pickled).

Pretty much only in this case will cloudpickle's pickling logic be activated. Otherwise, cloudpickle delegates serialization to pickle. This condition confirms that this bug is cloudpickle (and not pickle) related.

W.r.t the wrong module assignment, I'd have to dig further.

@dannyfriar
Copy link

Related issue on the latest cloudpickle with python 3.8: #463

pierreglaser added a commit that referenced this issue Feb 16, 2022
* Fix #440: Incorrect pickles for subclasses of generic classes

* Update CHANGES

* Empty Commit to trigger CI

Co-authored-by: Pierre Glaser <pierreglaser@msn.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants