Fix Cloudpickle losing parents of types defined in __main__ if an ancestor used a type annotation #22553
@ericl @simon-mo should we fix the upstream (https://github.com/cloudpipe/cloudpickle) instead and pull the change?
cc @suquark
This fix seems valid to me. But we should probably also post it upstream to get the opinion of the cloudpickle maintainers.
@andenrx I think the recent fix from upstream should address your issue: cloudpipe/cloudpickle#448. Could you confirm that? Thanks!
@suquark Yeah, that should fix it. Although is there a reason why there are two different codebases for cloudpickle as opposed to just making it a dependency for Ray?
@andenrx This is because cloudpickle usually takes several months between releases, while Ray releases almost monthly. Also, changes in cloudpickle sometimes break Ray. So we decided to ship our own version to include the latest patches and exclude problematic changes. Could you cherry-pick their patch instead for this issue? That seems a simpler fix, shipped with stricter tests. Thank you so much for creating this PR.
@suquark It looks like this blocks Tune from upgrading to the latest gym. Should we merge this PR or upgrade to the latest cloudpickle?
Waiting for #23661
#23661 is not really a blocker, but if we revert the format changes we need to make sure we don't accidentally revert this one.
…t#448' from cloudpickle (ray-project#22553) Co-authored-by: Chen Shen <scv119@gmail.com>
Why are these changes needed?
When Ray cannot find a valid import for a type (i.e. the type was defined in `__main__`), it pickles the object by providing a skeleton class function. This function creates a new class on the remote that is intended to be identical to the local version. It chooses the parents of the type through the `_get_bases` function. Currently the function takes `__orig_bases__` if it is defined and uses `__bases__` as a fallback. However, `__orig_bases__` is populated by the most recent type-annotated ancestor (e.g. `Generic[T]`), which might not necessarily be the parents. Meanwhile `__bases__` is always the parents. For this reason, objects can behave differently remotely vs. locally.

Suppose we have:
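A minimal pair of classes consistent with the values quoted for `B`, assuming `A` subclasses `Generic[T]` directly and `B` subclasses `A` without type parameters (the empty class bodies are illustrative, not taken from the PR):

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class A(Generic[T]):
    """Parameterized generic base; the class statement gives A its own __orig_bases__."""


class B(A):
    """Plain subclass of A; it only sees __orig_bases__ through inheritance."""


print(B.__bases__)       # the direct parent, A
print(B.__orig_bases__)  # looked up on A, since B does not define it itself
print("__orig_bases__" in B.__dict__)  # False
```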
In this case we have `B.__bases__ == (A,)` (the parent) and `B.__orig_bases__ == (Generic[T],)` (the most recent ancestor with type annotations).

However, if we move `B` to a worker, we end up with an inheritance structure of `Generic[T] => B` rather than `Generic[T] => A => B` like what we have locally.

This is quite problematic as `isinstance(B(), A)` will give `True` locally and `False` remotely. Additionally, `super` calls will fail and any methods defined for `A` but not `B` will be inaccessible for remote `B` objects.

More detail on the issue is included in #22552 along with reproducible code.
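The effect can be reproduced without Ray. The sketch below uses `types.new_class` as a hypothetical stand-in for the skeleton-class reconstruction (it is not cloudpickle's actual code path) and rebuilds `B` from each candidate tuple of bases. The `rebuild` helper and the `greet` method are illustrative names only.

```python
import types
from typing import Generic, TypeVar

T = TypeVar("T")


class A(Generic[T]):
    def greet(self):
        return "hello from A"


class B(A):
    pass


def rebuild(cls, bases):
    """Hypothetical stand-in for rebuilding a class on a worker from chosen bases."""
    return types.new_class(cls.__name__, bases)


# Bases chosen as described above: __orig_bases__ when the attribute is
# visible (here it is inherited from A), __bases__ otherwise.
from_orig_bases = rebuild(B, getattr(B, "__orig_bases__", B.__bases__))

# Bases taken from __bases__ only.
from_bases = rebuild(B, B.__bases__)

print(issubclass(from_orig_bases, A))     # False: A dropped out of the hierarchy
print(issubclass(from_bases, A))          # True
print(hasattr(from_orig_bases, "greet"))  # False: methods defined on A are gone
print(from_bases().greet())
```

Rebuilding from `__bases__` keeps `A` in the hierarchy, which is why the choice of attribute matters for subclasses that add no type parameters of their own.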
My solution is to always use `__bases__` for the parents and then copy over the `__orig_bases__` metadata if it exists.

Related issue number
#22552 [Bug] Cloudpickle losing parents of types defined in `__main__` if an ancestor used a type annotation

Checks
I've run `scripts/format.sh` to lint the changes in this PR.