Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Cloudpickle losing parents of types defined in __main__ if an ancestor used a type annotation #22553

Merged
merged 2 commits into from Apr 4, 2022

Conversation

andenrx
Copy link
Contributor

@andenrx andenrx commented Feb 22, 2022

Why are these changes needed?

When ray cannot find a valid import for a type (ie. the type was defined in __main__), it pickles the object by providing a skeleton class function. This function creates a new class on the remote that is intended to be identical to the local version. It chooses the parents of the type through the _get_bases function. Currently the function takes __orig_bases__ if it is defined and uses __bases__ as a fallback. However, __orig_bases__ is populated by the most recent type annotated ancestor (ie Generic[T]), which might not necessarily be the parents. Meanwhile __bases__ is always parents. For this reason, objects can behave differently remotely vs locally.

Suppose we have:

class A(Generic[T]): pass
class B(A): pass

In this case we have B.__bases__ == (A,) (the parent) and B.__orig_bases__ = (Generic[T],) (the most recent ancestor with type annotations).
However, if we move B to a worker, we end up with an inheritance structure of Generic[T] => B rather than Generic[T] => A => B like what we have locally.
This is quite problematic as isinstance(B, A) will give True locally and False remotely. Additionally, super calls will fail and any methods defined for A but not B will be inaccessible for remote B objects.

More detail on the issue is included in #22552 along with reproducible code.

My solution is to always use __bases__ for the parents and then copy over the __orig_bases__ meta data if it exists.

Related issue number

#22552 [Bug] Cloudpickle losing parents of types defined in __main__ if an ancestor used a type annotation

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@jjyao
Copy link
Contributor

jjyao commented Feb 22, 2022

@ericl @simon-mo should we fix the upstream (https://github.com/cloudpipe/cloudpickle) instead and pull the change?

@simon-mo
Copy link
Contributor

cc @suquark

@suquark
Copy link
Member

suquark commented Feb 22, 2022

This fix seems valid to me. But we should probably also post to the upstream to see the opinion of the cloudpickle maintainers.

@suquark
Copy link
Member

suquark commented Feb 23, 2022

@andenrx I think the recent fix from upstream should address your issue: cloudpipe/cloudpickle#448. Could you confirm that? Thanks!

@andenrx
Copy link
Contributor Author

andenrx commented Feb 23, 2022

@suquark Yeah, that should fix it.

Although is there a reason why there's two different codebases for cloudpickle as opposed to just making it a dependency for ray?

@suquark
Copy link
Member

suquark commented Feb 24, 2022

@andenrx This is because it usually takes cloudpickle several months for a release, but Ray almost releases monthly. Also the changes in cloudpickle sometimes break Ray. So we decide to ship our own version to include latest patches and exclude problematic changes.

Could you cherrypick their patch instead for this issue? That seems a simpler fix shipped with more strict tests. Thank you so much for creating this PR.

@suquark suquark added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Feb 24, 2022
@scv119
Copy link
Contributor

scv119 commented Apr 1, 2022

cc @gjoliver @avnishn

@scv119
Copy link
Contributor

scv119 commented Apr 1, 2022

@suquark looks this blocks tune upgrading to latest gym. should we merge this PR or upgrade to latest cloudpickle?

@rkooo567
Copy link
Contributor

rkooo567 commented Apr 4, 2022

Waiting for #23661

@scv119 scv119 merged commit 3e7c823 into ray-project:master Apr 4, 2022
@scv119
Copy link
Contributor

scv119 commented Apr 4, 2022

#23661 is not really a blocker. but we need to make sure if we revert the format changes we don't accidentally revert this one.

edoakes pushed a commit to edoakes/ray that referenced this pull request Apr 7, 2022
…t#448' from cloudpickle (ray-project#22553)

Co-authored-by: Chen Shen <scv119@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants