Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Cloudpickle losing parents of types defined in __main__ if an ancestor used a type annotation #22552

Closed
2 tasks done
andenrx opened this issue Feb 22, 2022 · 2 comments
Closed
2 tasks done
Labels
bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component)
Milestone

Comments

@andenrx
Copy link
Contributor

andenrx commented Feb 22, 2022

Search before asking

  • I searched the issues and found no similar issues.

Ray Component

Ray Core

What happened + What you expected to happen

When Ray cannot import a specific type, it creates a "skeleton class" remotely instead. This skeleton class is meant to be identical to the original, however, if an ancestor of the class used type annotations, the skeleton class will subclass the most recent type annotations rather than the parents. In rare cases this can cause the remote class to differ from the local one.

This is particularly problematic, because MultiAgentEnv from rllib subclasses gym.Env, and with gym version 22, gym.Env now subclasses Generic[ObsType, ActType]. This means that if you define an env subclassing from MultiAgentEnv in __main__, on the workers, your environement will be a subclass of Generic[ObsType, ActType] -- not MultiAgentEnv.

Versions / Dependencies

Tested on Python 3.7 and 3.9 with Ray 1.10.0.

The specific problem with MultiAgentEnv requires gym version 22

Reproduction script

Just using ray.remote:

import ray
from typing import Generic, TypeVar
from gym import Env
T = TypeVar("T")

class A(Generic[T]):
    def __repr__(self):
        return "My Type"
class B(A):
    pass
class C(A[T]):
    pass

def is_type(obj, type):
  return isinstance(obj, type)

def call_repr(obj):
    return repr(obj)

# On local, B inherits from A
print(f"Local:")
print(f"  is_type(B(), A) is {is_type(B(), A)}") # True
print(f"  is_type(C(), A) is {is_type(C(), A)}") # True
print(repr(B()), repr(C())) # My Type My Type
print()
# On remote, B does not inherit from A, but rather from Generic[T], the most recent ancestor with type annotations
print(f"Remote:")
print(f"  is_type(B(), A) is {ray.get(ray.remote(is_type).remote(B(), A))}") # True
print(f"  is_type(C(), A) is {ray.get(ray.remote(is_type).remote(C(), A))}") # False
print(ray.get(ray.remote(call_repr).remote(B())), ray.get(ray.remote(call_repr).remote(C()))) # <types.B object at 0x7c7728491100> My Type

Rllib MultiAgent:

# This will fail with gym version 22
from ray.rllib.env.multi_agent_env import MultiAgentEnv
from ray.rllib.agents import ppo
from ray.tune.registry import register_env
from gym.spaces.box import Box

obs = Box(0, 1, shape=(1,))
act = Box(0, 1, shape=(1,))
class DemoEnv(MultiAgentEnv):
    """
    The type hierarchy looks like this:
    Generic[ObsType, ActType] => gym.Env => MultiAgentEnv => DemoEnv

    However, on our remote worker it will be:
    Generic[ObsType, ActType] => DemoEnv

    This causes the isinstance(env, MultiAgentEnv) check to fail
    """
    def __init__(self):
        self._agent_ids = ["agent"]
        self.observation_space = obs
        self.action_space = act
    def reset(self):
        return {"agent": obs.sample()}
    def step(self, action):
        return (
            {"agent": obs.sample()},
            {"agent": 0.0},
            {"agent": False},
            {}
        )

register_env("Demo", lambda _: DemoEnv())

config = {
    "env": "Demo",
    "num_workers": 2,
}
trainer = ppo.PPOTrainer(config)

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@andenrx andenrx added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Feb 22, 2022
@rkooo567 rkooo567 added this to the Core Backlog milestone Feb 24, 2022
@rkooo567
Copy link
Contributor

Thanks for submitting the PR!

@krfricke
Copy link
Contributor

krfricke commented Apr 4, 2022

Closing as #22553 is merged

@krfricke krfricke closed this as completed Apr 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component)
Projects
None yet
Development

No branches or pull requests

3 participants