Update ppo_pettingzoo_ma_atari.py #408

elliottower · 2023-07-12T18:13:57Z

This PR updates the pettingzoo multiagent atari example to use gymnasium rather than gym and to use the current pettingzoo API (with termination and truncation, following gymnasium/gym v26). I've had some people ask about more in-depth CleanRL resources for PettingZoo, so I figure updating this would be a good start.
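For readers unfamiliar with the termination/truncation split, here is a minimal sketch of what a parallel-API step loop looks like. `ToyParallelEnv` is a hypothetical stand-in written for this illustration, not a PettingZoo class; it only mimics the shape of the five per-agent dicts that `step()` returns.

```python
class ToyParallelEnv:
    """Hypothetical stand-in that mimics the shape of a parallel-API step()."""

    def __init__(self, max_cycles=5):
        self.agents = ["first_0", "second_0"]
        self.max_cycles = max_cycles
        self._t = 0

    def reset(self):
        self._t = 0
        observations = {a: 0 for a in self.agents}
        infos = {a: {} for a in self.agents}
        return observations, infos

    def step(self, actions):
        self._t += 1
        observations = {a: self._t for a in self.agents}
        rewards = {a: float(actions[a]) for a in self.agents}
        # This toy env never reaches a true terminal state...
        terminations = {a: False for a in self.agents}
        # ...it only hits a time limit, which is reported as truncation.
        truncations = {a: self._t >= self.max_cycles for a in self.agents}
        infos = {a: {} for a in self.agents}
        return observations, rewards, terminations, truncations, infos


def run_episode(env):
    """Step until every agent is terminated or truncated."""
    env.reset()
    steps = 0
    while True:
        actions = {a: 1 for a in env.agents}
        _, _, terminations, truncations, _ = env.step(actions)
        steps += 1
        if all(terminations[a] or truncations[a] for a in env.agents):
            return steps, terminations, truncations
```

The older single `done` flag corresponds to `terminations[a] or truncations[a]`; keeping the two apart matters once value bootstrapping is involved.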

Unfortunately, it seems that the record episode statistics wrapper won't work, because it expects info to be a dict, whereas SuperSuit's concat_vec_env turns info into a list of dicts. A custom wrapper could handle this, but it's not clear to me that that's the best approach. We have been looking into mirroring the gymnasium vector API in PettingZoo, which, as far as I can tell, would allow the recorder class and most other gymnasium functionality to work, but that will take some time.
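One possible workaround, sketched below, is to track episode returns and lengths directly from the per-env rewards and end-of-episode flags instead of relying on the info format at all. `EpisodeStatsTracker` is a hypothetical helper for illustration; it is not the gymnasium wrapper and not part of SuperSuit.

```python
class EpisodeStatsTracker:
    """Hypothetical helper: accumulates per-env episode returns and lengths."""

    def __init__(self, num_envs):
        self.returns = [0.0] * num_envs
        self.lengths = [0] * num_envs

    def update(self, rewards, terminations, truncations):
        """Feed one vectorized step; return stats for episodes that just ended."""
        finished = []
        for i, (r, term, trunc) in enumerate(zip(rewards, terminations, truncations)):
            self.returns[i] += float(r)
            self.lengths[i] += 1
            if term or trunc:
                # Episode i is over: report its totals, then reset the counters.
                finished.append({"env": i, "r": self.returns[i], "l": self.lengths[i]})
                self.returns[i] = 0.0
                self.lengths[i] = 0
        return finished
```

In a training loop, `update` would be called once per vectorized step, and each returned entry could be logged the same way episodic return and length are logged elsewhere.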

Description

Types of changes

Bug fix
New feature
New algorithm
Documentation

Checklist:

I've read the CONTRIBUTION guide (required).
I have ensured pre-commit run --all-files passes (required).
I have updated the tests accordingly (if applicable).
I have updated the documentation and previewed the changes via mkdocs serve.
- I have explained note-worthy implementation details.
- I have explained the logged metrics.
- I have added links to the original paper and related papers.

If you need to run benchmark experiments for a performance-impacting change:

I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team.
I have used the benchmark utility to submit the tracked experiments to the openrlbenchmark/cleanrl W&B project, optionally with --capture-video.
I have performed RLops with python -m openrlbenchmark.rlops.
- For new feature or bug fix:
  - I have used the RLops utility to understand the performance impact of the changes and confirmed there is no regression.
- For new algorithm:
  - I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
- I have added the learning curves generated by the python -m openrlbenchmark.rlops utility to the documentation.
- I have added links to the tracked experiments in W&B, generated by python -m openrlbenchmark.rlops ....your_args... --report, to the documentation.

Updates to use Gymnasium and current PettingZoo API

elliottower · 2023-07-13T16:03:12Z

I'm not sure how big a problem it is that episode statistics can't be recorded. It may be better to keep things as they are so that no functionality is lost, and I can put this updated version in the PettingZoo docs.

There may be a way to get the episode statistics to work, but it seems difficult, as I explained above.

elliottower · 2023-07-13T16:07:26Z

@vwxyzjn I'm imagining this would also require running the benchmarks? Let me know if you have any thoughts on this. I've also been considering doing an action masking example for other PZ envs; would that be something you're interested in having here?

vwxyzjn

The changes look good to me. I think we should do a benchmark run, though, because PettingZoo's version changed a lot. Having an action mask example would be good too, but let's do that in a separate PR.

cleanrl/ppo_pettingzoo_ma_atari.py

KaleabTessera

I think there is possibly a bug here. Truncation should be treated differently to termination when bootstrapping - as done in stable baselines. This made a pretty big diff in my own experiments.

I see similar issues exist - #198 and https://github.com/vwxyzjn/cleanrl/pull/311/files , but I don't think terminal_observation was used there.

KaleabTessera · 2023-11-02T17:08:29Z

cleanrl/ppo_pettingzoo_ma_atari.py

@@ -219,6 +227,8 @@ def get_action_and_value(self, x, action=None):
            next_value = agent.get_value(next_obs).reshape(1, -1)
            advantages = torch.zeros_like(rewards).to(device)
            lastgaelam = 0
+            next_done = np.logical_or(next_termination, next_truncation)


I think there is a bug here. We should still bootstrap if next_truncation=True:

"you should bootstrap if infos[env_idx]["TimeLimit.truncated"] is True (episode over due to a timeout/truncation) or dones[env_idx] is False (episode not finished)." - stable baselines

So next_done=next_termination and dones=terminations (probably just use next_termination and terminations directly, e.g. nextnonterminal = 1.0 - next_termination).

To implement this correctly, we also need access to terminal_observation from pettingzoo_env_to_vec_env_v1, since we need the true terminal obs rather than the obs returned after the reset (which is what we currently get), so infos must expose the terminal obs. I have a PR out for this. Then we can implement something like this to do correct bootstrapping on truncation/timeout.
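As a rough illustration of the distinction being argued for, here is a minimal single-environment GAE sketch in plain Python. It is not the code in this PR or in CleanRL. It assumes flags are indexed by transition (`terminations[t]` means the step taken at `t` ended the episode, which differs from CleanRL's `dones` layout), and that `next_values[t]` is the value of the true next observation, i.e. the terminal observation when the episode ended at `t`, not the post-reset one. The function name and signature are hypothetical.

```python
def gae_with_truncation(rewards, values, next_values, terminations, truncations,
                        gamma=0.99, lam=0.95):
    """Compute GAE advantages, bootstrapping through truncation but not termination."""
    num_steps = len(rewards)
    advantages = [0.0] * num_steps
    last_gae = 0.0
    for t in reversed(range(num_steps)):
        # Termination: no future return exists, so drop the bootstrap value.
        # Truncation: the episode was cut off, so V(next obs) is still used.
        bootstrap = 0.0 if terminations[t] else 1.0
        # Either kind of episode end stops the advantage recursion from
        # leaking across the boundary into the next episode.
        same_episode = 0.0 if (terminations[t] or truncations[t]) else 1.0
        delta = rewards[t] + gamma * next_values[t] * bootstrap - values[t]
        last_gae = delta + gamma * lam * same_episode * last_gae
        advantages[t] = last_gae
    return advantages
```

With `gamma = lam = 1`, rewards `[1, 1]`, zero values, and `next_values = [0, 10]`, a truncation on the last step keeps the bootstrap value of 10 in the advantages, whereas a termination on the same step discards it.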

elliottower · Nov 3, 2023

Good catch, @KaleabTessera. Would you be willing to update this branch with the changes? I can give you edit access; I currently have a lot of other obligations from work, so I don't have much time for this.

elliottower · Nov 3, 2023

Oh shoot, it's a patch-1 branch, so I don't know if you can be given access. But if you clone the repo, you can make a new branch from this branch and open a new PR if it's not possible to edit this one. Or maybe make a PR to update this branch itself. Sorry I can't help more.

vwxyzjn · Nov 3, 2023

FYI, I am doing a refactor at #424. I'm going to try to run a whole suite of benchmarks soon.

elliottower · Nov 3, 2023

Oh okay, cool. Sorry, I remember you gave me access to the WandB thing, but I've not had time to do it. It's probably simplest if you do it anyway, so thanks for that. It may be interesting to compare performance with the AgileRL multi agent atari example https://docs.agilerl.com/en/latest/tutorials/pettingzoo/maddpg.html

I see the issue linked in that PR mentions timeout handling. Is that the same as what is mentioned below about termination vs truncation? Anyway, if there's anything needed from PettingZoo or SuperSuit's end, let me know.

elliottower · Jan 18, 2024 (edited)

Btw, just as an update, the SuperSuit PR linked above has been merged. My only concern is that whatever bootstrapping behavior is implemented here should mirror what the single-agent PPO implementations do, so this is a question for @vwxyzjn.

My inclination is to keep the logic as it currently is in this PR and address the bootstrapping issue in a separate PR (maybe @KaleabTessera is interested in doing that? I don't have much time to look into it, nor am I the best person to do it, as I'm not an expert).

ezhang7423 · 2024-01-18T15:13:02Z

Are there any updates on this?

elliottower · 2024-01-18T15:37:20Z

Looks like #424 was merged, but it didn't update the PettingZoo example beyond a minor change to the CLI arguments. Re-reading Costa's messages, I think I misread them as saying he intended to update this himself (or maybe he just didn't have time).

Anyways, I have some time today and will try to resolve these conflicts and integrate @KaleabTessera's suggestions, so we can at least have an updated version of this script. Won't have time for benchmarking in the near future but could eventually get to it.

vwxyzjn · 2024-01-18T15:59:57Z

Yeah sorry @elliottower things have gotten busy. Feel free to submit a PR. As long as you can reproduce the existing benchmark experiments we can merge :)

elliottower · 2024-01-18T16:02:13Z

No worries, sounds good. I'll just use this same PR for simplicity's sake.

elliottower · 2024-01-18T16:05:35Z

Btw, just FYI, there are a bunch of already-merged branches in this repo that could probably be deleted (I'm pulling the most recent master branch and see a huge list).

d9b9b11 Update ppo_pettingzoo_ma_atari.py (Updates to use Gymnasium and current PettingZoo API)
edc79d6 Pre-commit
d39da5e Update PZ version
2b2dfce Update Super
6d37313 Run pre-commit --hook-stage manual --all-files
0168986 run poetry lock --no-update to fix inconsistencies with versions
b7bffe9 re-run pre-commit with --hook-stage manual

vwxyzjn reviewed Jul 13, 2023 (cleanrl/ppo_pettingzoo_ma_atari.py)

elliottower mentioned this pull request Jul 16, 2023: [Bug Report] render() returns only 0. under MPE env with render_mode='rgb_array', Farama-Foundation/PettingZoo#963 (Closed)

2c76bb1 Change torch.maximum to torch.logical_or for dones
025f491 Use np.logical_or instead of torch (allows subtraction)

elliottower mentioned this pull request Jul 18, 2023: Add CleanRL mutli-agent Atari example, Farama-Foundation/PettingZoo#1033 (Merged)

vwxyzjn mentioned this pull request Oct 16, 2023: Refactor to use tyro #424 (Merged)

KaleabTessera suggested changes Nov 2, 2023

KaleabTessera mentioned this pull request Nov 2, 2023: [Bug Report] Possible bug with bootstrapping when environment is truncated in CleanRL mutli-agent Atari example, Farama-Foundation/PettingZoo#1126 (Open)

elliottower added 2 commits January 18, 2024 11:09

09f7a7f Merge remote-tracking branch 'upstream/master' into patch-1 (conflicts: poetry.lock, pyproject.toml, requirements/requirements-dm_control.txt, requirements/requirements-optuna.txt, requirements/requirements-pettingzoo.txt)
16e0764 Finish merge with upstream master

928b7b3 Fix SuperSuit to most recent version
d7a2aa2 Fix SuperSuit version in poetry lockfile and tinyscaler in pettingzoo reqs (subdependency of supersuit)
d77cca0 Fix pettingzoo-requirements export (pre-commit hooks)
afba4e8 Test updating pettingzoo to new version 1.24.3

elliottower added 2 commits January 18, 2024 13:33

8671154 Update ma_atari to match regular atari (tyro, minor code style changes)
d2cf1a5 pre-commit

981bc63 Revert accidentally changed files (zoo and ipynb, which randomly seems to change)
454364d Revert ipynb change
06473b2 Update dead pettingzoo.ml links to Farama foundation links
1b725cf Update to newly release SuperSuit 3.9.2 (minor bugfixes but best to keep going forward)
