
Implement Gymnasium-compliant PPO script #320

Merged: 43 commits into master on Dec 13, 2022

Conversation

@dtch1997 (Collaborator) commented Nov 15, 2022:

Description

Types of changes

  • Bug fix
  • New feature
  • New algorithm
  • Documentation

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).
  • I have updated the documentation and previewed the changes via mkdocs serve.
  • I have updated the tests accordingly (if applicable).

If you are adding new algorithm variants or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

  • I have contacted vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
  • I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
  • I have added additional documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers (if applicable).
    • I have added links to the PR related to the algorithm variant.
    • I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
    • I have added the learning curves (in PNG format).
    • I have added links to the tracked experiments.
    • I have updated the overview sections in the docs and the repo.
  • I have updated the tests accordingly (if applicable).

@dtch1997 mentioned this pull request on Nov 15, 2022.
vercel bot commented Nov 15, 2022:

cleanrl deployment: ✅ Ready (preview updated Dec 12, 2022 at 8:55 PM UTC)

@vwxyzjn (Owner) commented Nov 15, 2022:

CI passed. @dtch1997 would you mind running the first round of benchmark? Don't worry about capturing videos yet because of upstream issues.

export WANDB_ENTITY=openrlbenchmark
poetry install --with mujoco
OMP_NUM_THREADS=1 xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids HalfCheetah-v4 Walker2d-v4 Hopper-v4 InvertedPendulum-v4 Humanoid-v4 Pusher-v4 \
    --command "poetry run python cleanrl/gymnasium_support/ppo_continuous_action.py --cuda False --track --capture-video" \
    --num-seeds 3 \
    --workers 1

@dtch1997 (Collaborator, Author) commented:

Benchmark in progress: https://wandb.ai/openrlbenchmark/cleanrl?workspace=user-dtch1997

@vwxyzjn (Owner) commented Nov 20, 2022:

Great, thank you!

@vwxyzjn (Owner) commented Nov 21, 2022:

Executing the following command in https://github.com/vwxyzjn/ppo-atari-metrics

python rlops.py --wandb-project-name cleanrl \
    --wandb-entity openrlbenchmark \
    --filters 'ppo_continuous_action?tag=rlops-pilot' 'ppo_continuous_action?tag=pr-320'   \
    --env-ids HalfCheetah-v4 Walker2d-v4 Hopper-v4 InvertedPendulum-v4 Humanoid-v4 Pusher-v4 \
    --output-filename compare.png --scan-history

generates

[compare.png: learning curves comparing the rlops-pilot and pr-320 runs]

Environment            ppo_continuous_action (tag: rlops-pilot)    ppo_continuous_action (tag: pr-320)
HalfCheetah-v4         1795.55 ± 819.96                            2241.90 ± 1150.61
Walker2d-v4            2983.19 ± 757.43                            3577.82 ± 315.46
Hopper-v4              2279.97 ± 450.53                            2111.14 ± 335.94
InvertedPendulum-v4    890.99 ± 48.93                              950.98 ± 36.39
Humanoid-v4            671.07 ± 83.75                              728.82 ± 62.35
Pusher-v4              -51.27 ± 9.02                               -49.51 ± 3.96

@vwxyzjn (Owner) commented Nov 21, 2022:

Thank you @dtch1997, would you be interested in helping run some dm_control experiments? Please pull the latest code and run

export WANDB_ENTITY=openrlbenchmark
poetry install --with dm_control,mujoco
OMP_NUM_THREADS=1 xvfb-run -a poetry run python -m cleanrl_utils.benchmark \
    --env-ids dm_control/acrobot-swingup-v0 dm_control/acrobot-swingup_sparse-v0 dm_control/ball_in_cup-catch-v0 dm_control/cartpole-balance-v0 dm_control/cartpole-balance_sparse-v0 dm_control/cartpole-swingup-v0 dm_control/cartpole-swingup_sparse-v0 dm_control/cartpole-two_poles-v0 dm_control/cartpole-three_poles-v0 dm_control/cheetah-run-v0 dm_control/dog-stand-v0 dm_control/dog-walk-v0 dm_control/dog-trot-v0 dm_control/dog-run-v0 dm_control/dog-fetch-v0 dm_control/finger-spin-v0 dm_control/finger-turn_easy-v0 dm_control/finger-turn_hard-v0 dm_control/fish-upright-v0 dm_control/fish-swim-v0 dm_control/hopper-stand-v0 dm_control/hopper-hop-v0 dm_control/humanoid-stand-v0 dm_control/humanoid-walk-v0 dm_control/humanoid-run-v0 dm_control/humanoid-run_pure_state-v0 dm_control/humanoid_CMU-stand-v0 dm_control/humanoid_CMU-run-v0 dm_control/lqr-lqr_2_1-v0 dm_control/lqr-lqr_6_2-v0 dm_control/manipulator-bring_ball-v0 dm_control/manipulator-bring_peg-v0 dm_control/manipulator-insert_ball-v0 dm_control/manipulator-insert_peg-v0 dm_control/pendulum-swingup-v0 dm_control/point_mass-easy-v0 dm_control/point_mass-hard-v0 dm_control/quadruped-walk-v0 dm_control/quadruped-run-v0 dm_control/quadruped-escape-v0 dm_control/quadruped-fetch-v0 dm_control/reacher-easy-v0 dm_control/reacher-hard-v0 dm_control/stacker-stack_2-v0 dm_control/stacker-stack_4-v0 dm_control/swimmer-swimmer6-v0 dm_control/swimmer-swimmer15-v0 dm_control/walker-stand-v0 dm_control/walker-walk-v0 dm_control/walker-run-v0 \
    --command "poetry run python cleanrl/gymnasium_support/ppo_continuous_action.py --cuda False --track" \
    --num-seeds 3 \
    --workers 9

@nidhishs commented:
Hey @dtch1997, I tried running ppo_continuous_action.py with --num_envs=4; however, done = terminated or truncated no longer works because terminated and truncated are now NumPy arrays. I believe numpy.logical_or should fix it.
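The fix proposed here can be sketched as follows (the array values are illustrative, not taken from an actual run):

```python
import numpy as np

# With a vectorized env, step() returns one flag per sub-environment, so the
# scalar expression `terminated or truncated` raises
# "ValueError: The truth value of an array ... is ambiguous".
terminated = np.array([False, True, False, False])
truncated = np.array([False, False, True, False])

# Element-wise logical OR yields a per-sub-environment done flag instead:
done = np.logical_or(terminated, truncated)
```

The equivalent `terminated | truncated` also works on boolean arrays; `np.logical_or` just makes the intent explicit.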

@dtch1997 (Collaborator, Author) commented:

@nidhishs The num_envs issue should be fixed now.
@vwxyzjn to get the code snippet to run, I had to slightly modify the pyproject.toml to enable automatic installation of the right torch version for the installed CUDA driver. Taken from here: python-poetry/poetry#4231 (comment)
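For context on what "Gymnasium-compliant" entails: Gymnasium's env.step() returns a five-tuple (obs, reward, terminated, truncated, info) instead of the old four-tuple with a single done flag, and env.reset() returns (obs, info). A minimal self-contained sketch of the updated rollout bookkeeping; DummyVecEnv here is a hypothetical stand-in used so the snippet runs without Gymnasium, not CleanRL code:

```python
import numpy as np

class DummyVecEnv:
    """Stand-in for a Gymnasium vector env; sub-env 0 terminates at step 3."""
    def __init__(self, num_envs=2):
        self.num_envs = num_envs
        self.t = 0

    def reset(self):
        # Gymnasium-style reset: returns (obs, info), not just obs.
        self.t = 0
        return np.zeros(self.num_envs), {}

    def step(self, actions):
        # Gymnasium-style step: five-tuple with separate terminated/truncated.
        self.t += 1
        obs = np.full(self.num_envs, float(self.t))
        reward = np.ones(self.num_envs)
        terminated = np.array([self.t == 3] + [False] * (self.num_envs - 1))
        truncated = np.zeros(self.num_envs, dtype=bool)
        return obs, reward, terminated, truncated, {}

env = DummyVecEnv()
obs, info = env.reset()
episode_return = np.zeros(env.num_envs)
for _ in range(4):
    obs, reward, terminated, truncated, info = env.step(None)
    done = np.logical_or(terminated, truncated)  # per-sub-env done flags
    episode_return += reward
    episode_return[done] = 0.0  # reset return tracking for finished episodes
```

Keeping terminated and truncated separate also matters for value bootstrapping: PPO should bootstrap from the final observation on truncation but not on true termination.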

@vwxyzjn (Owner) commented Dec 12, 2022:

CI passed, but I had to mark the Ubuntu install with continue-on-error: true (MUJOCO_GL=osmesa results in free(): invalid pointer) because of google-deepmind/mujoco#644.

@dosssman (Collaborator) left a review:

Looking good overall.
Is there any alternative for video logging of the agent with Gymnasium?

@vwxyzjn (Owner) commented Dec 13, 2022:

@dosssman not right now with wandb. Pending wandb/wandb#4510.

@vwxyzjn (Owner) left a review:

LGTM. Thanks so much @dtch1997!

@vwxyzjn merged commit b558b2b into master on Dec 13, 2022.

6 participants