Implement Gymnasium-compliant PPO script #320

Merged
merged 43 commits into from Dec 13, 2022
Changes from 42 commits
Commits
be6da79
Add Gymnasium and dependencies
dtch1997 Nov 15, 2022
d098870
Implement Gymnasium-compliant PPO script
dtch1997 Nov 15, 2022
aa9ac0c
Ensure pre-commit passes
dtch1997 Nov 15, 2022
cb08637
Fix CI, add a `gymnasium_support` folder
vwxyzjn Nov 15, 2022
5818d7d
update lock files
vwxyzjn Nov 15, 2022
51a7128
add dependencies
vwxyzjn Nov 15, 2022
363b48e
update requirements.txt; fix pre-commit
vwxyzjn Nov 15, 2022
e3174ca
update poetry files
vwxyzjn Nov 15, 2022
8da5f7b
Support dm control action spaces
vwxyzjn Nov 21, 2022
0544fcb
add dm_control support
vwxyzjn Nov 21, 2022
d30f3cf
Enable num_envs>1
dtch1997 Nov 28, 2022
99f7789
Enable auto-install of torch based on CUDA version
dtch1997 Nov 28, 2022
cbd83f6
Fix pre-commit
dtch1997 Nov 28, 2022
8cf18e3
bump torch version
vwxyzjn Dec 10, 2022
fe81b99
bump wandb version
vwxyzjn Dec 10, 2022
dd80937
change key for mujoco_py installation
vwxyzjn Dec 10, 2022
c46f700
update CI
vwxyzjn Dec 10, 2022
0d3a5e1
update docs
vwxyzjn Dec 10, 2022
fa73a60
Merge branch 'gymnasium_ppo' of https://github.com/vwxyzjn/cleanrl in…
vwxyzjn Dec 10, 2022
0381d7a
downgrade torch
vwxyzjn Dec 11, 2022
6582fab
update docs
vwxyzjn Dec 11, 2022
b3f19fd
update test cases
vwxyzjn Dec 11, 2022
1e904a3
set default env = HalfCheetah-v4
vwxyzjn Dec 11, 2022
3cd9917
directly replace `ppo_continuous_action.py`
vwxyzjn Dec 11, 2022
08f9744
deprecate pybullet dependency in ppo
vwxyzjn Dec 11, 2022
b81d207
remove pybullet test case
vwxyzjn Dec 11, 2022
de3f410
support video recording to wandb
vwxyzjn Dec 11, 2022
1b01a4f
update docs
vwxyzjn Dec 11, 2022
73c0caf
update dependency for test cases
vwxyzjn Dec 11, 2022
3df239a
update test cases and add dm_control tests
vwxyzjn Dec 11, 2022
8a9a467
update docs
vwxyzjn Dec 11, 2022
b56fe05
update mkdocs base
vwxyzjn Dec 11, 2022
7660199
revert doc changes
vwxyzjn Dec 11, 2022
8fd5657
fix dm_control test cases
vwxyzjn Dec 11, 2022
a140a3e
quick docs
vwxyzjn Dec 11, 2022
708cfad
fix tests on CI
vwxyzjn Dec 11, 2022
6a41003
fix test case
vwxyzjn Dec 11, 2022
3efbc24
fix CI
vwxyzjn Dec 12, 2022
0e8df6f
Fix CI
vwxyzjn Dec 12, 2022
19d08d5
update mujoco dependency
vwxyzjn Dec 12, 2022
55f2209
Fix CI
vwxyzjn Dec 12, 2022
7ef728c
fix CI
vwxyzjn Dec 12, 2022
d2b0a79
remove unused seed
vwxyzjn Dec 12, 2022
5 changes: 1 addition & 4 deletions .github/workflows/pre-commit.yml
@@ -1,10 +1,7 @@
name: pre-commit

on:
push:
branches: [ master ]
pull_request:
branches: [ master ]
push
jobs:
build:
runs-on: ubuntu-latest
79 changes: 62 additions & 17 deletions .github/workflows/tests.yaml
@@ -5,11 +5,6 @@ on:
- '**/README.md'
- 'docs/**/*'
- 'cloud/**/*'
pull_request:
paths-ignore:
- '**/README.md'
- 'docs/**/*'
- 'cloud/**/*'
jobs:
test-core-envs:
strategy:
@@ -133,7 +128,6 @@ jobs:
- name: Run pybullet tests
run: poetry run pytest tests/test_procgen.py


test-mujoco-envs:
strategy:
fail-fast: false
@@ -153,25 +147,76 @@ jobs:
poetry-version: ${{ matrix.poetry-version }}

# mujoco tests
- name: Install core dependencies
run: poetry install --with pytest
- name: Install pybullet dependencies
run: poetry install --with pybullet
- name: Install mujoco dependencies
run: poetry install --with mujoco
- name: Install jax dependencies
run: poetry install --with jax
- name: Install dependencies
run: poetry install --with pytest,mujoco,dm_control
- name: Downgrade setuptools
run: poetry run pip install setuptools==59.5.0
- name: install mujoco dependencies
run: |
sudo apt-get update && sudo apt-get -y install libgl1-mesa-glx libosmesa6 libglfw3
- name: Run mujoco tests
continue-on-error: true # MUJOCO_GL=osmesa results in `free(): invalid pointer`
run: poetry run pytest tests/test_mujoco.py

test-mujoco-envs-windows-mac:
strategy:
fail-fast: false
matrix:
python-version: [3.8]
poetry-version: [1.2]
os: [macos-latest, windows-latest]
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Run image
uses: abatilo/actions-poetry@v2.0.0
with:
poetry-version: ${{ matrix.poetry-version }}

# mujoco tests
- name: Install dependencies
run: poetry install --with pytest,mujoco,dm_control
- name: Downgrade setuptools
run: poetry run pip install setuptools==59.5.0
- name: Run mujoco tests
run: poetry run pytest tests/test_mujoco.py


test-mujoco_py-envs:
strategy:
fail-fast: false
matrix:
python-version: [3.8]
poetry-version: [1.2]
os: [ubuntu-22.04]
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Run image
uses: abatilo/actions-poetry@v2.0.0
with:
poetry-version: ${{ matrix.poetry-version }}

# mujoco_py tests
- name: Install dependencies
run: poetry install --with pytest,pybullet,mujoco_py,mujoco,jax
- name: Downgrade setuptools
run: poetry run pip install setuptools==59.5.0
- name: install mujoco_py dependencies
run: |
sudo apt-get update && sudo apt-get -y install wget unzip software-properties-common \
libgl1-mesa-dev \
libgl1-mesa-glx \
libglew-dev \
libosmesa6-dev patchelf
- name: Run mujoco tests
run: poetry run pytest tests/test_mujoco.py
- name: Run mujoco_py tests
run: poetry run pytest tests/test_mujoco_py.py

test-envpool-envs:
strategy:
@@ -251,4 +296,4 @@ jobs:
- name: Install ROMs
run: poetry run AutoROM --accept-license
- name: Run pettingzoo tests
run: poetry run pytest tests/test_pettingzoo_ma_atari.py
run: poetry run pytest tests/test_pettingzoo_ma_atari.py
2 changes: 1 addition & 1 deletion .gitpod.Dockerfile
@@ -12,7 +12,7 @@ RUN mkdir cleanrl_utils && touch cleanrl_utils/__init__.py
RUN pip install poetry
RUN poetry config virtualenvs.in-project true

# install mujoco
# install mujoco_py
RUN sudo apt-get -y install wget unzip software-properties-common \
libgl1-mesa-dev \
libgl1-mesa-glx \
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -53,8 +53,8 @@ repos:
args: ["--without-hashes", "-o", "requirements/requirements-pybullet.txt", "--with", "pybullet"]
stages: [manual]
- id: poetry-export
name: poetry-export requirements-mujoco.txt
args: ["--without-hashes", "-o", "requirements/requirements-mujoco.txt", "--with", "mujoco"]
name: poetry-export requirements-mujoco_py.txt
args: ["--without-hashes", "-o", "requirements/requirements-mujoco_py.txt", "--with", "mujoco_py"]
stages: [manual]
- id: poetry-export
name: poetry-export requirements-procgen.txt
4 changes: 2 additions & 2 deletions Dockerfile
@@ -15,13 +15,13 @@ RUN poetry install
RUN poetry install --with atari
RUN poetry install --with pybullet

# install mujoco
# install mujoco_py
RUN apt-get -y install wget unzip software-properties-common \
libgl1-mesa-dev \
libgl1-mesa-glx \
libglew-dev \
libosmesa6-dev patchelf
RUN poetry install --with mujoco
RUN poetry install --with mujoco_py
RUN poetry run python -c "import mujoco_py"

COPY entrypoint.sh /usr/local/bin/
1 change: 0 additions & 1 deletion README.md
@@ -113,7 +113,6 @@ You may also use a prebuilt development environment hosted in Gitpod:

## Algorithms Implemented

# Overview

| Algorithm | Variants Implemented |
| ----------- | ----------- |
4 changes: 2 additions & 2 deletions benchmark/ddpg.sh
@@ -1,12 +1,12 @@
poetry install --with mujoco,pybullet
poetry install --with mujoco_py,pybullet
python -c "import mujoco_py"
xvfb-run -a python -m cleanrl_utils.benchmark \
--env-ids HalfCheetah-v2 Walker2d-v2 Hopper-v2 InvertedPendulum-v2 Humanoid-v2 Pusher-v2 \
--command "poetry run python cleanrl/ddpg_continuous_action.py --track --capture-video" \
--num-seeds 3 \
--workers 1

poetry install --with mujoco,jax
poetry install --with mujoco_py,jax
poetry run pip install --upgrade "jax[cuda]==0.3.17" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
poetry run python -c "import mujoco_py"
xvfb-run -a poetry run python -m cleanrl_utils.benchmark \
22 changes: 19 additions & 3 deletions benchmark/ppo.sh
@@ -28,13 +28,13 @@ xvfb-run -a poetry run python -m cleanrl_utils.benchmark \
--num-seeds 3 \
--workers 1

poetry install --with mujoco,pybullet
poetry install --with mujoco_py,mujoco
poetry run python -c "import mujoco_py"
OMP_NUM_THREADS=1 xvfb-run -a poetry run python -m cleanrl_utils.benchmark \
--env-ids HalfCheetah-v2 Walker2d-v2 Hopper-v2 \
--env-ids HalfCheetah-v2 Walker2d-v2 Hopper-v2 InvertedPendulum-v2 Humanoid-v2 Pusher-v2 \
--command "poetry run python cleanrl/ppo_continuous_action.py --cuda False --track --capture-video" \
--num-seeds 3 \
--workers 9
--workers 6

poetry install --with procgen
xvfb-run -a poetry run python -m cleanrl_utils.benchmark \
@@ -89,3 +89,19 @@ poetry run python -m cleanrl_utils.benchmark \
--command "poetry run python ppo_atari_envpool_xla_jax.py --track --wandb-project-name envpool-atari --wandb-entity openrlbenchmark" \
--num-seeds 3 \
--workers 1

# gymnasium support
poetry install --with mujoco
OMP_NUM_THREADS=1 xvfb-run -a python -m cleanrl_utils.benchmark \
--env-ids HalfCheetah-v4 Walker2d-v4 Hopper-v4 InvertedPendulum-v4 Humanoid-v4 Pusher-v4 \
--command "poetry run python cleanrl/gymnasium_support/ppo_continuous_action.py --cuda False --track" \
--num-seeds 3 \
--workers 1

poetry install --with dm_control,mujoco
OMP_NUM_THREADS=1 xvfb-run -a poetry run python -m cleanrl_utils.benchmark \
--env-ids dm_control/acrobot-swingup-v0 dm_control/acrobot-swingup_sparse-v0 dm_control/ball_in_cup-catch-v0 dm_control/cartpole-balance-v0 dm_control/cartpole-balance_sparse-v0 dm_control/cartpole-swingup-v0 dm_control/cartpole-swingup_sparse-v0 dm_control/cartpole-two_poles-v0 dm_control/cartpole-three_poles-v0 dm_control/cheetah-run-v0 dm_control/dog-stand-v0 dm_control/dog-walk-v0 dm_control/dog-trot-v0 dm_control/dog-run-v0 dm_control/dog-fetch-v0 dm_control/finger-spin-v0 dm_control/finger-turn_easy-v0 dm_control/finger-turn_hard-v0 dm_control/fish-upright-v0 dm_control/fish-swim-v0 dm_control/hopper-stand-v0 dm_control/hopper-hop-v0 dm_control/humanoid-stand-v0 dm_control/humanoid-walk-v0 dm_control/humanoid-run-v0 dm_control/humanoid-run_pure_state-v0 dm_control/humanoid_CMU-stand-v0 dm_control/humanoid_CMU-run-v0 dm_control/lqr-lqr_2_1-v0 dm_control/lqr-lqr_6_2-v0 dm_control/manipulator-bring_ball-v0 dm_control/manipulator-bring_peg-v0 dm_control/manipulator-insert_ball-v0 dm_control/manipulator-insert_peg-v0 dm_control/pendulum-swingup-v0 dm_control/point_mass-easy-v0 dm_control/point_mass-hard-v0 dm_control/quadruped-walk-v0 dm_control/quadruped-run-v0 dm_control/quadruped-escape-v0 dm_control/quadruped-fetch-v0 dm_control/reacher-easy-v0 dm_control/reacher-hard-v0 dm_control/stacker-stack_2-v0 dm_control/stacker-stack_4-v0 dm_control/swimmer-swimmer6-v0 dm_control/swimmer-swimmer15-v0 dm_control/walker-stand-v0 dm_control/walker-walk-v0 dm_control/walker-run-v0 \
--command "poetry run python cleanrl/gymnasium_support/ppo_continuous_action.py --cuda False --track" \
--num-seeds 3 \
--workers 9
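
The `--env-ids` list above follows a `dm_control/<domain>-<task>-v0` naming pattern. As a minimal sketch of that pattern, the ids could be generated instead of typed by hand. The helper name `dm_control_env_ids` and the `DM_CONTROL_TASKS` table are hypothetical and cover only a few of the entries listed above; they are not part of this PR.

```python
from typing import Dict, List

# Hypothetical subset of (domain -> tasks) pairs, copied from the env ids above.
DM_CONTROL_TASKS: Dict[str, List[str]] = {
    "acrobot": ["swingup", "swingup_sparse"],
    "cartpole": ["balance", "balance_sparse", "swingup"],
    "cheetah": ["run"],
}


def dm_control_env_ids(tasks: Dict[str, List[str]], version: str = "v0") -> List[str]:
    """Build ids of the form dm_control/<domain>-<task>-<version>."""
    return [
        f"dm_control/{domain}-{task}-{version}"
        for domain, task_names in tasks.items()
        for task in task_names
    ]


env_ids = dm_control_env_ids(DM_CONTROL_TASKS)
# The space-joined string is what would be passed to --env-ids.
print(" ".join(env_ids))
```

Keeping the table as data would make it easier to benchmark a subset of domains, at the cost of one more place to keep in sync with the registered environments.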

2 changes: 1 addition & 1 deletion benchmark/sac.sh
@@ -1,4 +1,4 @@
poetry install --with mujoco,pybullet
poetry install --with mujoco_py,pybullet
poetry run python -c "import mujoco_py"
OMP_NUM_THREADS=1 xvfb-run -a poetry run python -m cleanrl_utils.benchmark \
--env-ids HalfCheetah-v2 Walker2d-v2 Hopper-v2 \
4 changes: 2 additions & 2 deletions benchmark/td3.sh
@@ -1,12 +1,12 @@
poetry install --with mujoco,pybullet
poetry install --with mujoco_py,pybullet
python -c "import mujoco_py"
OMP_NUM_THREADS=1 xvfb-run -a python -m cleanrl_utils.benchmark \
--env-ids HalfCheetah-v2 Walker2d-v2 Hopper-v2 InvertedPendulum-v2 Humanoid-v2 Pusher-v2 \
--command "poetry run python cleanrl/td3_continuous_action.py --track --capture-video" \
--num-seeds 3 \
--workers 1

poetry install --with mujoco,jax
poetry install --with mujoco_py,jax
poetry run pip install --upgrade "jax[cuda]==0.3.17" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
poetry run python -c "import mujoco_py"
xvfb-run -a poetry run python -m cleanrl_utils.benchmark \
46 changes: 30 additions & 16 deletions cleanrl/ppo_continuous_action.py
@@ -5,9 +5,8 @@
import time
from distutils.util import strtobool

import gym
import gymnasium as gym
import numpy as np
import pybullet_envs # noqa
import torch
import torch.nn as nn
import torch.optim as optim
@@ -36,7 +35,7 @@ def parse_args():
help="whether to capture videos of the agent performances (check out `videos` folder)")

# Algorithm specific arguments
parser.add_argument("--env-id", type=str, default="HalfCheetahBulletEnv-v0",
parser.add_argument("--env-id", type=str, default="HalfCheetah-v4",
help="the id of the environment")
parser.add_argument("--total-timesteps", type=int, default=1000000,
help="total timesteps of the experiments")
@@ -79,7 +78,11 @@ def parse_args():

def make_env(env_id, seed, idx, capture_video, run_name, gamma):
def thunk():
env = gym.make(env_id)
if capture_video:
env = gym.make(env_id, render_mode="rgb_array")
else:
env = gym.make(env_id)
env = gym.wrappers.FlattenObservation(env) # deal with dm_control's Dict observation space
env = gym.wrappers.RecordEpisodeStatistics(env)
if capture_video:
if idx == 0:
@@ -89,9 +92,6 @@ def thunk():
env = gym.wrappers.TransformObservation(env, lambda obs: np.clip(obs, -10, 10))
env = gym.wrappers.NormalizeReward(env, gamma=gamma)
env = gym.wrappers.TransformReward(env, lambda reward: np.clip(reward, -10, 10))
env.seed(seed)
env.action_space.seed(seed)
env.observation_space.seed(seed)
return env

return thunk
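
The `make_env` change above defers environment construction to a zero-argument thunk and passes `render_mode="rgb_array"` only when videos are captured. A minimal sketch of that pattern follows. To keep it runnable without Gymnasium installed, the constructor is injected through a hypothetical `make` parameter and exercised with a stand-in `fake_make`; in the PR the call is `gym.make`, and the wrappers are omitted here.

```python
from typing import Any, Callable, Dict, List


def make_env(env_id: str, capture_video: bool, make: Callable[..., Any]) -> Callable[[], Any]:
    """Return a zero-argument thunk that builds the environment when called."""

    def thunk() -> Any:
        # Request frame rendering only when videos will be recorded.
        kwargs: Dict[str, Any] = {"render_mode": "rgb_array"} if capture_video else {}
        return make(env_id, **kwargs)

    return thunk


# Stand-in for gym.make (hypothetical); it records how it was called.
calls: List[Dict[str, Any]] = []


def fake_make(env_id: str, **kwargs: Any) -> str:
    calls.append({"env_id": env_id, **kwargs})
    return f"<env {env_id}>"


thunks = [make_env("HalfCheetah-v4", capture_video=True, make=fake_make) for _ in range(2)]
calls_before = len(calls)  # still 0: nothing is built until a thunk is called
envs = [thunk() for thunk in thunks]
```

Because nothing is constructed until the thunk runs, a vectorized wrapper can decide where and when each sub-environment is created.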
@@ -147,7 +147,7 @@ def get_action_and_value(self, x, action=None):
sync_tensorboard=True,
config=vars(args),
name=run_name,
monitor_gym=True,
# monitor_gym=True, no longer works for gymnasium
save_code=True,
)
writer = SummaryWriter(f"runs/{run_name}")
@@ -184,9 +184,11 @@ def get_action_and_value(self, x, action=None):
# TRY NOT TO MODIFY: start the game
global_step = 0
start_time = time.time()
next_obs = torch.Tensor(envs.reset()).to(device)
next_obs, _ = envs.reset(seed=args.seed)
next_obs = torch.Tensor(next_obs).to(device)
next_done = torch.zeros(args.num_envs).to(device)
num_updates = args.total_timesteps // args.batch_size
video_filenames = set()

for update in range(1, num_updates + 1):
# Annealing the rate if instructed to do so.
@@ -208,16 +210,22 @@ def get_action_and_value(self, x, action=None):
logprobs[step] = logprob

# TRY NOT TO MODIFY: execute the game and log data.
next_obs, reward, done, info = envs.step(action.cpu().numpy())
next_obs, reward, terminated, truncated, infos = envs.step(action.cpu().numpy())
done = np.logical_or(terminated, truncated)
rewards[step] = torch.tensor(reward).to(device).view(-1)
next_obs, next_done = torch.Tensor(next_obs).to(device), torch.Tensor(done).to(device)

for item in info:
if "episode" in item.keys():
print(f"global_step={global_step}, episodic_return={item['episode']['r']}")
writer.add_scalar("charts/episodic_return", item["episode"]["r"], global_step)
writer.add_scalar("charts/episodic_length", item["episode"]["l"], global_step)
break
# Only print when at least 1 env is done
if "final_info" not in infos:
continue

for info in infos["final_info"]:
# Skip the envs that are not done
if info is None:
continue
print(f"global_step={global_step}, episodic_return={info['episode']['r']}")
writer.add_scalar("charts/episodic_return", info["episode"]["r"], global_step)
writer.add_scalar("charts/episodic_length", info["episode"]["l"], global_step)

# bootstrap value if not done
with torch.no_grad():
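
The step-handling change above combines the new `terminated` and `truncated` flags into a single `done` signal and reads episode statistics from `infos["final_info"]`, skipping `None` entries for sub-environments that are still running. A minimal sketch of that logic in plain Python follows. The helper names `merge_done` and `finished_episodes` and the hand-written `infos` dictionary are hypothetical; the layout mirrors what the diff iterates over, real statistics may be arrays rather than scalars, and later Gymnasium releases may report finished episodes differently.

```python
from typing import Any, Dict, List, Sequence, Tuple


def merge_done(terminated: Sequence[bool], truncated: Sequence[bool]) -> List[bool]:
    """An episode is over if it terminated or was truncated."""
    return [bool(t) or bool(u) for t, u in zip(terminated, truncated)]


def finished_episodes(infos: Dict[str, Any]) -> List[Tuple[float, int]]:
    """Return (episodic_return, episodic_length) for each env that just finished."""
    results: List[Tuple[float, int]] = []
    for info in infos.get("final_info", []):
        if info is None:  # this sub-environment is still running
            continue
        results.append((info["episode"]["r"], info["episode"]["l"]))
    return results


# Hand-written stand-in for one vectorized step with three sub-environments.
terminated = [False, True, False]
truncated = [False, False, True]
infos = {
    "final_info": [
        None,
        {"episode": {"r": 12.5, "l": 40}},
        {"episode": {"r": 3.0, "l": 1000}},
    ]
}

print(merge_done(terminated, truncated))  # [False, True, True]
print(finished_episodes(infos))           # [(12.5, 40), (3.0, 1000)]
```

Unlike the removed loop, which stopped at the first finished episode, this form reports every sub-environment that finished on the step.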
@@ -314,5 +322,11 @@ def get_action_and_value(self, x, action=None):
print("SPS:", int(global_step / (time.time() - start_time)))
writer.add_scalar("charts/SPS", int(global_step / (time.time() - start_time)), global_step)

if args.track and args.capture_video:
for filename in os.listdir(f"videos/{run_name}"):
if filename not in video_filenames and filename.endswith(".mp4"):
wandb.log({f"videos": wandb.Video(f"videos/{run_name}/{filename}")})
video_filenames.add(filename)

envs.close()
writer.close()
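
The video-logging block above scans the run's video folder on each update and uploads only `.mp4` files it has not seen before, tracking them in the `video_filenames` set. A minimal sketch of that deduplication step, separated from the upload call, is shown below. The helper name `new_videos` and the example file names are hypothetical; in the PR each new path is wrapped in `wandb.Video` and passed to `wandb.log`.

```python
import os
import tempfile
from typing import List, Set


def new_videos(video_dir: str, seen: Set[str]) -> List[str]:
    """Return paths of .mp4 files not yet in `seen`, and mark them as seen."""
    fresh: List[str] = []
    for filename in sorted(os.listdir(video_dir)):
        if filename.endswith(".mp4") and filename not in seen:
            seen.add(filename)
            fresh.append(os.path.join(video_dir, filename))
    return fresh


with tempfile.TemporaryDirectory() as video_dir:
    seen: Set[str] = set()
    # Simulate the recorder writing one video plus a non-video side file.
    open(os.path.join(video_dir, "rl-video-episode-0.mp4"), "w").close()
    open(os.path.join(video_dir, "rl-video-episode-0.meta.json"), "w").close()
    first = new_videos(video_dir, seen)   # picks up episode 0 only
    open(os.path.join(video_dir, "rl-video-episode-1.mp4"), "w").close()
    second = new_videos(video_dir, seen)  # picks up episode 1 only
    third = new_videos(video_dir, seen)   # nothing new, so empty
```

Tracking file names in a set means repeated scans stay idempotent, so each video would be uploaded at most once per run.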
@@ -24,4 +24,4 @@ ninja = "^1.10.2"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
build-backend = "poetry.core.masonry.api"
4 changes: 2 additions & 2 deletions docs/get-started/installation.md
@@ -76,7 +76,7 @@ You can install them using the following command
```bash
poetry install --with atari
poetry install --with pybullet
poetry install --with mujoco
poetry install --with mujoco_py
poetry install --with procgen
poetry install --with envpool
poetry install --with pettingzoo
@@ -94,7 +94,7 @@ While we recommend using `poetry` to manage environments and dependencies, the t
pip install -r requirements/requirements.txt
pip install -r requirements/requirements-atari.txt
pip install -r requirements/requirements-pybullet.txt
pip install -r requirements/requirements-mujoco.txt
pip install -r requirements/requirements-mujoco_py.txt
pip install -r requirements/requirements-procgen.txt
pip install -r requirements/requirements-envpool.txt
pip install -r requirements/requirements-pettingzoo.txt
6 changes: 3 additions & 3 deletions docs/rl-algorithms/ddpg.md
@@ -41,7 +41,7 @@ poetry install
poetry install --with pybullet
python cleanrl/ddpg_continuous_action.py --help
python cleanrl/ddpg_continuous_action.py --env-id HopperBulletEnv-v0
poetry install --with mujoco # only works in Linux
poetry install --with mujoco_py # only works in Linux
python cleanrl/ddpg_continuous_action.py --env-id Hopper-v3
```

@@ -262,11 +262,11 @@ The [ddpg_continuous_action_jax.py](https://github.com/vwxyzjn/cleanrl/blob/mast
### Usage

```bash
poetry install --with mujoco,jax
poetry install --with mujoco_py,jax
poetry run pip install --upgrade "jax[cuda]==0.3.17" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
poetry run python -c "import mujoco_py"
python cleanrl/ddpg_continuous_action_jax.py --help
poetry install --with mujoco # only works in Linux
poetry install --with mujoco_py # only works in Linux
python cleanrl/ddpg_continuous_action_jax.py --env-id Hopper-v3
```
