add complex observation atari ppo #359

Open: wants to merge 4 commits into master


@ttumiel commented Feb 15, 2023

Description

Added handling of complex (nested, e.g. dict) observation spaces to atari_ppo.py. Closes #353
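For context, here is a minimal sketch of the general idea, not the code in this PR: every operation PPO performs on its observation buffer (allocation, per-step writes, flattening the step and env axes, minibatch indexing) is mapped over the leaves of the observation structure. It assumes NumPy buffers instead of the torch tensors cleanrl's PPO scripts use, and `map_structure` is a hypothetical stand-in for dm-tree's `tree.map_structure` that only recurses into dicts, tuples, and lists. `alloc_storage`, `write_step`, `flatten_batch`, and the `"image"`/`"vector"` keys are made up for illustration.

```python
import numpy as np


def map_structure(fn, *structs):
    """Hypothetical stand-in for tree.map_structure: apply fn leaf-wise across
    identically nested dicts, tuples, and lists."""
    first = structs[0]
    if isinstance(first, dict):
        return {k: map_structure(fn, *(s[k] for s in structs)) for k in first}
    if isinstance(first, (tuple, list)):
        return type(first)(map_structure(fn, *items) for items in zip(*structs))
    return fn(*structs)


def alloc_storage(example_obs, num_steps):
    """Allocate a (num_steps, num_envs, ...) buffer for each leaf of a batched observation."""
    return map_structure(
        lambda leaf: np.zeros((num_steps,) + leaf.shape, dtype=leaf.dtype), example_obs
    )


def write_step(storage, step, obs):
    """Copy one step's batched observation into the matching leaf buffers."""
    def write(buf, new):
        buf[step] = new

    map_structure(write, storage, obs)


def flatten_batch(storage):
    """Merge the step and env axes of every leaf, as PPO does for a plain Box buffer."""
    return map_structure(lambda leaf: leaf.reshape((-1,) + leaf.shape[2:]), storage)


num_steps, num_envs = 4, 2
# Hypothetical batched Dict observation, shaped the way a vector env would return it.
next_obs = {
    "image": np.zeros((num_envs, 84, 84), dtype=np.uint8),
    "vector": np.zeros((num_envs, 3), dtype=np.float32),
}
obs = alloc_storage(next_obs, num_steps)

for step in range(num_steps):
    write_step(obs, step, next_obs)
    # Stand-in for env.step(): bump every leaf so each step is distinguishable.
    next_obs = map_structure(lambda leaf: leaf + 1, next_obs)

b_obs = flatten_batch(obs)
mb_inds = np.array([0, 5, 7])  # flat index = step * num_envs + env
mb_obs = map_structure(lambda leaf: leaf[mb_inds], b_obs)
```

With a plain Box space the same code degenerates to the usual single-array buffer, since `map_structure` just calls the function on the one leaf.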

I also wrote a JAX version for the #338 branch. It needs only two changes, both using JAX's tree_map. Should I open it as a separate PR once #338 is ready?
https://gist.github.com/ttumiel/ee746d6292cecb47d390fb97c3ccfa5e

Tests

I wrote some tests for different observation types, but I wasn't sure whether they belong in the test folder, since they mostly just demonstrate the functionality.
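As a rough illustration of what such tests could check (these are not the tests from this PR), the sketch below stacks a list of per-env observations with the same `lambda *xs: np.stack(xs)` pattern used in the speed benchmark and asserts that each leaf comes out batched, for Box-, Tuple-, and nested Dict-like observations. `map_structure` is a hypothetical pure-Python stand-in for `tree.map_structure`, and the observation layouts are invented for the example.

```python
import numpy as np


def map_structure(fn, *structs):
    # Hypothetical stand-in for tree.map_structure: recurse into dicts, tuples, and lists.
    first = structs[0]
    if isinstance(first, dict):
        return {k: map_structure(fn, *(s[k] for s in structs)) for k in first}
    if isinstance(first, (tuple, list)):
        return type(first)(map_structure(fn, *items) for items in zip(*structs))
    return fn(*structs)


def stack_obs(obs_list):
    """Stack per-env observations into one batch, leaf by leaf."""
    return map_structure(lambda *xs: np.stack(xs), *obs_list)


rng = np.random.default_rng(0)
NUM_ENVS = 3


def sample_box():
    # Stand-in for Box(0, 255, (64, 64), uint8).sample().
    return rng.integers(0, 256, size=(64, 64), dtype=np.uint8)


def test_box_obs():
    batch = stack_obs([sample_box() for _ in range(NUM_ENVS)])
    assert batch.shape == (NUM_ENVS, 64, 64)


def test_tuple_obs():
    batch = stack_obs([(sample_box(), np.float32(i)) for i in range(NUM_ENVS)])
    assert isinstance(batch, tuple)
    assert batch[0].shape == (NUM_ENVS, 64, 64)
    assert batch[1].tolist() == [0.0, 1.0, 2.0]


def test_nested_dict_obs():
    obs = [{"image": sample_box(), "info": {"lives": np.int64(5 - i)}} for i in range(NUM_ENVS)]
    batch = stack_obs(obs)
    assert batch["image"].shape == (NUM_ENVS, 64, 64)
    assert batch["info"]["lives"].tolist() == [5, 4, 3]
```

The `test_` prefixes let pytest collect these functions, but they are plain functions and can also be called directly.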

I also wrote a dummy complex-observation wrapper to demonstrate handling a dict space in Atari: https://gist.github.com/ttumiel/c2132b424c49b76a62bafe7efef9923d
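The gist's wrapper is not reproduced here. As a hypothetical sketch of the same idea, the class below wraps any env with gym-style `reset()`/`step()` and turns its image observation into a dict that also carries the number of steps taken in the current episode. It avoids importing gym so it runs standalone; with gym you would subclass `gym.Wrapper` and declare `observation_space` as a `gym.spaces.Dict`. The class names, the `"image"`/`"t"` keys, and the old 4-tuple `step()` return are all assumptions.

```python
import numpy as np


class DictObsWrapper:
    """Hypothetical wrapper: exposes {"image": <original obs>, "t": <steps this episode>}."""

    def __init__(self, env):
        self.env = env
        self.t = 0

    def _wrap(self, obs):
        # Shape-(1,) array so the key could be described by a Box space.
        return {"image": obs, "t": np.array([self.t], dtype=np.int64)}

    def reset(self):
        self.t = 0
        return self._wrap(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)  # old 4-tuple gym API (assumption)
        self.t += 1
        return self._wrap(obs), reward, done, info


class FakeAtariEnv:
    """Toy env returning 84x84 uint8 frames; a stand-in for a real Atari env."""

    def reset(self):
        return np.zeros((84, 84), dtype=np.uint8)

    def step(self, action):
        return np.full((84, 84), action, dtype=np.uint8), 0.0, False, {}


env = DictObsWrapper(FakeAtariEnv())
first = env.reset()
obs, reward, done, info = env.step(3)
```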

Speed

tree.map_structure adds roughly 10 µs of overhead per call compared with a plain np.stack when batching ten 64×64 observations (26.5 µs vs 15.8 µs).

```python
# IPython/Jupyter: run each %%timeit block in its own cell.
import gym, tree, numpy as np

space = gym.spaces.Box(0, 255, (64, 64))
obs = [space.sample() for _ in range(10)]
```

```python
%%timeit
batch = np.stack(obs)
# 15.8 µs ± 491 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```

```python
%%timeit
batch = tree.map_structure(lambda *xs: np.stack(xs), *obs)
# 26.5 µs ± 670 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
```

Somewhat surprisingly, BreakoutNoFrameskip ran slightly faster locally: about 470 SPS with the tree_map version vs 460 SPS with the original.

Questions

  • Should the complex-observation handling go directly in ppo.py instead of a new file?
  • Should I add a note in the docs?

Types of changes

  • Bug fix
  • New feature
  • New algorithm
  • Documentation

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).
  • I have updated the documentation and previewed the changes via mkdocs serve.
  • I have updated the tests accordingly (if applicable).

If you are adding new algorithm variants or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

  • I have contacted vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
  • I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
  • I have added additional documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers (if applicable).
    • I have added links to the PR related to the algorithm variant.
    • I have created a table comparing my results against those from reputable sources (e.g., the original paper or other reference implementation).
    • I have added the learning curves (in PNG format).
    • I have added links to the tracked experiments.
    • I have updated the overview sections at the docs and the repo
  • I have updated the tests accordingly (if applicable).

@vercel bot commented Feb 15, 2023

The latest updates on your projects:

cleanrl: ✅ Ready (updated Feb 15, 2023 at 11:25 PM UTC)

Successfully merging this pull request may close these issues:

  • PPO Complex Obs/Action Space