Add PPO + Transformer-XL #459
base: master
Conversation
**pre-commit**

pre-commit fails because of two "obsolete" imports: memory_gym and PoMEnv. Without those imports, the environments are not registered inside gymnasium.

**enjoy.py**

I added a script to load a trained model and then watch an episode.

**ProofofMemory-v0 and MiniGrid-MemoryS9-v0**

These environments require memory and converge pretty fast, which is why I included them initially. MemoryGym environments take more time and resources (especially GPU memory, due to the cached hidden states of Transformer-XL).

**TODO**

I still have to run the benchmarks and write the documentation. Besides that, the single-file implementation is basically done. I tried to stay close to ppo_atari_lstm.py.
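The "obsolete imports" issue above comes from a registration side effect: importing an environment package runs its `__init__.py`, which typically registers the environments (e.g. via `gymnasium.register`), so removing the "unused" import silently breaks `make()`. A minimal stand-in registry illustrating the mechanism — the registry, `register`, and `make` below are simplified sketches, not gymnasium's actual implementation:

```python
# Stand-in for gymnasium's environment registry, to show why the
# seemingly unused imports are load-bearing.
registry = {}

def register(env_id, entry_point):
    """Record an environment id so it can later be looked up by make()."""
    registry[env_id] = entry_point

def make(env_id):
    """Fail the way an env lookup does when the registering import was dropped."""
    if env_id not in registry:
        raise KeyError(f"{env_id} is not registered; was the import removed?")
    return registry[env_id]

# This is, in effect, what an `import pom_env`-style import does at import time:
register("ProofofMemory-v0", "pom_env:PoMEnv")

print(make("ProofofMemory-v0"))  # resolves only because the import side effect ran
```

If the import is stripped (as pre-commit's unused-import check suggests), the `register` call never runs and `make` raises, which matches the failure described above.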
…initial to final value.
…nused imports; however, these imports are necessary for the used environments to be registered
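The first truncated commit message above appears to describe annealing a value from an initial to a final setting over training. A generic linear schedule for that idea might look like the following — the function name and signature are illustrative, and which coefficient the PR actually anneals is not visible here:

```python
def linear_anneal(step: int, total_steps: int, initial: float, final: float) -> float:
    """Linearly interpolate from `initial` to `final` over `total_steps`.

    A generic sketch of an initial-to-final annealing schedule; clamped so
    steps past `total_steps` stay at the final value.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return initial + frac * (final - initial)

# e.g. decaying an entropy coefficient from 0.01 to 0.0 over 100 updates
print(linear_anneal(50, 100, 0.01, 0.0))
```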
Description
Implementation of PPO with Transformer-XL as episodic memory.
Based on this repo and paper.
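The key difference from `ppo_atari_lstm.py` is that Transformer-XL attends over a sliding window of cached per-timestep hidden states rather than carrying a single recurrent state, which is also why the comment above notes the extra GPU memory cost. A minimal, framework-free sketch of that caching behavior (the class and method names are my own, not the PR's):

```python
from collections import deque

class EpisodicMemory:
    """Sketch of Transformer-XL-style episodic memory: cache the hidden state
    produced at each timestep so the policy can attend over a bounded window
    of the current episode. Hypothetical API, for illustration only."""

    def __init__(self, memory_length: int):
        # oldest hidden states are evicted once the window is full
        self.cache = deque(maxlen=memory_length)

    def add(self, hidden_state):
        self.cache.append(hidden_state)

    def window(self):
        # the states the policy attends over at the current step
        return list(self.cache)

    def reset(self):
        # episodic: cleared at episode boundaries, unlike an LSTM state
        # that is carried and masked between rollouts
        self.cache.clear()

mem = EpisodicMemory(memory_length=3)
for t in range(5):
    mem.add(f"h{t}")
print(mem.window())  # only the 3 most recent hidden states remain
```

Caching one state per timestep (instead of one recurrent state total) is the design choice that makes the MemoryGym environments mentioned above comparatively expensive in GPU memory.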
Types of changes
Checklist:
- [ ] I have ensured `pre-commit run --all-files` passes (required).
- [ ] I have updated the documentation and previewed the changes via `mkdocs serve`.

If you need to run benchmark experiments for a performance-impacting change:

- [ ] I have used the benchmark utility with `--capture_video`.
- [ ] I have performed RLops with `python -m openrlbenchmark.rlops`.
- [ ] I have added the `python -m openrlbenchmark.rlops` utility to the documentation.
- [ ] I have added the report generated by `python -m openrlbenchmark.rlops ....your_args... --report` to the documentation.