Codes for "Efficient Offline Policy Optimization with a Learned Model", ICLR2023

sail-sg/rosmo

ROSMO


Introduction

This repository contains the implementation of ROSMO, a Regularized One-Step Model-based algorithm for offline RL, introduced in our paper "Efficient Offline Policy Optimization with a Learned Model". We provide the training code for both the Atari and BSuite experiments, and the reproduced results on Atari MsPacman are publicly available on W&B.

Installation

Please follow the installation guide.

Usage

BSuite

To run the BSuite experiments, first download the datasets and place them in the directory defined by CONFIG.data_dir in experiment/bsuite/config.py.
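Before launching a run, it can help to confirm that the dataset directory is actually in place. The following is a minimal sketch, not part of this repository: `check_data_dir` is a hypothetical helper, and the path you pass to it is assumed to be whatever value CONFIG.data_dir holds in your checkout.

```python
from pathlib import Path


def check_data_dir(data_dir: str) -> Path:
    """Return the dataset directory as a Path, or raise if it is missing or empty.

    Hypothetical helper: `data_dir` is assumed to match CONFIG.data_dir.
    """
    path = Path(data_dir).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"Dataset directory not found: {path}")
    # An existing but empty directory usually means the download step was skipped.
    if not any(path.iterdir()):
        raise FileNotFoundError(f"Dataset directory is empty: {path}")
    return path
```

Calling `check_data_dir("/path/to/datasets")` before training fails fast with a readable message instead of an error deep inside the data loader.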

  1. Debug run.
python experiment/bsuite/main.py -exp_id test -env cartpole
  2. Enable W&B logger and start training.
python experiment/bsuite/main.py -exp_id test -env cartpole -nodebug -use_wb -user ${WB_USER}

Atari

The following commands are examples of training 1) a ROSMO agent, 2) its sampling variant, and 3) a MuZero Unplugged (MZU) agent on the game MsPacman.

  1. Train ROSMO with exact policy target.
python experiment/atari/main.py -exp_id rosmo -env MsPacman -nodebug -use_wb -user ${WB_USER}
  2. Train ROSMO with sampled policy target (N=4).
python experiment/atari/main.py -exp_id rosmo-sample-4 -sampling -env MsPacman -nodebug -use_wb -user ${WB_USER}
  3. Train MuZero Unplugged as a baseline (N=20).
python experiment/atari/main.py -exp_id mzu-sample-20 -algo mzu -num_simulations 20 -env MsPacman -nodebug -use_wb -user ${WB_USER}
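If you want to launch the three variants from a script, the commands above can be assembled programmatically. This is a rough sketch rather than a tool shipped with the repository: `VARIANT_FLAGS` and `atari_command` are hypothetical names, and only flags that appear in the commands above are used.

```python
import shlex
from typing import Dict, List

# Extra flags per experiment id, copied from the three example commands.
# The dictionary and helper names are illustrative assumptions.
VARIANT_FLAGS: Dict[str, List[str]] = {
    "rosmo": [],
    "rosmo-sample-4": ["-sampling"],
    "mzu-sample-20": ["-algo", "mzu", "-num_simulations", "20"],
}


def atari_command(exp_id: str, env: str, wb_user: str) -> List[str]:
    """Build the argument list for one Atari training run with W&B logging."""
    argv = ["python", "experiment/atari/main.py", "-exp_id", exp_id]
    argv += VARIANT_FLAGS[exp_id]
    argv += ["-env", env, "-nodebug", "-use_wb", "-user", wb_user]
    return argv


if __name__ == "__main__":
    for exp_id in VARIANT_FLAGS:
        # Print each command; pass the list to subprocess.run to execute it.
        print(shlex.join(atari_command(exp_id, "MsPacman", "your-wandb-user")))
```

Building an argument list instead of a single string avoids shell quoting issues if the list is later passed to `subprocess.run`.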

Citation

If you find this work useful for your research, please consider citing:

@inproceedings{
  liu2023rosmo,
  title={Efficient Offline Policy Optimization with a Learned Model},
  author={Zichen Liu and Siyi Li and Wee Sun Lee and Shuicheng Yan and Zhongwen Xu},
  booktitle={International Conference on Learning Representations},
  year={2023},
  url={https://arxiv.org/abs/2210.05980}
}

License

ROSMO is distributed under the terms of the Apache 2.0 license.

Acknowledgement

We thank the following projects which provide great references:

Disclaimer

This is not an official Sea Limited or Garena Online Private Limited product.