
Symmetric Replay Training: Enhancing Sample Efficiency in Deep Reinforcement Learning for Combinatorial Optimization

This repository provides the implementation for the paper, Symmetric Replay Training: Enhancing Sample Efficiency in Deep Reinforcement Learning for Combinatorial Optimization. The code is built on top of the original DRL method for each task; see the references and the original codebases for details.

Installation

Clone project and create environment with conda:

conda create -n sym python==3.7
conda activate sym

conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
conda install -c rdkit rdkit
pip install -r requirements.txt
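
Optionally, you can sanity-check the installation with commands like the following (not part of the original setup instructions; they only print the installed versions and CUDA availability):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import torch_geometric; print(torch_geometric.__version__)"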

Note

  • We highly recommend Python 3.7, PyTorch 1.12.1, and PyTorch Geometric 1.7.2. We also use PyTDC 0.4.0 instead of the 0.3.6 version recommended in mol_opt.
  • If you use a different CUDA version, modify the URLs for torch-scatter and torch-sparse in requirements.txt before installing; see here. An example is given after this list.
  • If you have trouble installing torch-scatter and torch-sparse, try conda install pyg -c pyg.
  • We slightly modified the original code of AM and Sym-NCO to make it runnable in Python 3.7, following here.
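
For example, with PyTorch 1.12.1 and CUDA 11.6 the wheels would come from the index below (a sketch; the exact package versions pinned in requirements.txt may differ):

pip install torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-1.12.1+cu116.html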

Usage

We follow the original (base) source code of each method.

Euclidean CO

Symmetric Replay Training

TSP (base: AM)

cd attention-learn-to-route
python run.py --problem tsp --batch_size 100 --epoch_size 10000 --n_epochs 200 --graph_size 50 --val_dataset '../data/tsp/tsp50_val_seed1234.pkl' --baseline critic --distil_every 1 --il_coefficient 0.001

CVRP (base: Sym-NCO)

cd sym_nco
python run.py --problem cvrp --batch_size 100 --epoch_size 10000 --n_epochs 100 --graph_size 50 --val_dataset '../data/vrp/vrp50_val_seed1234.pkl' --N_aug 5 --il_coefficient 0.001 --distil_every 1 --run_name sym_rd

Baseline

AM

cd attention-learn-to-route
python run.py --problem tsp --batch_size 100 --epoch_size 10000 --n_epochs 200 --graph_size 50 --val_dataset '../data/tsp/tsp50_val_seed1234.pkl' --baseline rollout

POMO

cd pomo/TSP/POMO
python train_n50.py 
python train_n100.py 

Sym-NCO

cd sym_nco
python run.py --problem cvrp --batch_size 100 --n_epochs 50 --graph_size 50 --val_dataset '../data/vrp/vrp50_val_seed1234.pkl'

Non-Euclidean CO

cd non_euclidean_co/mat_net/ATSP/ATSP_MatNet
python train.py

To run the original MatNet (the base DRL method), set the configuration USE_POMO to True in train.py; a sketch is given below.
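
Assuming USE_POMO is defined as a top-level boolean assignment in train.py (the exact form may differ), the switch can be made with an edit like:

sed -i 's/USE_POMO = False/USE_POMO = True/' train.py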

Note: the validation data can be downloaded here.


MolOpt

Symmetric Replay Training

cd mol_opt
python run.py reinvent_selfies --task simple --oracle scaffold_hop --config_default 'hparams_symrd.yaml'

(Base) REINVENT-SELFIES

python run.py reinvent_selfies --task simple --oracle scaffold_hop

Other baselines can be run by changing the method to gflownet (GFlowNet), gflownet_al (GFlowNet-AL), or moldqn (MolDQN), as shown below.
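
For example, following the same command pattern as above (the task and oracle flags are illustrative and can be changed):

python run.py gflownet --task simple --oracle scaffold_hop
python run.py gflownet_al --task simple --oracle scaffold_hop
python run.py moldqn --task simple --oracle scaffold_hop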


Acknowledgements

This work builds on the following papers.
