Graphcore benchmarks: Reinforcement Learning

This README describes how to train a deep reinforcement learning model on multiple IPUs with synchronous data-parallel training using synthetic data.

Overview

The general goal of Reinforcement Learning (RL) is to maximise some long-term reward by mapping observations and measurements to a set of actions. This usually involves an agent of some kind learning an optimal sequence of decisions. RL is therefore useful in areas where automated sequential decision-making is required.

Deep reinforcement learning combines the strengths of deep neural networks (learning useful features from observations) with the machine learning paradigm of learning by trial and error. Graphcore has run a deep reinforcement learning model (a policy gradient model) on multiple IPUs with synchronous data-parallel training using synthetic data.
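
As a rough illustration of the paradigm, a policy-gradient method makes actions that led to high reward more likely under the policy. The sketch below shows a generic REINFORCE-style loss in TensorFlow 1; the function name, tensor shapes, and the exact loss used in rl_benchmark.py are assumptions, not taken from this repository.

```python
# A generic REINFORCE-style policy-gradient loss in TensorFlow 1.
# Names and shapes are illustrative assumptions, not the loss
# actually implemented in rl_benchmark.py.
import tensorflow as tf  # assumes the tensorflow-1 wheel from the Poplar SDK

def policy_gradient_loss(logits, actions, returns):
    # logits:  float32 [batch, time_steps, num_actions]
    # actions: int32   [batch, time_steps] - actions taken under the policy
    # returns: float32 [batch, time_steps] - observed (discounted) rewards
    neg_log_prob = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=actions, logits=logits)
    # Weighting the negative log-probability by the return makes
    # high-reward actions more likely after a gradient step.
    return tf.reduce_mean(neg_log_prob * returns)
```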

The model contains the following layers, which are typically found in a policy network (a minimal sketch follows the list):

  • Embedding layers for representing discrete observations
  • Clipping layers to clip the values of continuous observations
  • Concatenating layers to group observations
  • Fully-connected transformations
  • Layers for choosing maximum feature value along a specific dimension
  • LSTM layer to process a sequence of observations
  • Final softmax layer of size = num_actions
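
The sketch below wires these layer types together in TensorFlow 1 to show how such a policy network might fit together. All sizes, names, clipping bounds, and tensor shapes are illustrative assumptions; the actual architecture lives in rl_benchmark.py.

```python
# A minimal sketch of a policy network built from the layer types listed
# above, assuming the tensorflow-1 wheel from the Poplar SDK.
import tensorflow as tf

def policy_network(discrete_obs, continuous_obs, num_actions):
    # discrete_obs:   int32 tensor   [batch, time_steps, num_discrete]
    # continuous_obs: float32 tensor [batch, time_steps, num_continuous]

    # Embedding layer to represent discrete observations.
    embedding = tf.get_variable("embedding", [1000, 32], dtype=tf.float32)
    embedded = tf.nn.embedding_lookup(embedding, discrete_obs)

    # Choose the maximum feature value along the observation dimension.
    embedded = tf.reduce_max(embedded, axis=2)

    # Clip the values of continuous observations.
    clipped = tf.clip_by_value(continuous_obs, -5.0, 5.0)

    # Concatenate the observation groups, then a fully-connected transform.
    features = tf.concat([embedded, clipped], axis=-1)
    hidden = tf.layers.dense(features, 256, activation=tf.nn.relu)

    # LSTM layer to process the sequence of observations.
    cell = tf.nn.rnn_cell.LSTMCell(256)
    outputs, _ = tf.nn.dynamic_rnn(cell, hidden, dtype=tf.float32)

    # Final softmax layer of size num_actions.
    logits = tf.layers.dense(outputs, num_actions)
    return tf.nn.softmax(logits)
```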

Dataset

The function env_generator simulates discrete and continuous observations along with simulated rewards under the current policy.
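
As a hypothetical sketch of what such a generator might look like (the real env_generator's signature, shapes, and distributions may differ):

```python
# A hypothetical sketch of a synthetic observation/reward generator;
# the actual env_generator in rl_benchmark.py may differ.
import numpy as np

def env_generator(batch_size, time_steps,
                  num_discrete=4, num_continuous=8, vocab_size=1000):
    """Yield simulated discrete/continuous observations and stand-in rewards."""
    while True:
        discrete_obs = np.random.randint(
            0, vocab_size, size=(batch_size, time_steps, num_discrete))
        continuous_obs = np.random.randn(
            batch_size, time_steps, num_continuous).astype(np.float32)
        rewards = np.random.randn(batch_size, time_steps).astype(np.float32)
        yield discrete_obs, continuous_obs, rewards
```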

Running the model

The following files are included in this repo:

| File | Description |
| --- | --- |
| README.md | How to run the model |
| rl_benchmark.py | The main training program |

Quick start guide

  1. Prepare the TensorFlow environment. Install the Poplar SDK following the instructions in the Getting Started guide for your IPU system. Make sure to run the enable.sh script and activate a Python virtualenv with the tensorflow-1 wheel from the Poplar SDK installed.
  2. Run the training program: `python3 rl_benchmark.py --batch_size 8 --time_steps 16 --num_ipus 8`
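
The flags shown above are the ones documented in this guide; for instance, assuming --num_ipus also accepts smaller values, a single-IPU run might look like `python3 rl_benchmark.py --batch_size 8 --time_steps 16 --num_ipus 1`.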

Use --help to show all available options.