Practice Double Deep Q-Learning with Dueling Network Architecture for Breakout

This is a fork of fg91/Deep-Q-Learning. I modified the original code slightly and trained it on Breakout. The maximum evaluation score was 804 (GIF shown above).

Modifications to the original code

  • Modify ReplayMemory to sample from all possible indices (see the first sketch after this list).
  • Clip rewards ("fixed all positive rewards to be 1 and all negative rewards to be -1, leaving 0 rewards unchanged") as in Mnih et al. 2013 and Mnih et al. 2015; see the second sketch after this list. (This made training converge significantly faster and improved the agent's performance on Breakout.)
  • Record the evaluation score appropriately even if no evaluation game finished. (An unfinished evaluation game can happen when "the agent got stuck in a loop".)
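
The first bullet concerns how transition indices are drawn from the replay buffer. A minimal sketch of sampling uniformly over every usable index, assuming a circular buffer with a write pointer; the names count, current, and agent_history_length are illustrative, not the repo's exact API:

```python
import numpy as np

def sample_valid_index(count, current, agent_history_length=4):
    # Draw one index usable as the last frame of a state: it needs a full
    # history window behind it, and that window must not straddle the
    # circular buffer's write pointer (those frames would come from
    # different points in time once the buffer wraps around).
    while True:
        index = np.random.randint(agent_history_length, count)
        if current <= index < current + agent_history_length:
            continue
        return index
```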
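
The reward clipping from the second bullet is essentially a one-liner; a minimal sketch, with clip_reward as an illustrative name:

```python
import numpy as np

def clip_reward(reward):
    # Fix all positive rewards to 1 and all negative rewards to -1,
    # leaving 0 rewards unchanged (as in Mnih et al. 2013 and 2015).
    return np.sign(reward)
```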

Training Result

Requirements

  • tensorflow-gpu
  • gym
  • gym[atari] (version 0.10.5 or higher, so that BreakoutDeterministic-v4 is available)
  • imageio
  • scikit-image
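
Assuming a standard pip setup, the dependencies above can be installed along these lines (adjust versions to your environment):

```bash
pip install tensorflow-gpu "gym[atari]>=0.10.5" imageio scikit-image
```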

Try it yourself:

If you want to test the trained network (which achieves a score of 804), simply run the notebook DQN.ipynb.

If you want to train the network yourself, set TRAIN = True in the first cell of DQN.ipynb and run the notebook.
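
For reference, the switch in the first cell is just a boolean flag, roughly:

```python
TRAIN = True   # True: train from scratch; False: only evaluate the provided trained network
```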