Practice Double Deep Q-Learning with Dueling Network Architecture for Breakout

This is a fork of fg91/Deep-Q-Learning. I modified the original code slightly and trained it on Breakout. The maximum evaluation score was 804 (GIF shown above).
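For context, the two techniques in the title can be summarized in a few lines. Below is a minimal sketch of the double-DQN target with a dueling value/advantage head, written with NumPy; the `online_q` and `target_q` functions are hypothetical stand-ins for the TensorFlow networks in DQN.ipynb, not the actual implementation:

```python
import numpy as np

def dueling_q(value, advantage):
    # Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
    # value: shape (batch, 1), advantage: shape (batch, num_actions)
    return value + advantage - advantage.mean(axis=1, keepdims=True)

def double_dqn_target(rewards, next_states, terminals, online_q, target_q, gamma=0.99):
    # Double DQN: the online network chooses the next action,
    # while the target network evaluates that choice.
    best_actions = np.argmax(online_q(next_states), axis=1)
    next_q = target_q(next_states)[np.arange(len(rewards)), best_actions]
    return rewards + gamma * (1.0 - terminals) * next_q
```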

Modifications to the original code

  • Modify ReplayMemory to sample from all possible indices.
  • Clip rewards ("fixed all positive rewards to be 1 and all negative rewards to be -1, leaving 0 rewards unchanged") as in Mnih et al. 2013 and Mnih et al. 2015. (This made training converge significantly faster and improved the agent's performance on Breakout; a minimal sketch follows this list.)
  • Record the evaluation score appropriately even if no evaluation game finished. (An unfinished evaluation game can happen when "the agent got stuck in a loop".)
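
The reward-clipping change boils down to one line. The sketch below is only an illustration, with a hypothetical `clip_reward` helper rather than code taken from DQN.ipynb:

```python
import numpy as np

def clip_reward(reward):
    # Fix all positive rewards to 1 and all negative rewards to -1,
    # leaving 0 rewards unchanged (as in Mnih et al. 2013/2015).
    return float(np.sign(reward))
```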

Training Result

Requirements

  • tensorflow-gpu
  • gym
  • gym[atari] (version 0.10.5 or higher, so that BreakoutDeterministic-v4 is available)
  • imageio
  • scikit-image

Try it yourself:

If you want to test the trained network (which achieves a score of 804), simply run the notebook DQN.ipynb.

If you want to train the network yourself, set TRAIN = True in the first cell of DQN.ipynb and run the notebook.
