# wildbg training process

This repository documents the training process of the neural nets for the backgammon engine wildbg.

Each folder contains rollout data and the neural net that was trained on that data. The first rollout was played with random moves. The second rollout used the first net to evaluate moves, and so on: each iteration's rollouts use the net from the previous iteration.
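The iteration scheme described above can be sketched as a simple loop. This is only an illustration, not the actual wildbg tooling: `rollout`, `train`, `random_evaluator` and `run_iterations` are hypothetical stand-ins, and a "position" is reduced to a plain number so that the example stays runnable.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical stand-ins: a "position" is just a number here and an
# evaluator maps a position to an estimated equity.
Position = float
Evaluator = Callable[[Position], float]
Sample = Tuple[Position, float]


def random_evaluator(position: Position) -> float:
    """The first iteration has no net yet, so evaluations are random."""
    return random.uniform(-1.0, 1.0)


def rollout(evaluator: Evaluator, positions: List[Position]) -> List[Sample]:
    """Placeholder for a real rollout: label each position via the evaluator."""
    return [(p, evaluator(p)) for p in positions]


def train(data: List[Sample]) -> Evaluator:
    """Placeholder for real training: a 'net' that returns the mean label."""
    mean = sum(target for _, target in data) / len(data)
    return lambda position: mean


def run_iterations(count: int, positions: List[Position]) -> List[Evaluator]:
    """Each iteration rolls out with the previous net, then trains a new one."""
    nets: List[Evaluator] = []
    evaluator: Evaluator = random_evaluator
    for _ in range(count):
        data = rollout(evaluator, positions)  # data generated with the previous net
        evaluator = train(data)               # new net trained on that data
        nets.append(evaluator)
    return nets
```

Calling `run_iterations(3, [0.1, 0.2, 0.3])` returns three evaluators, one per iteration, mirroring the one-folder-per-iteration layout.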

| Data | Remarks |
| --- | --- |
| 0001 | Trained on rollouts with random moves. |
| 0002 | |
| 0003 | Uses `tanh` instead of `sigmoid` for the hidden layer. |
| 0004 | |
| 0005 | Increased the number of epochs from 10 to 20 and the learning rate from 0.1 to 4.0. |
| 0006 | |
| 0007 | The 7th net is actually a bit worse than the 6th net: fewer wins and fewer backgammon wins, but more gammon wins. |
| 0008 | No new rollouts were done. Instead, this net was trained on the combined rollout data of iterations 6 and 7. The network topology was changed from one hidden layer with `tanh` activation to three hidden layers with `ReLU` activation. |
| 0009 | Rollouts were done with the net from iteration 8. There are now separate data sets for contact and race positions and two separate networks, each with a different number of inputs. Combined, they are better than the 8th iteration, but they lose a lot of backgammons because the contact network is too optimistic: it avoids going into a race and then loses a backgammon instead. |
| 0010 | Rollouts were done with the nets from iteration 9. Only a contact network was rolled out and trained. The loss function for training was changed from `MSELoss` to `L1Loss`. |
| 0011 | No new rollouts; uses the same training data as 0010. Instead of the PyTorch optimizer `SGD`, `Adam` was used. When duelling with the previous net, this results in an equity win of roughly 0.02. |
| 0012 | Increased the number of epochs from 20 to 50. The race data was rolled out with race #9 and contact #11 (the most recent of each). The contact data was rolled out with race #12 and contact #12. For the contact net, the optimizer was switched from `Adam` to `AdamW`, which seems to be a small improvement. Overall a dramatic improvement over the previous nets; they now lose backgammons only 0.7% of the time. |
| 0013 | No new rollouts. The contact net uses `Hardsigmoid` instead of `ReLU` as its activation function. It seems to give slightly better results (equity win of 0.01), but inference also takes a bit longer. |
| 0014 | No new rollouts. The contact net is trained on the same data as the two previous ones, with a few more epochs, a slightly smaller learning rate and more careful comparisons between different ONNX files. Equity win of 0.01. |
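The remarks above compare nets by wins, gammons, backgammons and equity. As a rough sketch of how these numbers relate, the snippet below computes cubeless equity using the standard backgammon scoring of 1, 2 and 3 points for a normal win, gammon and backgammon. The `Probabilities` class, its six mutually exclusive fields and the `average_equity` helper are assumptions made for illustration; they are not taken from the wildbg code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Probabilities:
    """Six mutually exclusive game outcomes (assumed layout), summing to 1."""
    win_normal: float
    win_gammon: float
    win_bg: float
    lose_normal: float
    lose_gammon: float
    lose_bg: float

    def equity(self) -> float:
        """Expected points per game without the doubling cube."""
        won = self.win_normal + 2 * self.win_gammon + 3 * self.win_bg
        lost = self.lose_normal + 2 * self.lose_gammon + 3 * self.lose_bg
        return won - lost


def average_equity(points_per_game: Sequence[int]) -> float:
    """Mean signed result of a duel: +1/+2/+3 for wins, -1/-2/-3 for losses."""
    return sum(points_per_game) / len(points_per_game)
```

Under these assumptions, an "equity win of 0.02" in a duel would correspond to `average_equity` returning roughly 0.02 from the newer net's point of view.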
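Several iterations differ only in the activation function or the loss function. For reference, here is a plain-Python sketch of the functions named in the table, following the definitions given in the PyTorch documentation for `Sigmoid`, `Tanh`, `ReLU`, `Hardsigmoid`, `MSELoss` and `L1Loss` (with the default mean reduction). It is meant to show what changed between iterations, not to reproduce the training code of this repository.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x: float) -> float:
    return math.tanh(x)


def relu(x: float) -> float:
    return max(0.0, x)


def hardsigmoid(x: float) -> float:
    """Piecewise linear: 0 for x <= -3, 1 for x >= 3, x/6 + 1/2 in between."""
    return min(1.0, max(0.0, x / 6.0 + 0.5))


def mse_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error; penalizes large errors more strongly."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def l1_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error; every error counts in proportion to its size."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(predicted)
```

Note that `hardsigmoid` is bounded between 0 and 1 like `sigmoid`, whereas `relu` is unbounded above.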

carsten-wenderdel/wildbg-training
