Thursday, July 12, 2012

TD-Gammon

Tesauro, G. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2), 215-219.

http://www.research.ibm.com/massive/tdl.html

Tesauro does some interesting work in teaching a neural network to play backgammon. The approach combines a multilayer perceptron with reinforcement learning, and the idea is pretty simple: the network learns a value function, mapping a board state to a prediction of expected reward. At the end of each game the network receives a reward, and over many games of practice it learns which board states are good and which are bad.
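To make the idea concrete, here is a minimal sketch of a temporal-difference value update. A single linear layer stands in for Tesauro's multilayer perceptron (which was actually trained with TD(lambda)), and the feature encoding, learning rate, and discount factor are placeholders of my own, not details from the paper.

```python
import numpy as np

# Minimal TD(0) sketch of the value-learning idea described above. A single
# linear layer stands in for Tesauro's multilayer perceptron, and the feature
# encoding, learning rate, and discount factor are illustrative placeholders.

ALPHA = 0.1      # learning rate (assumed)
GAMMA = 0.9      # discount factor (assumed)
N_FEATURES = 9   # e.g. one input per board cell in a small toy game

w = np.zeros(N_FEATURES)

def value(state):
    """Predicted future reward for a board state (feature vector)."""
    return float(np.dot(w, state))

def td_update(state, next_state, reward, terminal):
    """Nudge V(state) toward reward + GAMMA * V(next_state)."""
    global w
    target = reward if terminal else reward + GAMMA * value(next_state)
    td_error = target - value(state)
    w += ALPHA * td_error * state   # gradient step for a linear value function
```

During a game the reward stays at zero until the final position (say 1 for a win, 0 for a loss), so value flows backward to earlier board states through the bootstrapped target.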

The reason I'm thinking about the RL literature is that getting a spiking neural network to learn a value function would be extremely useful and something we could work on. This was actually one of my projects for my first rotation. The neural network's only job is to learn the value function; the temporal difference algorithm is wrapped around it. I was trying to mimic the TD-Gammon idea, but with recurrently connected spiking neurons, and instead of backgammon my network was playing tic-tac-toe. Tic-tac-toe is still quite non-linear, so it is interesting, and it's nice because we know there is an optimal strategy. Once the network converges on the optimal strategy we can say that it has learned and is done.
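A rough sketch of what "the TD algorithm wrapped around a value function" looks like for tic-tac-toe self-play is below. I've swapped the network for a lookup table so the shape of the loop stays visible; the epsilon-greedy exploration, the reward scheme (1 for an X win, 0 for a loss, 0.5 for a draw), and all the constants are my own illustrative choices, not settings from the rotation project.

```python
import random

# Hypothetical sketch of the TD algorithm wrapped around a value function,
# applied to tic-tac-toe self-play. A lookup table stands in for the network.

ALPHA, EPSILON = 0.1, 0.1
V = {}  # board state (tuple of 9 cells) -> estimated value from X's point of view

def moves(board):
    return [i for i, c in enumerate(board) if c == ' ']

def winner(board):
    lines = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]
    for a, b, c in lines:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def play_one_game():
    board, player, history = [' '] * 9, 'X', []
    while True:
        options = moves(board)
        if random.random() < EPSILON:
            move = random.choice(options)          # explore
        else:                                      # exploit the current value function
            def after(m):
                b = board[:]
                b[m] = player
                return V.get(tuple(b), 0.5)
            move = max(options, key=after) if player == 'X' else min(options, key=after)
        board[move] = player
        history.append(tuple(board))
        win = winner(board)
        if win or not moves(board):
            reward = 1.0 if win == 'X' else (0.0 if win == 'O' else 0.5)
            # TD-style backup at the end of the game: each state is nudged
            # toward the value of the state that followed it.
            target = reward
            for s in reversed(history):
                V[s] = V.get(s, 0.5) + ALPHA * (target - V.get(s, 0.5))
                target = V[s]
            return
        player = 'O' if player == 'X' else 'X'

for _ in range(20000):
    play_one_game()
```

After enough self-play games the table settles toward values under which greedy play is (close to) the known optimal strategy, which is the convergence criterion I had in mind for saying the network is done learning.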

I think it would be worthwhile to build something similar. The main addition would be a gamma clock. There were two main problems with the network model: it became unstable quite easily (either all the weights would explode and the network would go into seizure, or they would all dwindle to zero), and there was no synchronization (spike timings would drift over time, so whatever information spike timing carried was lost).
