Schultz, W., Dayan, P., Montague, P. R. (1997) A Neural Substrate of Prediction and Reward. Science 275, 1593.
Reinforcement learning is one of the primary ways that the brain learns how to behave through interaction with its environment. There is a large body of theoretical work on reinforcement learning. One approach is temporal difference learning, in which a value function is learned by propagating reward signals backwards in time. Initially you make state changes (choices) at random until you stumble upon a reward. That reward reinforces the states that preceded it, making it more likely that you will enter them again. The same mechanism can be used to learn a variety of value functions.
The key to temporal difference learning is a signal that reports an error in the prediction of reward.
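Dopamine neurons in the ventral tegmental area (VTA) have been shown to have firing properties consistent with a temporal difference error signal: they fire more to an unexpected reward, shift their response to a cue that reliably predicts the reward, and fire less when a predicted reward is omitted.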
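To make the prediction-error idea concrete, here is a minimal sketch of tabular TD(0) on a toy task. It is not the model from the paper. The function name `td0_chain`, the five-state chain, and the values of `alpha` (learning rate) and `gamma` (discount factor) are all assumptions chosen for illustration. The error at each step is computed as delta = r + gamma * V(next state) - V(current state), and each value is nudged by alpha * delta.

```python
def td0_chain(n_states=5, reward=1.0, alpha=0.1, gamma=0.9, episodes=500):
    """Tabular TD(0) on a deterministic chain: 0 -> 1 -> ... -> n_states-1 -> end.

    Hypothetical toy task: the reward arrives only on the final transition.
    Returns the learned state values and, for each episode, the prediction
    error observed at the rewarded step.
    """
    values = [0.0] * n_states
    reward_errors = []
    for _ in range(episodes):
        for s in range(n_states):
            last = s == n_states - 1
            r = reward if last else 0.0
            # The terminal state after the chain is worth nothing.
            next_value = 0.0 if last else values[s + 1]
            # Prediction error: what happened minus what was predicted.
            delta = r + gamma * next_value - values[s]
            values[s] += alpha * delta
            if last:
                reward_errors.append(delta)
    return values, reward_errors


values, reward_errors = td0_chain()
print("learned values:", [round(v, 3) for v in values])
print("error at reward, first episode:", reward_errors[0])
print("error at reward, last episode:", reward_errors[-1])
```

In this toy setting the error at the rewarded step is large on the first episode, when the reward is a surprise, and shrinks toward zero once earlier states have learned to predict it. The learned values also rise along the chain toward the reward, which is the backward propagation described above.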