One insight from ART that I struggled with conceptually is the role of positive vs. negative feedback in forming a perception. I had always imagined that cortex forms a model of the pixels represented in thalamus, and that the feedback from cortex to thalamus predicts the state of thalamus. It seemed reasonable to me that this feedback was negative: once some pixels were explained by a cortical model, those pixels would be subtracted off, and cortex would then try to explain the pixels that were left over.
ART, however, is based on positive feedback. Adaptive resonance means that when the top-down and bottom-up signals are in agreement, the matching pixels are amplified. This fits anatomically, as I was having trouble figuring out how cortex could deliver negative feedback to thalamus in a specific way. Using positive feedback as a neural computational mechanism, however, requires multiplicative gain control.
Without any inhibition, a positive feedback loop would either explode or die off to zero in a short amount of time. But one of the keys to ART is that there is divisive inhibition (also called contrast normalization or gain control) in the representation space. This leads to feedback systems that are controlled by additive excitation and multiplicative inhibition. The positive feedback may not even come across as excitatory. It enhances the signals where top-down and bottom-up match, but the divisive inhibition keeps the population normalized. Thus enhancement of some neurons leads to more inhibition of other neurons. And because the inhibition is multiplicative, the information stored in the population code is not lost.
So in a way the feedback from cortex can seem negative. But really the negative part of the feedback comes from feedforward inhibition and local feedback inhibition. Cortex further excites the neurons that it predicts correctly, and the population gets normalized by the inhibitory neurons.
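To make this concrete for myself, here is a minimal sketch of matched positive feedback followed by divisive normalization. Everything in it is an assumption for illustration and not part of ART as published: the function name `resonate`, the `gain` and `steps` parameters, the choice to model top-down input as a purely multiplicative boost, and normalizing by the summed activity of the population.

```python
def resonate(bottom_up, top_down, gain=1.0, steps=5):
    """Toy loop: modulatory top-down boost, then divisive normalization.

    All names and parameters here are hypothetical, chosen for illustration.
    """
    total = sum(bottom_up)
    activity = [x / total for x in bottom_up]
    for _ in range(steps):
        # Positive, modulatory feedback: top-down only scales activity that
        # is already present, so it cannot create activity from nothing.
        boosted = [a * (1.0 + gain * t) for a, t in zip(activity, top_down)]
        # Divisive inhibition: every unit is divided by the pooled activity,
        # which keeps the population sum fixed at 1.
        pool = sum(boosted)
        activity = [b / pool for b in boosted]
    return activity


bottom_up = [0.4, 0.3, 0.2, 0.1]   # feedforward drive to four units
top_down = [1.0, 0.0, 0.0, 0.0]    # cortex "predicts" only the first unit
out = resonate(bottom_up, top_down)
print([round(v, 4) for v in out])
```

In this toy version the predicted unit grows while the unpredicted units shrink, even though nothing inhibitory was sent from the top. The ratios among the unpredicted units stay the same, which is the sense in which a divisive scheme does not destroy the population code.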
So, how do we build something based on ART that actually does what we want it to do? The problem with ART so far is that it isn't completely written out. It is a conceptual framework for fitting together neural circuits and solving some learning problems, but I feel it hasn't been fully fleshed out.
First, let's make some assumptions. ART is based on top-down excitation being modulatory. The evidence, much of which is discussed on this blog, suggests that top-down modulatory excitation arrives via the apical tuft. This implies that pyramidal cells have two modes of sending signals - spiking and bursting. We see from the literature that the apical tree can generate a calcium spike when signals from layer 1 excite the tree enough. Top-down signals tend to arrive in layer 1. The calcium spike will cause a burst of spikes if the basal tree (bottom-up) is also excited. However, what ART requires is that the calcium spike does not cause any action potentials if the bottom-up inputs are not present, which is plausible biophysically.
The top-down and bottom-up types of excitation can set up learning rules for the system. The simplest way to think about it is that there are 4 binary combinations of inputs: 1. No top-down or bottom-up input, 2. Bottom-up but no top-down, 3. Top-down but no bottom-up, 4. Both top-down and bottom-up. A pyramidal cell produces a different output under each condition: 1. Nothing, 2. Spiking, 3. Nothing (but the neuron is extra excitable), 4. Bursting.
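The four cases above can be written as a small lookup. This is only a sketch of the binary idealization; the function name `pyramidal_output` and the state labels (`"silent"`, `"spike"`, `"primed"`, `"burst"`) are my own shorthand, not established terminology.

```python
def pyramidal_output(bottom_up: bool, top_down: bool) -> str:
    """Binary idealization of a pyramidal cell with basal and apical input."""
    if bottom_up and top_down:
        return "burst"    # case 4: calcium spike plus basal drive
    if bottom_up:
        return "spike"    # case 2: basal drive alone
    if top_down:
        return "primed"   # case 3: no output, but extra excitable
    return "silent"       # case 1: no input at all


for bottom_up in (False, True):
    for top_down in (False, True):
        state = pyramidal_output(bottom_up, top_down)
        print(f"bottom_up={bottom_up!s:5} top_down={top_down!s:5} -> {state}")
```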
Now, ART talks about learning based on case 4 (a resonance), with no plasticity in the other 3 cases. However, there could be some plasticity in cases 2 and 3 that follows different rules. Case 3, for instance, could be a signal for LTD, since the top-down prediction was not matched by any bottom-up input.
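One way to sketch such a rule, purely as a speculative illustration: strengthen a top-down synapse on resonance (case 4), weaken it when the prediction goes unmatched (case 3), and leave it alone otherwise. The function name `update_top_down_weight`, the rates, the bounded form of the updates, and the decision to leave case 2 unchanged are all assumptions of mine; ART itself only specifies learning during resonance.

```python
def update_top_down_weight(w, bottom_up, top_down, ltp_rate=0.1, ltd_rate=0.05):
    """Hypothetical plasticity rule for one top-down weight w in [0, 1]."""
    if bottom_up and top_down:
        # Case 4 (resonance): potentiate, saturating toward 1.
        return w + ltp_rate * (1.0 - w)
    if top_down and not bottom_up:
        # Case 3 (unmatched prediction): depress, decaying toward 0.
        # This LTD branch is speculation, not part of ART.
        return w - ltd_rate * w
    # Cases 1 and 2: no change in this sketch. Case 2 might deserve its
    # own rule, but that is left open here.
    return w


w = 0.5
print(update_top_down_weight(w, bottom_up=True, top_down=True))
print(update_top_down_weight(w, bottom_up=False, top_down=True))
print(update_top_down_weight(w, bottom_up=True, top_down=False))
```

Both updates are multiplicative in the distance to their bound, so the weight stays between 0 and 1 without explicit clipping.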
One question that needs to be answered is: what is the relative impact of bursting versus spiking on the normalized network? If we imagine that the normalization process, at its simplest approximation, scales the population vector to length 1, then how much does a bursting neuron contribute compared to a spiking neuron? Does a burst count twice as much as a spike? Three times? The same? Does it even matter - can the network, through learning, be operational with different relative impacts of bursting and spiking?
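A quick way to play with this question is to treat the burst-to-spike ratio as a free parameter and see what it does to a unit-length population vector. This is only a sketch under my own assumptions: `normalized_population` and `burst_weight` are hypothetical names, a spike is arbitrarily given weight 1, and normalization is taken to be division by the Euclidean length.

```python
import math


def normalized_population(states, burst_weight):
    """Map cell states to rates and scale the vector to unit length."""
    # A spike counts as 1 by convention; burst_weight is the unknown ratio.
    raw = {"silent": 0.0, "spike": 1.0, "burst": burst_weight}
    rates = [raw[s] for s in states]
    length = math.sqrt(sum(r * r for r in rates))
    if length == 0.0:
        return rates  # nothing active, nothing to normalize
    return [r / length for r in rates]


states = ["burst", "spike", "spike", "silent"]
for burst_weight in (1.0, 2.0, 3.0):
    vec = normalized_population(states, burst_weight)
    print(burst_weight, [round(v, 3) for v in vec])
```

Raising `burst_weight` pushes the bursting cell's share of the vector up and the spiking cells' share down, so the ratio effectively sets how strongly a resonance suppresses everything else.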