Monday, December 17, 2012

Distinct functions for direct and transthalamic corticocortical connections

Sherman, SM. Guillery, RW. (2011) Distinct functions for direct and transthalamic corticocortical connections. Journal of Neurophysiology 106: 1068-1077.

The view that glutamatergic signaling is functionally uniform needs to change: there are different classes of glutamatergic signaling.

Drivers and modulators: class 1 and class 2 glutamatergic pathways. The distinction was first seen in thalamus, between retinal feedforward and cortical feedback. Drivers produce a larger initial excitation and show paired-pulse depression. Modulators produce a smaller initial excitation and show paired-pulse facilitation.


Table 1. Properties of class 1 and class 2 pathways

  • Class 1/Driver (e.g., Retinal)
    • Large EPSPs
    • Synapses show paired-pulse depression
    • Less convergence onto target
    • Dense terminal arbors (type 2)
    • Thick axons
    • Large terminals
    • Contacts target cell proximally
    • Activates only iGluRs


  • Class 2/Modulator (e.g., Layer 6)
    • Small EPSPs
    • Synapses show paired-pulse facilitation
    • More convergence onto target
    • Sparse terminal arbors (type 1)
    • Thin axons
    • Small terminals
    • Contacts target cell peripherally
    • Activates iGluRs and mGluRs




Fig. 1. Distinguishing driver (class 1) from modulator (class 2) inputs. A: light microscopic tracings of a driver (class 1) afferent (a retinogeniculate axon from the cat) and a modulator (class 2) afferent (a corticogeniculate axon from layer 6 of the cat). [Redrawn from Sherman and Guillery 2006.] B: modulators (red) shown contacting more peripheral dendrites than do drivers (green). Also, drivers activate only ionotropic glutamate receptors, whereas modulators also activate metabotropic glutamate receptors. C: effects of repetitive stimulation on excitatory postsynaptic potential (EPSP) amplitude: for modulators it produces paired-pulse facilitation (increasing EPSP amplitudes during the stimulus train), whereas for drivers it produces paired-pulse depression (decreasing EPSP amplitudes during the stimulus train). Also, increasing stimulus intensity for modulators (shown as different line styles) produces increasing EPSP amplitudes overall, whereas for drivers it does not; this indicates more convergence of modulator inputs compared with driver inputs.
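The paired-pulse contrast in panel C can be reproduced with a short-term synaptic plasticity model. Below is a minimal sketch in the style of the Tsodyks-Markram model; all parameter values are illustrative assumptions, not taken from the paper. A high baseline release probability gives driver-like depression; a low release probability with slow facilitation gives modulator-like facilitation.

```python
import numpy as np

def epsp_train(n_spikes, isi, U, tau_rec, tau_fac):
    """Relative EPSP amplitudes over a spike train (Tsodyks-Markram style).
    U: baseline release probability; tau_rec: recovery from depression (s);
    tau_fac: decay of facilitation (s); isi: inter-spike interval (s)."""
    x, u = 1.0, 0.0                # available resources, utilization
    amps = []
    for _ in range(n_spikes):
        u = u + U * (1 - u)        # utilization jumps at each spike
        amps.append(u * x)         # EPSP amplitude ~ released resources
        x = x * (1 - u)            # deplete available resources
        # relaxation until the next spike
        x = 1 - (1 - x) * np.exp(-isi / tau_rec)
        u = u * np.exp(-isi / tau_fac)
    return np.array(amps) / amps[0]

# Driver-like (class 1): high release probability -> paired-pulse depression
print(epsp_train(5, 0.05, U=0.7, tau_rec=0.5, tau_fac=0.01))
# Modulator-like (class 2): low release probability -> paired-pulse facilitation
print(epsp_train(5, 0.05, U=0.1, tau_rec=0.1, tau_fac=0.5))
```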

Spikes and bursts in thalamus are caused by driving inputs from the retina (single spikes in tonic mode, bursts in burst mode). Cortex should interpret these spikes as arising from the retina.

Class 1 (drivers) and class 2 (modulators) are also seen throughout cortex. The major parameters of these classes separate and stay clustered within class, even across cortical and thalamic areas.

Class 2 can act on metabotropic GluRs. These have longer time scales and can also be inhibitory.

Higher-order thalamus gets its driving input from layer 5 of cortex. Class 2 input primarily comes from layer 6.

Fig. 3. Direct and transthalamic corticocortical pathways. Information relayed to cortex through thalamus is brought to thalamus via class 1 axons, most or all of which branch, with the extrathalamic branch innervating brain stem or spinal cord motor centers. This applies to inputs to both first-order (FO) and higher order (HO) thalamic relays. Thus the branches innervating thalamus (green) can be regarded as efference copies. The schematic diagram also shows the layer 6 class 2 feedback from each cortical area to thalamus, and this is contrasted with the layer 5 feed-forward corticothalamic pathways. Note that this shows cortical areas connected by 2 parallel paths: a direct one and a transthalamic one.

The class 1 inputs (both from retina and layer 5) have a common feature: they branch and project to brainstem as well as thalamus. Since thalamus is relay-like, these are efference copies. Every cortical area so far studied has a layer 5 projection to subcortical motor centers, many of which branch to thalamus.

Not much overlap in the parallel pathways: the direct cortical pathway does not go subcortical, and the transthalamic pathway does not project directly to other cortical areas.

Thalamus could be responding to unexpected motor instructions, blocking conflicting motor commands, or dynamically coupling different cortical areas. This is because of modulation by the reticular nucleus and other modulatory signals.



Friday, December 14, 2012

The Brain Activity Map Project and the Challenge of Functional Connectomics

Alivisatos, AP. Chun, M. Church, GM. Greenspan, RJ. Roukes, ML. Yuste, R. (2012) The Brain Activity Map Project and the Challenge of Functional Connectomics. Neuron 74:970-974.

The Brain Activity Map Project (BAM) is aimed at reconstructing the full record of neural activity across complete neural circuits. The general idea is to crack the emergent properties of neural circuits. I like their opening quote:


"The behavior of large and complex aggregates of elementary particles, it turns out, is not to be understood in terms of a simple extrapolation of the properties of a few particles. Instead, at each level of complexity entirely new properties appear." – More Is Different, P.W. Anderson

Record every action potential from every neuron within a circuit. This needs better voltage dyes, better multi-electrode recordings (3-dimensional probes), and wireless electrodes.

So just a call for this progress. I was thinking that this applies nicely to my voltage-dye studies of the leech. We are probably closer than anyone to actually imaging the activity of an entire neural circuit during behaviorally relevant states. This type of mapping project fits nicely into how I'm thinking of telling the VSD story for my thesis. 

Bayesian inference with probabilistic population codes

Ma, WJ. Beck, JM. Latham, PE. Pouget, A. (2006) Bayesian inference with probabilistic population codes. Nature Neuroscience 9(11): 1432-1438.

To get Bayes-optimal performance, neurons must be doing a computation that closely approximates Bayes' rule. Neuronal variability implies that populations of neurons automatically represent probability distributions over the stimulus - a code they call "probabilistic population codes".

Any paper that mentions death by piranha in the first paragraph has got to be good.

Poisson-like variability seen in neuronal responses allows neurons to represent probability distributions in a format that reduces optimal Bayesian inference to simple linear combinations of neural activities.

Equations 2 and 3 describe how to combine two Gaussian distributions (i.e., sensory integration) optimally according to Bayes. This is their definition of optimal: the combined estimate has mean μ3 = (σ2² μ1 + σ1² μ2) / (σ1² + σ2²) and inverse variance 1/σ3² = 1/σ1² + 1/σ2².
So the gain of the population code reflects the variance of the distribution. Simply adding two neural population responses can lead to optimal Bayesian inference.

Figure 2. Inference with probabilistic population codes for Gaussian probability distributions and Poisson variability. The left plots correspond to population codes for two cues, c1 and c2, related to the same variable s. Each of these encodes a probability distribution with a variance inversely proportional to the gains, g1 and g2, of the population codes (K is a constant depending on the width of the tuning curve and the number of neurons). Adding these two population codes leads to the output population activity shown on the right. This output also encodes a probability distribution with a variance inversely proportional to the gain. Because the gain of this code is g1 + g2, and g1 and g2 are inversely proportional to σ1² and σ2², respectively, the inverse variance of the output population code is the sum of the inverse variances associated with c1 and c2. This is precisely the variance expected from an optimal Bayesian inference (equation (3)). In other words, taking the sum of two population codes is equivalent to taking the product of their encoded distributions.
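A quick numerical check of this claim (the setup below - tuning width, gains, decoding grid - is my own toy version, not the paper's simulation): decode a posterior from each population assuming independent Poisson spiking, then decode the summed population and compare inverse variances.

```python
import numpy as np

rng = np.random.default_rng(0)

s_true = 1.5                              # stimulus shared by both cues
s_grid = np.linspace(-10, 10, 2001)       # hypothesis grid for decoding
ds = s_grid[1] - s_grid[0]
pref = np.linspace(-30, 30, 241)          # preferred stimuli tiling the axis
width = 3.0                               # tuning-curve width (assumed)

# log of unit-gain Gaussian tuning curves on the grid, shape (neurons, grid)
log_f = -(s_grid[None, :] - pref[:, None]) ** 2 / (2 * width ** 2)

def posterior(r):
    # flat prior + independent Poisson likelihood; sum_i f_i(s) is ~constant
    # for dense tiling, so only the r . log f(s) term matters
    logL = r @ log_f
    p = np.exp(logL - logL.max())
    return p / (p.sum() * ds)

def variance(p):
    mu = (p * s_grid).sum() * ds
    return ((s_grid - mu) ** 2 * p).sum() * ds

g1, g2 = 15.0, 40.0                       # gains encode cue reliability
r1 = rng.poisson(g1 * np.exp(-(s_true - pref) ** 2 / (2 * width ** 2)))
r2 = rng.poisson(g2 * np.exp(-(s_true - pref) ** 2 / (2 * width ** 2)))

v1 = variance(posterior(r1))
v2 = variance(posterior(r2))
v3 = variance(posterior(r1 + r2))         # just add the two population codes
print(f"1/v1 + 1/v2 = {1/v1 + 1/v2:.2f}  vs  1/v3 = {1/v3:.2f}")   # ~equal
```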

They derive generalizations of this - i.e., tuning curves and distributions that are not Gaussian. Essentially, optimality can be obtained even if the neurons are not independent, or if their receptive fields are not all of the same form (i.e., some Gaussian, some sigmoidal). The covariance matrix of the neural responses must be proportional to the gain.

Can also incorporate a prior distribution that is not flat.

They run a simulation of integrate-and-fire neurons that is similar to figure 2, and show that it works.

The population code not only reflects the value, but also the uncertainty - based on the gain of the population.

Need divisive normalization to prevent saturation.
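For reference, a minimal sketch of what divisive normalization of a population response could look like (the functional form and the σ constant are generic assumptions, not the paper's specific circuit):

```python
import numpy as np

def divisive_normalization(r, sigma=1.0):
    # scale each neuron by the pooled population activity, so the overall
    # gain stays bounded no matter how many inputs have been summed
    return r / (sigma + r.sum())

hill = np.exp(-(np.arange(20) - 10.0) ** 2 / 8.0)   # population activity hill
print(divisive_normalization(5 * hill))     # weak input
print(divisive_normalization(50 * hill))    # 10x input: same shape, bounded gain
```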


Pretty cool stuff. Another example of why divisive normalization is an essential computation for the brain. I also like how they create separate populations that represent their own distributions and are then combined in a higher-level population.

Wednesday, December 12, 2012

Normalization for probabilistic inference with neurons

Eliasmith, C. Martens, J. (2011) Normalization for probabilistic inference with neurons. Biological Cybernetics 104:251-262.

A solution that maintains a normalized probability density for inference without depending on division.



"the NEF approach:

1. Neural representations are defined by the combination of
nonlinear encoding (exemplified by neuron tuning curves,
and neural spiking) and weighted linear decoding (over
populations of neurons and over time).
2. Transformations of neural representations are functions
of the variables represented by neural populations. Trans-
formations are determined using an alternately weighted
linear decoding.
3. Neural dynamics are characterized by considering neural
representations as control theoretic state variables. Thus,
the dynamics of neurobiological systems can be analyzed
using control theory."

So he shows some math that converts between vector spaces and function spaces and shows how these can be considered equivalent. Basically you are parameterizing the function with a vector representation.
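My reading of the vector/function equivalence, as a toy sketch (the Gaussian basis and all dimensions are my own choices, not the paper's): a function is held as its coefficient vector over a fixed basis, and you move between the two spaces by projection and reconstruction.

```python
import numpy as np

s = np.linspace(-1, 1, 200)                    # function-space domain
centers = np.linspace(-1, 1, 12)               # basis-function centers
basis = np.exp(-(s[None, :] - centers[:, None]) ** 2 / (2 * 0.15 ** 2))

# function space -> vector space: least-squares projection onto the basis
target = np.exp(-(s - 0.3) ** 2 / (2 * 0.15 ** 2))  # a density-like bump
coeffs, *_ = np.linalg.lstsq(basis.T, target, rcond=None)

# vector space -> function space: reconstruct from the coefficient vector
recon = coeffs @ basis
print("max reconstruction error:", np.abs(recon - target).max())
```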

He derives a bias function that is supposed to compensate for the errors in the integral (the integral is supposed to be 1 for probabilities). It captures distortions of the representation from projecting it to the neuron-like encoding. Basically the bias gets factored into the connection strengths, and can account for the non-linearities. (Not so much what I thought this was going to be about).

Anyway, this looks like an interesting paper for the gain control stuff: Ma et al. 2006

Tuesday, December 11, 2012

Eliasmith 2012 Supplemental II

The working memory hierarchy is based on recurrent attractor neural networks. This can store semantic pointers.

Circular convolution to perform compression - binding two vectors together - bind the current semantic pointer vector with a position. The position is an internally generated position index semantic pointer:
MemoryTrace = Position1 ⊗ Item1 + Position2 ⊗ Item2 ...  
Position1 = Base
Position2 = Position1 ⊗ Base  
Position3 = Position2 ⊗ Base  

Conceptual semantic pointers for numbers are constructed similarly to position - circular convolution of the base and the operator AddOne:
One = Base
Two = One ⊗ AddOne  
Three = Two ⊗ AddOne
The vectors are unitary - they don't change length when convolved with themselves.
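A minimal numpy sketch of this binding scheme (the dimensionality and vector names are mine; circular convolution via FFT is the standard trick, and for unitary vectors the involution used for unbinding is an exact inverse):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 512                          # semantic pointer dimensionality (assumed)

def cconv(a, b):
    # circular convolution: the binding operator (⊗)
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))

def involution(a):
    # approximate inverse for unbinding (exact for unitary vectors)
    return np.concatenate(([a[0]], a[1:][::-1]))

def make_unitary(v):
    # unit-magnitude Fourier coefficients: self-convolution then
    # leaves the vector's length unchanged
    F = np.fft.fft(v)
    return np.fft.ifft(F / np.abs(F)).real

def unit(v):
    return v / np.linalg.norm(v)

Base = make_unitary(rng.normal(size=D))
Item1, Item2 = unit(rng.normal(size=D)), unit(rng.normal(size=D))

Position1 = Base
Position2 = cconv(Position1, Base)
MemoryTrace = cconv(Position1, Item1) + cconv(Position2, Item2)

# unbind: what is stored at position 2?
probe = cconv(MemoryTrace, involution(Position2))
print("sim to Item2:", probe @ Item2, "  sim to Item1:", probe @ Item1)
```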

Reward evaluation is done via a dopamine-like RL system.

Neurons are LIF: 20 ms time constant, absolute refractory period of 2 ms, random max firing rates between 100 and 200 Hz. Encoding vectors are randomly chosen from the unit hypersphere. Most projections are 10 ms AMPA; recurrent projections are 50 ms NMDA.

The model occupies 24 GB of RAM and takes 2.5 hours of processing for 1 s of simulated time.

The main learning aspect of Spaun is changing weights during the RL task. This does not change the visual/motor hierarchies, only the weights that project to the value system, which are modulated by TD learning. More on learning: it requires an error signal, which he is not sure how to implement (top-down like vs. bottom-up like in the basal and apical trees).

Dynamics of the model are highly constrained by the time-constants of the synapses. 

Definitely more papers to read - visual system learning: Shinomoto (2003); spike learning: MacNeil (2011); normalization for probabilistic inference: Eliasmith (2010)

Also, get his book.

Monday, December 10, 2012

Eliasmith 2012 supplemental I

Eliasmith et al. 2012 -  Supplemental.

"The central kinds of representation employed in the [Semantic Pointer Architecture] (SPA) are "semantic ponters". Semantic pointers are neurally realized representations of a vector space generated through a compression method." They are generated through compression of the data they represent. They carry info that is derived from their source. Point to more info. Lower dimensional representation of data that they point to. The compression can be learned or defined explicitly.

Interesting way of stating this. It sounds similar to the symbol-data binding. The pointer is the low-dimensional symbol that points to the high-dimensional data.

The Neural Engineering Framework is a set of methods that can compute functions on neural vectors (how to connect populations of neurons). Each neuron has a preferred direction vector. The spiking activity is written as:
a_i(x) = G_i[α_i e_i · x + J_i^bias]

where a_i is the spike train, G_i is the neuron nonlinearity, α_i is the gain, e_i is the preferred direction vector, and J_i^bias is a bias current. He uses LIF neurons in Spaun.

Then you can derive a linear decoder from the activity of a population, optimized in a least-squares sense. Combining the decoders with the next population's encoders gives the connection weights that implement a transformation function.
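A minimal rate-based sketch of the encode/decode recipe for a 1-D variable (the parameter ranges are my assumptions; the real NEF also adds noise or regularization when solving for decoders, and builds connection weights as encoder-decoder outer products):

```python
import numpy as np

rng = np.random.default_rng(1)

n = 60
x = np.linspace(-1, 1, 200)        # samples of the represented variable

e = rng.choice([-1.0, 1.0], n)     # preferred direction vectors (1-D: +/-1)
alpha = rng.uniform(1.0, 3.0, n)   # gains
j_bias = rng.uniform(-1.0, 2.0, n) # bias currents

def G(J, tau_ref=0.002, tau_rc=0.02):
    """LIF rate nonlinearity: firing rate for drive current J (threshold 1)."""
    rate = np.zeros_like(J)
    m = J > 1
    rate[m] = 1.0 / (tau_ref - tau_rc * np.log(1 - 1.0 / J[m]))
    return rate

# a_i(x) = G_i[alpha_i e_i x + J_i^bias], evaluated at all sample points
A = G(alpha[:, None] * e[:, None] * x[None, :] + j_bias[:, None])

# least-squares linear decoders for the identity function f(x) = x;
# decoding a different target (e.g., x**2) would give a transformation instead
d, *_ = np.linalg.lstsq(A.T, x, rcond=None)

x_hat = A.T @ d
print("decode RMSE:", np.sqrt(np.mean((x_hat - x) ** 2)))
```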

The visual hierarchy in Spaun is constructed by training RBM-based auto-encoders on natural images. For Spaun the first layer is 784-dimensional (28x28 pixel images), with consecutive hidden layers of 1000, 500, 300, and 50 nodes. The first hidden layer is higher dimensional than the actual input image. It learns many Gabor-like filters in V1. In Spaun the visual hierarchy does not have spiking neurons until IT (the top).
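Just to fix the shapes in my head, a sketch of the compression stack with random placeholder weights (in Spaun these weights come from the RBM-based auto-encoder training, not random initialization):

```python
import numpy as np

rng = np.random.default_rng(2)

# 28x28 input (784) -> 1000 -> 500 -> 300 -> 50, as described above
sizes = [784, 1000, 500, 300, 50]
weights = [rng.normal(0, 0.05, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]

def encode(img):
    h = img.reshape(-1)              # flatten the 28x28 image
    for W in weights:
        h = np.tanh(W @ h)           # one (untrained) stage per hidden layer
    return h                         # 50-d code at the top of the hierarchy

print(encode(rng.random((28, 28))).shape)   # (50,)
```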

[on page 12, Working memory is next.]

Wednesday, December 5, 2012

A Large-Scale Model of the Functioning Brain

Eliasmith, C. Stewart, TC. Choo, X. Bekolay, T. DeWolf, T. Tang, Y. Rasmussen, D. (2012) A Large-Scale Model of the Functioning Brain. Science 338: 1202.

videos
supplemental

A 2.5-million-neuron simulation of the brain called "Spaun". They taught it to do 8 different tasks without changing any configuration of the network. Spaun takes a 28x28 pixel image as input and controls a simulated arm as output.

"Compression is a natural way to understand much of neural processing." higher-dimensional space in V1 (image-based) lower-dimensional space in IT (feature).

Fig. 1. Anatomical and functional architecture of Spaun. (A) The anatomical architecture of Spaun shows the major brain structures included in the model and their connectivity. Lines terminating in circles indicate GABAergic connections. Lines terminating in open squares indicate modulatory dopaminergic connections. Box styles and colors indicate the relationship with the functional architecture in (B). PPC, posterior parietal cortex; M1, primary motor cortex; SMA, supplementary motor area; PM, premotor cortex; VLPFC, ventrolateral prefrontal cortex; OFC, orbitofrontal cortex; AIT, anterior inferior temporal cortex; Str, striatum; vStr, ventral striatum; STN, subthalamic nucleus; GPe, globus pallidus externus; GPi, globus pallidus internus; SNr, substantia nigra pars reticulata; SNc, substantia nigra pars compacta; VTA, ventral tegmental area; V2, secondary visual cortex; V4, extrastriate visual cortex. (B) The functional architecture of Spaun. Thick black lines indicate communication between elements of the cortex; thin lines indicate communication between the action-selection mechanism (basal ganglia) and the cortex. Boxes with rounded edges indicate that the action-selection mechanism can use activity changes to manipulate the flow of information into a subsystem. The open-square end of the line connecting reward evaluation and action selection denotes that this connection modulates connection weights. See table S1 for more detailed definitions of abbreviations, a summary of the function-to-anatomy mapping, and references supporting Spaun's anatomical and functional assumptions.

The motor output is also hierarchical going from a low-dimensional goal representation to a high-dimensional representation that is in muscle space.

The spiking neurons implement a neural representation called "semantic pointers". From Eliasmith's website: "Higher-level cognitive functions in biological systems are made possible by semantic pointers. Semantic pointers are neural representations that carry partial semantic content and are composable into the representational structures necessary to support complex cognition."

Eliasmith is also about to publish a book called "How to Build a Brain," due out in 2013.

I'm pretty impressed by this. I'm going to spend some time looking at his papers.

Sunday, December 2, 2012

Temporal vs. Parietal Cortex

The two competing hypotheses about cortical function are based on the differences between positive and negative feedback signals. Both of these types of feedback may be useful for different cortical areas, and they may have deeper relations. ART (Grossberg) describes the mechanisms of the positive feedback system, whereas predictive coding (Maass) describes the negative feedback system.

We know that the visual system is basically split up into the "What" pathway and the "Where" pathway. The "What" pathway goes down temporal cortex, and as you move from V1 to IT, neurons become object recognizers and lose spatial information. Going up the "Where" pathway (this is less studied), neurons are binding the objects to information about their spatial properties (position, movement, momentum, etc.).

So it seems that going up parietal cortex, having a system that can predict the future and constantly minimizes error signals based on those predictions would be very powerful. The negative feedback seems like an ideal way to derive the laws of mechanics and understand how objects move through the world. Grossberg says that you need resonance to have consciousness. This seems to fit, as you are not really conscious of parietal cortex (your conscious mind does not have as much spatial information as your subconscious motor system).

Temporal cortex, however, is part of your conscious awareness. This is because, according to Grossberg, there are resonant states being created by the positive feedback system (you are not necessarily conscious of all resonant states). Resonant states are a binding of the data with a representation, and this binding is the key to conscious awareness. The negative-feedback states are not bound - they propagate prediction errors.

Both of these feedback systems will have rules that can be based on the two-input pyramidal cell model. Each pyramidal cell can receive different types of inputs - bottom-up (data) inputs through the basal tree, and top-down (symbol/class) inputs through the apical tree. Plasticity rules can be established to create both the positive and negative feedback systems.

The positive feedback system will strengthen synapses when the symbols are a good match to the data. The synapses will grow in strength each time a pattern is introduced, limited by some maximum strength or a normalization procedure.

The negative feedback system will compare a top-down prediction with a bottom-up state, and use the difference to modulate synaptic strengths. This will create negative feedback loops, and the system will be driven to local minima.
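A toy sketch of both rules side by side (the vectors, sizes, and learning rates are made up for illustration; x is the bottom-up/basal input and y the top-down/apical symbol):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.random(16)                   # bottom-up data (basal tree)
y = np.zeros(4); y[1] = 1.0          # top-down symbol/class (apical tree)
eta = 0.1

# Positive feedback (ART-like): Hebbian strengthening when symbol and data
# co-occur, bounded by a normalization step
W_pos = rng.random((16, 4))
W_pos += eta * np.outer(x, y)
W_pos /= np.linalg.norm(W_pos, axis=0)   # keeps strengths from growing forever

# Negative feedback (predictive-coding-like): learn from the mismatch between
# the top-down prediction and the bottom-up state, descending toward a minimum
W_neg = rng.random((16, 4))
error = x - W_neg @ y                    # prediction error
W_neg += eta * np.outer(error, y)        # shrinks the error on the next pass
```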

How to create a mind 7

Chapter 7 is definitely where he gets into more detail about how he would actually create a mind. He basically explains that Siri and other similar speech recognition algorithms are based on a hierarchical hidden Markov model (HHMM). These are states with transition probabilities, where each state would be like a pattern recognizer, and the transitions are the effective synaptic strengths.
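To make "states with transition probabilities" concrete, here is a minimal forward pass for a plain (non-hierarchical) two-state HMM; the numbers are made up, and Kurzweil's HHMMs nest such recognizers inside each other:

```python
import numpy as np

# Transition probabilities between hidden states (the "synaptic strengths")
A = np.array([[0.8, 0.2],
              [0.3, 0.7]])
# Emission probabilities: P(observation | state)
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])        # initial state distribution

def forward(obs):
    """P(observation sequence) via the forward algorithm."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

print(forward([0, 0, 1, 0]))     # likelihood of a short observation sequence
```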

Every learning algorithm has what Kurzweil calls "God parameters". In his HHMM model, there were many parameters that were required at the initialization of the system. In order to choose those parameters optimally, he used a genetic algorithm. This would lead to unexpected optimizations.
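A minimal GA sketch for tuning such parameters (the fitness function here is a stand-in with a known optimum; in Kurzweil's case it would be recognition performance after training with a given parameter vector):

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(params):
    # stand-in for "train the recognizer with these parameters and score it"
    return -np.sum((params - np.array([0.3, 1.5, 0.9])) ** 2)

pop = rng.uniform(0, 2, (40, 3))             # population of parameter vectors
for gen in range(100):
    scores = np.array([fitness(p) for p in pop])
    parents = pop[np.argsort(scores)[-10:]]  # keep the fittest
    # crossover: mix random pairs of parents, then mutate
    idx = rng.integers(0, 10, (40, 2))
    mask = rng.random((40, 3)) < 0.5
    pop = np.where(mask, parents[idx[:, 0]], parents[idx[:, 1]])
    pop += rng.normal(0, 0.05, pop.shape)    # mutation

best = pop[np.argmax([fitness(p) for p in pop])]
print(best)                                  # ~ [0.3, 1.5, 0.9]
```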

What was really fascinating was that they then altered the model in subtle ways - like adding leakage across Markov states. They would then repeat the GA and get comparable prediction quality (maybe even better), but the GA-optimized parameters were totally different. If they used the GA parameters of the original configuration, performance would go down.

This has some important insights into the biology of intelligence. If our brain has some leak problems, or some unintentional side effects of an implementation based on proteins, then the genetic algorithm will pick out parameters that can offset these consequences, and potentially even use them to its advantage. So when looking at the brain, there's the mathematically beautiful thing it is trying to do (some sort of hierarchical learning) and then there's what it actually does (hierarchical learning with some tweaks). The tweaks in many ways could help the system, but would be reflected in a potentially counterintuitive selection of the parameters.

Another thing he mentioned was the overfitting problem. He said that adding noise to the inputs actually aided learning, as it prevented overfitting to the examples given.

The ultimate conclusion of the chapter is building hierarchical pattern recognizers. He says that there are multiple types of learning algorithms that could do it, but he prefers HHMMs as he is most familiar with them and they are well characterized. But there are other options. Regardless of the choice, there are always "God" parameters that will need to be optimized via a GA.

He briefly mentions some other ideas that would go into the brain - a system that checks for inconsistencies, a system that looks for new problems, a goal system (i.e. pleasure and pain signals from the old brain). And he describes some limitations of the biological cortex that will not be in a digital cortex - like how many things you can keep in memory, or the number of active lists you can operate on.


So HHMMs seem like an interesting idea. I don't think it's the full picture of neocortex. The make-up of each pattern recognizer will be important. HHMMs may be useful to study just to understand how they work, and they may give us some insight into how to handle the temporal side of cortex. And he still doesn't really say anything about the top of the hierarchy. He mentions that we would want the cortex to build as many levels as it would want/need, but how to make an arbitrary hierarchy that can change is a problem in itself. It seems like there must be some point where the hierarchy goes back down (like the top is PFC, which feeds back down to a language area, which allows you to think and build arbitrary hierarchies).