Wednesday, December 4, 2013

Parental olfactory experience influences behavior and neural structure in subsequent generations

Dias, B.G., Ressler, K.J. (2013). Parental olfactory experience influences behavior and neural structure in subsequent generations. Nature Neuroscience.

The phenomenon of "inheritance of parental traumatic exposure".

We've long recognized that the ability to transmit information to offspring about specific environmental features or events would be remarkably advantageous. However, we have only recently been able to conceive of such "non-Mendelian" modes of inheritance. Since Darwin, we have suspected that the information shared across generations was crafted only by natural selection, and that traits were determined by which individuals reproduced best, not by the experiences of the organisms. But new empirical data are showing that epigenetics can act as a mechanism for passing information learned from the environment down to the next generation.

A standard behavioral test is fear conditioning -- an animal is taught to associate an arbitrary stimulus with some kind of painful shock. Animals learn these associations very quickly, and we can measure an animal's startle response to the arbitrary stimulus as an indicator of how well it learned to associate the stimulus with the shock.

Three generations of mice were studied. The F0 mice (the grandparents) were put in three conditions: no odor, conditioned to acetophenone (smells like cherries), or conditioned to propanol (control odor). Their offspring (F1) were then tested to see whether they responded to the odor their parents had been conditioned to. The F1 mice responded more to that odor, but not to the other one, even though they had never been exposed to either odor. The effect even extended to the next generation (F2).

Further, F1 mice showed higher sensitivity to the smell that their parents were conditioned with.

Cross-fostering experiments indicated that the behavioral changes were not passed down through social learning, and suggested that the enhanced sensitivity and behavioral changes were inherited biologically rather than learned.

The glomeruli (clusters in the olfactory bulb where the axons of sensory neurons expressing the same odorant receptor converge) that respond to acetophenone were larger when the parents had been conditioned to acetophenone.

Figure 4 Behavioral sensitivity and neuroanatomical changes are inherited in F2 and IVF-derived generations. (a,b) Responses of F2-C57Bl/6J males revealed that F2-Ace-C57 mice had an enhanced sensitivity to acetophenone compared with F2-Prop-C57 mice (a). In contrast, F2-Prop-C57 mice had an enhanced sensitivity to propanol compared with F2-Ace-C57 mice (b; F2-Prop-C57, n = 8; F2-Ace-C57, n = 12; OPS to acetophenone: t test, P = 0.0158, t18 = 2.664; OPS to propanol: t test, P = 0.0343, t17 = 2.302). (c–f) F2-Ace-M71 mice whose F0 generation male had been conditioned to acetophenone had larger dorsal and medial M71 glomeruli in the olfactory bulb than F2-Prop-M71 mice whose F0 generation had been conditioned to propanol. Scale bar represents 200 µm. (g) Dorsal M71 glomerular area in F2 generation (M71-LacZ: F2-Prop, n = 7; F2-Ace, n = 8; t test, P < 0.0001, t13 = 5.926). (h) Medial M71 glomerular area in F2 generation (M71-LacZ: F2-Prop, n = 6; F2-Ace, n = 10; t test, P = 0.0006, t14 = 4.44). (i) Dorsal M71 glomerular area in IVF offspring (F1-Prop-IVF, n = 23; F1-Ace-IVF, n = 16; t test, P < 0.001, t37 = 4.083). (j) Medial M71 glomerular area in IVF offspring (F1-Prop-IVF, n = 16; F1-Ace-IVF, n = 19; t test, P < 0.001, t33 = 5.880). Data are presented as mean ± s.e.m. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001.
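The caption reports unpaired t tests, with the degrees of freedom written after the t. As a quick sanity check of my own (not something from the paper), a pooled-variance Student's t test has n1 + n2 - 2 degrees of freedom, and the two-tailed P can be recovered from t and df. This is a minimal pure-Python sketch that integrates the t density with Simpson's rule rather than calling a stats library; the function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, n_steps=2000):
    """P(|T| >= |t|), using Simpson's rule for P(0 <= T <= |t|). n_steps must be even."""
    t = abs(t)
    h = t / n_steps
    s = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, n_steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = s * h / 3          # probability mass between 0 and |t|
    return 1 - 2 * central

def pooled_df(n1, n2):
    """Degrees of freedom of an unpaired, pooled-variance t test."""
    return n1 + n2 - 2
```

For the acetophenone comparison, `pooled_df(8, 12)` gives 18, matching t18, and `two_tailed_p(2.664, 18)` should come out close to the reported P = 0.0158. The same n1 + n2 - 2 rule matches panels g, i and j (7 + 8 - 2 = 13, 23 + 16 - 2 = 37, 16 + 19 - 2 = 33). In a real analysis `scipy.stats.ttest_ind` does all of this in one call.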

Finally, they looked at the methylation patterns of odorant receptor genes in the sperm of the F0 mice. DNA methylation is a well-recognized epigenetic mark that can regulate transcription. There was a significant decrease in methylation of Olfr151 (the M71 receptor, which responds to acetophenone), but no change in Olfr6 (which does not respond to it).

The idea that the experiences of one's ancestors can influence behavior has important consequences. This may contribute to an "intergenerational transmission of risk for neuropsychiatric disorders, such as phobias, anxiety and post-traumatic stress disorder".

The fact that changes can propagate over more than one generation has implications for the process of evolution -- if the parents' experience can alter the epigenetic state of the offspring's genome (without changing the DNA sequence itself), we may have to refine the way we think about how evolution works. However, it is unclear whether these changes can last long enough (over many generations) to truly alter evolution, as methylation patterns may not be stable enough for natural selection to act on them in the long term. But there could be mechanisms that incorporate epigenetic changes into the genome.

Wednesday, November 20, 2013

ICA Swim Maps

Neurons whose activity oscillates coherently with the swim motor pattern, colored by phase.
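The post doesn't say how the phase was computed. One plausible way (an assumption, not necessarily what was used for these maps) is to take each IC's Fourier component at the swim frequency and compare its angle with the same component of a reference motor trace. A minimal numpy sketch; `swim_freq`, the sampling rate and the function names are placeholders.

```python
import colorsys
import numpy as np

def phase_lag(reference, traces, fs, swim_freq):
    """Phase lag (radians, in (-pi, pi]) of each trace relative to the reference at swim_freq.

    Positive values mean the trace lags the reference. Works best when the recording
    spans a whole number of swim cycles (otherwise spectral leakage biases the estimate)."""
    t = np.arange(reference.shape[-1]) / fs
    basis = np.exp(-2j * np.pi * swim_freq * t)      # single-frequency Fourier kernel
    ref_coef = reference @ basis
    trace_coefs = np.atleast_2d(traces) @ basis       # one complex coefficient per IC
    return np.angle(ref_coef * np.conj(trace_coefs))

def phase_to_rgb(lag):
    """Map a phase lag onto the color wheel (hue) so each IC can be colored by phase."""
    hue = np.mod(lag, 2 * np.pi) / (2 * np.pi)
    return colorsys.hsv_to_rgb(float(hue), 1.0, 1.0)
```

Each IC's spatial map can then be tinted with `phase_to_rgb` of its lag. The magnitude of `trace_coefs` could also serve as a crude measure of how strongly an IC is locked to the rhythm.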

Tuesday, November 5, 2013

network, milliseconds

The current state-of-the-art AI is based on "deep belief networks". These are neural models built on brain-inspired ideas, and they are now beginning to be used effectively for AI. Part of the reason seems to be a few learning tricks (like not minimizing the error function completely), and part is that computing power has become sufficient to train large, deep networks.

One of the big components is "max pooling", which is modeled on complex cells in visual cortex. The idea is that simple cells respond to a convolutional filter (like the Gabor set of V1, or the ICs, i.e. independent components), and complex cells listen to a range of simple cells and respond with the maximum of their inputs.

Then they build hierarchies of these layers: convolutional filter, max pooling, convolutional filter, max pooling. This builds a hierarchical component decomposition of the stimulus, which allows for local invariance of the features at each feature level. With these types of networks they can now achieve human-level performance on certain visual recognition tasks.
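The filter-then-pool stacking can be sketched in a few lines of numpy. This is a toy illustration under my own assumptions (no learning, a hand-picked kernel, and a ReLU between filtering and pooling, which the post doesn't mention). It shows the local invariance: shifting a bar by one pixel inside a pooling window leaves the pooled output unchanged.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation, which is what deep nets call convolution."""
    kh, kw = kernel.shape
    out = np.empty((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: each 'complex cell' takes the max over a patch of 'simple cells'."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

def hierarchy(image, kernels, pool=2):
    """Alternate filtering (+ ReLU) and max pooling, one stage per kernel."""
    x = image
    for k in kernels:
        x = max_pool(np.maximum(conv2d_valid(x, k), 0.0), pool)
    return x

# Toy "simple cell" tuned to vertical bars (hypothetical, hand-picked)
vertical_bar = np.array([[0.0, 1.0, 0.0]] * 3)

def bar_image(col, size=8):
    """An image containing a single vertical bar at column `col`."""
    img = np.zeros((size, size))
    img[:, col] = 1.0
    return img
```

With an 8x8 image, bars at columns 1 and 2 fall into the same pooling window, so `hierarchy(bar_image(1), [vertical_bar])` and `hierarchy(bar_image(2), [vertical_bar])` are identical, while a bar at column 3 lands in the next pooled column. Passing two kernels gives the two-stage filter/pool/filter/pool hierarchy.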

These advances are brilliant, and I think they are very close to part of the way the brain works. These types of deep belief networks don't have any feedback loops, but clearly the brain is full of them. What these networks seem to capture is the "feed-forward" pathway of the cortical hierarchy. Well, not really the cortical hierarchy, but a hierarchy learned via some derived learning rule. Essentially, their machine is like a human who is really well trained at recognizing a set of objects. Except the task isn't like a human staring at a picture; rather, it is like an image flashing on the screen for 100 ms and you have to respond as fast as possible. I imagine that with a lot of training a human could do this task as effectively as the deep belief nets.

I think this hits on an essential aspect of what the cortical hierarchy is doing: it is forming a component decomposition of the incoming sensory space. Now, the complex cells may be doing something like max pooling, but there could be much more sophisticated ways of pooling. In the visual hierarchy, thalamic inputs send sensory signals into L4 of V1. Here, the basic "pixels" of the sensory world are represented in an overcomplete feature space. This is analogous to the first simple-cell layer of these deep belief networks.

In cortex, L4 seems to project mainly to L2/3. L2/3 is characterized by many recurrent connections, is close to the top-down input, and is thought to be where the complex cells are. I'm not a fan of max pooling as what L2/3 is doing per se. I think max pooling just seems to happen, both because of the way we study it and because L2/3 is doing something a little bit related. I feel like L2/3 is the "model" of what you see -- it is like what you see when you imagine seeing something. So when you actually see something, both L4 and L2/3 are active, as you are simultaneously seeing and classifying/modeling what you see. L2/3 seems to be just locally max pooling, but it is probably just less sensitive to the local features and pooling data from more distal sources.

So the deep belief networks model the feed-forward pathway: low-level features in V1, texture features in V2, shape features in V4, motion in MT. The higher-level areas were likely shaped by evolution to be particularly good at their transformations. The nervous system is so amazingly flexible that the mechanisms for the computations up the feed-forward pathway could be very diverse. But the main idea would be looking for particular correlations among the features.

The brain does not seem to be a simple vertical hierarchy, as many deep belief nets are; instead it kind of fans out and then comes back together at the top. Further, the brain uses every level of the feature space, whereas these deep belief nets seem to rely only on the output of the top layer (mainly because they are trying to make a classification). There are the what and the where pathways -- an object-centric, spatially invariant classification system, and relational, predictive, operational centers, bound together at various levels of the hierarchy.

The what pathway is like object consciousness. When you look at anything, say my dog Suki, your brain represents information across the cortex simultaneously. V1 represents the low-level sensory information -- the color of Suki's hair, her outline, the slits of her closed eyes. V2 unifies the hair as a contiguous texture, and V4 outlines her dimensions, creating a 3D model of her shape that you can just feel. MT picks up on the subtle motions of her breath and notices her ears flicker.

Further down the what pathway, more "higher-level" features seem to be represented. Face patches respond to different types of invariances -- regions for profiles, regions for frontal faces. My what pathway connects me to the knowledge that I'm looking at a dog, and that dogs have this typical shape. Higher up, I know that this is a particular dog. Higher still, I associate my daily memories with Suki. Every time I look at Suki, this information cascades up in a few hundred milliseconds and becomes accessible to the rest of my brain.

Monday, October 28, 2013

Jonathan Shlens

He came from Google to give a talk about using semantic context to improve visual systems. It was quite good.

Basically, he combined a deep learning visual network with one of these semantic networks. He made a mapping from the visual network into a continuous semantic space. Then he could teach the visual network image classes (hand-labeled, from ImageNet(?)) and it would learn the mapping between the visual cues and the semantic space. The semantic space was learned from Wikipedia, just like next-word predictors (although, unlike the visual network, these language models are fairly shallow). The visual network could then recognize completely new visual categories (not just new images within a learned category), because the semantic space links the categories (the category relationships are learned from Wikipedia). He uses the metaphor that the visual system has seen a desk chair, but never a rocking chair, while the semantic system has read about a rocking chair and its relation to desk chairs and other chairs. When the system sees a rocking chair for the first time, it can infer that it is indeed a rocking chair through its semantic knowledge.
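The zero-shot trick can be sketched like this: learn a map from visual features into the word-embedding space using only the seen classes, then label a new image by the nearest word vector, which can include classes that never had training images. This is a toy version under heavy assumptions: the 3-D embeddings and the random "visual" matrix are made up, and the map is fit by plain least squares (the actual model, as I understand it, was trained with a ranking loss).

```python
import numpy as np

def unit(v):
    """Normalize vectors along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def fit_linear_map(visual, semantic):
    """Least-squares M with M @ visual[:, i] ~= semantic[:, i] (columns are examples)."""
    X, *_ = np.linalg.lstsq(visual.T, semantic.T, rcond=None)
    return X.T

def nearest_label(v, M, label_vectors):
    """Project a visual feature vector into semantic space; return the closest label by cosine."""
    names = list(label_vectors)
    E = unit(np.array([label_vectors[n] for n in names]))
    return names[int(np.argmax(E @ unit(M @ v)))]

# Hypothetical 3-D word embeddings (in the real system these are learned from text)
labels = {
    "desk chair": np.array([1.0, 0.2, 0.0]),
    "car": np.array([0.1, 1.0, 0.0]),
    "dog": np.array([0.0, 0.0, 1.0]),
    "rocking chair": np.array([0.9, 0.5, 0.1]),  # never shown to the visual side
}
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))  # hypothetical stand-in for "how the visual network sees each concept"
seen = ["desk chair", "car", "dog"]
E_seen = np.stack([labels[n] for n in seen], axis=1)   # (3, n_seen)
M = fit_linear_map(A @ E_seen, E_seen)                 # trained on seen classes only
```

An image of a rocking chair, `A @ labels["rocking chair"]`, is then mapped by `M` next to the "rocking chair" word vector even though that class never appeared in training; the semantic space does the generalizing.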

There is also a trick that uses what is essentially a hash table to quickly look up approximate filter responses in a deep network; this speeds up the system by almost 20,000x.
Here is the paper
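I don't know the exact hashing scheme from the talk, so here is a generic stand-in: random-hyperplane locality-sensitive hashing (SimHash). Each filter becomes a short bit code, and the filter most similar to an input patch can be found by comparing codes instead of computing every dot product. The sizes, the seed and the function names are arbitrary choices for illustration.

```python
import numpy as np

def hash_codes(vectors, planes):
    """One bit per random hyperplane: which side of the plane each vector falls on."""
    return (np.atleast_2d(vectors) @ planes.T) > 0

def best_filter_by_hash(patch, filter_codes, planes):
    """Index of the filter whose code is closest in Hamming distance (~ highest cosine similarity)."""
    q = hash_codes(patch, planes)[0]
    return int(np.argmin((filter_codes != q).sum(axis=1)))

rng = np.random.default_rng(0)
filters = rng.normal(size=(1000, 64))       # hypothetical bank of 1000 flattened filters
planes = rng.normal(size=(128, 64))         # 128 random hyperplanes -> 128-bit codes
filter_codes = hash_codes(filters, planes)  # computed once, ahead of time
```

For clarity this scans all codes; a real system would use the codes (or chunks of them) as hash-table keys so that only a few candidate filters are looked up and scored, which is where the large speedups come from.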

I can't find many of his other papers. He was in EJ's lab and has a lot of papers with Greg Field and Jeff Gauthier.

And there is this -- code for an RBM/deep belief net that runs on a GPU.

And this, which was used to pull all the text from Wikipedia:

Kernel ICA is too slow

Yeah, it looks like it does effective ICA, but it takes forever: 8 components take 2 minutes, 16 take 10 minutes, and 32 take almost an hour.

I'm running it up to 128, so if it keeps growing in the same pattern, 64 will take 5 hours and 128 will take more than a day...
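That extrapolation amounts to fitting a power law, time ~ a * n^b, to the three timings (a straight line in log-log space). A minimal sketch, assuming "almost an hour" means 60 minutes:

```python
import math

def fit_power_law(n_components, minutes):
    """Least-squares line in log-log space: log(time) = intercept + slope * log(n)."""
    xs = [math.log(n) for n in n_components]
    ys = [math.log(m) for m in minutes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

def predict_minutes(n, slope, intercept):
    """Extrapolated run time for n components."""
    return math.exp(intercept + slope * math.log(n))

# Timings from the runs above ("almost an hour" taken as 60 minutes)
slope, intercept = fit_power_law([8, 16, 32], [2, 10, 60])
for n in (64, 128):
    print(n, "components:", round(predict_minutes(n, slope, intercept) / 60, 1), "hours")
```

The fitted exponent comes out around 2.45, i.e. each doubling of the number of components costs roughly 5.5x. That predicts about 5.3 hours for 64 components and about 29 hours for 128, consistent with the estimate above.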

Schmidhuber Deep Learning Review

A good set of articles and a timeline about deep learning

Friday, September 20, 2013

Image alignment with ecc

Ok, so I had this idea to align all the images across trials, so that I could run ICA on a concatenated stack and thus get the same ICs for an entire experiment.

So I found ECC (enhanced correlation coefficient), an image registration algorithm, and it works pretty well.

I've been playing with it as a motion correction algorithm too, and it detects sub-pixel movements pretty well. ECC is just a registration algorithm, so my motion correction is simply registering each frame to the first frame.

The result is a 2x3xNFrames matrix holding the affine transform for each frame. Then I can plot the values of the matrix over time to see what kind of motion there is.

So the far-right column is the translational component, and the other two columns form a rotation/scaling matrix. The motion is quite subtle, but it is clearly there.
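Splitting each 2x3 warp into those pieces can be sketched as follows. This assumes the linear part is close to a similarity transform (rotation plus uniform scaling), which seems reasonable for such small motions; any shear would simply be folded into the estimates.

```python
import numpy as np

def decompose_warps(warps):
    """Split affine warps of shape (2, 3, n_frames) into translation, rotation and scale.

    Assumes the 2x2 linear part is close to s * [[cos a, -sin a], [sin a, cos a]]."""
    translation = warps[:, 2, :]                    # (2, n_frames): x and y shift in pixels
    A = warps[:, :2, :]                             # (2, 2, n_frames): rotation/scaling part
    rotation = np.arctan2(A[1, 0, :], A[0, 0, :])   # radians
    det = A[0, 0, :] * A[1, 1, :] - A[0, 1, :] * A[1, 0, :]
    scale = np.sqrt(np.abs(det))                    # 1.0 means no zoom
    return translation, rotation, scale
```

Plotting `translation` gives the drift trace, and `rotation` and `scale` summarize the rest of the matrix in two numbers per frame instead of four.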

Then you can kind of see how the ganglion moves with a picture like this:
The numbers on this axis are pixel values, so this is very subtle movement (that's why it's hard to see by eye). The big circle is the start point. There are two traces, a black one and a blue one, whose separation reflects the rotational/scaling aspects.

Yeah, so this works pretty well, and it looks like it helps quite a bit.

One of the problems is that the edges of the warped image get strange (near the border there is no source data to map in), so the final result has to be clipped at the edges. This is easy to do: we just find the largest displacement in the transform (for motion it's usually sub-pixel), round up to the nearest pixel, and remove that many pixels from the border.
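That clipping recipe can be sketched like this. Because the displacement is an affine function of position, its magnitude is largest at one of the image corners, so only the four corners need to be checked per frame. The (x, y) = (column, row) convention and the function names are my assumptions.

```python
import numpy as np

def crop_margin(warps, height, width):
    """Pixels to trim from each border, given affine warps of shape (2, 3, n_frames)."""
    corners = np.array([[0, 0, 1],
                        [width - 1, 0, 1],
                        [0, height - 1, 1],
                        [width - 1, height - 1, 1]], dtype=float).T  # (3, 4) homogeneous (x, y, 1)
    max_disp = 0.0
    for k in range(warps.shape[2]):
        moved = warps[:, :, k] @ corners                # (2, 4) warped corner positions
        disp = np.linalg.norm(moved - corners[:2], axis=0)
        max_disp = max(max_disp, float(disp.max()))
    return int(np.ceil(max_disp))                       # round up to whole pixels

def crop(stack, margin):
    """Trim `margin` pixels from the last two (row, col) axes of an image or stack."""
    end = -margin if margin > 0 else None
    return stack[..., margin:end, margin:end]
```

For a typical sub-pixel motion trace this returns 1, so a single pixel is removed all the way around.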

This is x120516-t6, which had pretty scary motion; correcting it changes the shortening response quite significantly:

Tuesday, September 17, 2013

Validation of Independent Component Analysis for Rapid Spike Sorting of Optical Recording Data

Hill, E.S., Moore-Kochlacs, C., Vasireddi, S.K., Sejnowski, T.J., Frost, W.N. (2010). Validation of Independent Component Analysis for Rapid Spike Sorting of Optical Recording Data. J Neurophysiol 104: 3721-3731.

So, same basic idea: use ICA to extract components from optical data. This is pretty impressive because these are fast VSDs and they are looking at spike trains. It works really well for this; that's pretty much all there is to say. Spikes are ideally suited for ICA because of their sparse nature.

One thing they did was concatenate multiple optical files and run ICA on them together, to get the same neurons across trials. This could work for me, but I would have to realign the images for each trial.
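A minimal sketch of the concatenation idea, assuming each trial is already aligned and stored as a (pixels x frames) array: stack the trials along time, run ICA once, and split the component time courses back into trials, so that component k is the same neuron in every trial. The FastICA below (tanh nonlinearity, symmetric decorrelation) is a bare-bones version for illustration; in practice I'd use a library implementation such as scikit-learn's `FastICA`.

```python
import numpy as np

def fastica(X, n_components, n_iter=200, seed=0):
    """Bare-bones symmetric FastICA. X: (n_channels, n_samples). Returns (sources, unmixing)."""
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(X @ X.T / X.shape[1])        # PCA of the channel covariance
    top = np.argsort(d)[::-1][:n_components]
    K = (E[:, top] / np.sqrt(d[top])).T                 # whitening matrix (PCA step)
    Z = K @ X
    W = np.random.default_rng(seed).normal(size=(n_components, n_components))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update: E[z g(w.z)] - E[g'(w.z)] w, for all rows at once
        W = G @ Z.T / Z.shape[1] - np.diag((1 - G ** 2).mean(axis=1)) @ W
        U, _, Vt = np.linalg.svd(W)
        W = U @ Vt                                      # symmetric decorrelation
    return W @ Z, W @ K

def ica_across_trials(trials, n_components):
    """One ICA on trials concatenated in time; returns per-trial component traces and the unmixing."""
    lengths = [t.shape[1] for t in trials]
    sources, unmixing = fastica(np.hstack(trials), n_components)
    return np.split(sources, np.cumsum(lengths)[:-1], axis=1), unmixing
```

Because a single unmixing matrix is shared by all trials, the per-trial traces are directly comparable, which is exactly what breaks if the field of view shifts between trials; hence the need for realignment first.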

Low- and high-pass filtering helps. More data helps.

One thing to extend this work would be to validate ICA not just for spike traces, but also for sub-threshold changes in the membrane potential.

BRAIN Interim Report

The interim report for the BRAIN Initiative came out yesterday. It describes the initial goals of the BRAIN Initiative, focusing mainly on short-term projects for 2014. A final report will come out in June.

The report focuses the BRAIN Initiative on questions about neural coding, neural circuit dynamics and neuromodulation. The analysis of circuits is "particularly rich in opportunity, with potential for revolutionary advances" -- we currently think that the activity and modulation of large ensembles of neurons are what underpin mental experience and behavior.

These questions are daunting -- the human mind is built from an unimaginable tangle of almost 100 billion neurons. Understanding this complexity will require new tools to record from large numbers of neurons, new analysis techniques that can make sense of the "big data" that will be generated, and new computational theory that puts it all together.

The organizers recognize the limitations of a purely human-based approach to studying neuroscience, and that both technical and ethical issues require the BRAIN Initiative to include appropriate model organisms. They emphasize a diversity of approaches and organisms, citing the different advantages of the various model organisms in neuroscience -- rhesus macaques (evolutionary proximity to humans), mice (mammalian, genetic tools), zebrafish (vertebrate, optical tools), worms and flies (small nervous systems, genetic tools), and molluscs, crabs and leeches (defined nervous systems, electrophysiology). Other species will also highlight important brain functions through their particular niches -- e.g., songbirds are one of the few animal groups (besides humans) capable of instructed vocal learning.

They summarize 9 "high-priority" research areas for 2014:
1. Generate a Census of Cell Types. 
2. Create Structural Maps of the Brain.
3. Develop New Large-Scale Recording Capabilities.
4. Develop a Suite of Tools for Circuit Manipulation.
5. Link Neuronal Activity to Behavior.
6. Integrate Theory, Modeling, Statistics, and Computation with Experimentation.
7. Delineate Mechanisms Underlying Human Imaging Technologies.
8. Create Mechanisms to Enable Collection of Human Data.
9. Disseminate Knowledge and Training.

This is a great start for what could be a revolutionary initiative! The current level of funding is only $40 million, which is less than 1% of what the NIH alone currently gives to neuroscience research ($5.5 B). Hopefully there is more money to come -- $3 B was floated for the next few years.

Monday, September 16, 2013

Automated Analysis of Cellular Signals from Large-Scale Calcium Imaging Data

Mukamel, E. A., Nimmerjahn, A., Schnitzer, M. J. (2009). Automated Analysis of Cellular Signals from Large-Scale Calcium Imaging Data. Neuron.

This paper basically does the PCA-ICA breakdown of the imaging signals.
Figure 1. Analytical Stages of Automated Cell Sorting
(A) The goal of cell sorting is to extract cellular signals from imaging data (left) by estimating spatial filters (middle) and activity traces (right) for each cell. The example depicts typical fluorescence transients in the cerebellar cortex as observed in optical cross-section. Transients in Purkinje cell dendrites arise across elongated areas seen as stripes in the movie data. Transients in Bergmann glial fibers tend to be more localized, appearing ellipsoidal.
(B) Automated cell sorting has four stages that address specific analysis challenges.

So they also run ICA in both the space and time dimensions, and use both to identify the cell traces. Most of the information comes from the space dimension (like the way I do it), and the best results use a weighted combination with about 0.1-0.2 of the weight on time and the rest on space.

In the image segmentation step, they split spatial filters that contain spatially separate regions, which arise when different neurons have highly correlated activity. Typically ICA handles correlations of up to about 0.8 well, but occasionally it picks out two cells as one component.
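A simplified version of that segmentation step: threshold the spatial filter and split it into connected regions (4-connectivity), dropping regions smaller than a minimum size. The threshold and `min_pixels` are placeholders, and a real pipeline would probably smooth the filter before thresholding; that is skipped here.

```python
from collections import deque
import numpy as np

def split_filter(spatial_filter, threshold, min_pixels=1):
    """Split one IC's spatial filter into connected above-threshold regions.

    Returns a list of filters, each zero outside one region."""
    mask = spatial_filter > threshold
    labels = np.zeros(mask.shape, dtype=int)
    n_labels = 0
    for r0, c0 in zip(*np.nonzero(mask)):
        if labels[r0, c0]:
            continue                                  # already part of a region
        n_labels += 1
        labels[r0, c0] = n_labels
        queue = deque([(r0, c0)])
        while queue:                                  # breadth-first flood fill
            r, c = queue.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if (0 <= rr < mask.shape[0] and 0 <= cc < mask.shape[1]
                        and mask[rr, cc] and not labels[rr, cc]):
                    labels[rr, cc] = n_labels
                    queue.append((rr, cc))
    segments = []
    for k in range(1, n_labels + 1):
        region = labels == k
        if region.sum() >= min_pixels:
            segments.append(np.where(region, spatial_filter, 0.0))
    return segments
```

An IC that grabbed two correlated cells in different places comes back as two filters, each of which can then be used to pull out its own trace.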

Yeah, this is basically the paper that I want to write, but apparently it's already been done... I'm going to check out their toolbox and see if there are any new tricks I can add.

Brain-wide 3D imaging of neuronal activity in Caenorhabditis elegans with sculpted light

Schrödel, T., Prevedel, R., Aumayr, K., Zimmer, M., Vaziri, A. (2013). Brain-wide 3D imaging of neuronal activity in C. elegans with sculpted light. Nature Methods.

WF-TeFo (wide-field temporal focusing) imaging of a nuclear-localized calcium indicator can capture 70% of C. elegans neurons.

Ok, from what I understand, in this technique the beam of a femtosecond laser is broken up spectrally by a grating. This spreads the frequencies of the light out in space and time, and only at the focal plane do all of the frequency components come back together into a short pulse. This enhances two-photon absorption at the focus and diminishes absorption elsewhere, because outside the focal plane the sample sees temporally offset waves of light at different frequencies (effectively a longer, weaker pulse). They acquire volumes at 5 Hz.

They then look at some of the activity and group the neural responses by "agglomerative hierarchical clustering", which is apparently just the MATLAB function linkage. The distance matrix is based on the correlation/covariance matrix of the responses over time.
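In Python, `scipy.cluster.hierarchy.linkage` with `metric='correlation'` does this directly. To show what the clustering actually does, here is a naive average-linkage version with 1 - correlation as the distance. Average linkage is my assumption (MATLAB's `linkage` defaults to single linkage), and the function name is hypothetical.

```python
import numpy as np

def average_linkage(traces, n_clusters):
    """Naive agglomerative clustering of the rows of `traces` (n_neurons, n_timepoints).

    Distance = 1 - Pearson correlation; the two clusters with the smallest
    average pairwise distance are merged until n_clusters remain."""
    D = 1.0 - np.corrcoef(traces)
    clusters = [[i] for i in range(len(traces))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                       # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)
```

This is O(n^3) and only meant for a handful of neurons; the library version also returns the full merge tree for drawing a dendrogram rather than a single cut.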

Monday, September 2, 2013

Tuned thalamic excitation is amplified by visual cortical circuits

Lien, A.D., Scanziani, M. (2013). Tuned thalamic excitation is amplified by visual cortical circuits. Nature Neuroscience 16(9): 1315-1323.

Recording from L4 of visual cortex while showing random white/black squares and drifting gratings. They separate the thalamic component from the cortical component by optogenetically activating PV interneurons expressing ChR2, which silences the cortex.

Cells receive ON and OFF thalamic inputs (from thalamic neurons that respond to increases and decreases in luminance, respectively) whose receptive-field peaks are slightly offset from each other. The orientation selectivity arises because a grating at the right orientation drives the ON and OFF subfields at the same time.

Interestingly, the integral of the thalamic input (Q_Thal, the total charge) is constant regardless of the orientation of the stimulus (see h in Figure 4 below). The tuning arises only from the synchronous inputs, i.e. the F1 modulation. The neuron receives a lot of current from thalamus overall (which has no orientation tuning), and its preferred orientation results from the two thalamic pathways being activated simultaneously.
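A toy model (my own simplification, not the paper's analysis) makes this concrete. Treat each subfield as a single point, let the ON input be the half-wave-rectified luminance at its location and the OFF input the rectified darkness at its location, and drive both with a drifting grating. The mean current (Q) comes out the same at every orientation, while the F1 modulation peaks when the grating drifts along the ON-OFF axis. The offset, spatial frequency and sampling are arbitrary choices.

```python
import numpy as np

def thalamic_drive(theta, offset=1.0, n_samples=1000):
    """Toy thalamic input to one cell for a grating drifting in direction theta (radians).

    ON subfield at (+offset/2, 0), OFF subfield at (-offset/2, 0): the ON-OFF axis is x.
    Returns (Q, F1): mean current over one cycle and the amplitude of the modulation
    at the grating's temporal frequency."""
    k = np.pi / offset                     # puts the subfields half a spatial cycle apart along x
    kvec = k * np.array([np.cos(theta), np.sin(theta)])
    p_on, p_off = np.array([offset / 2, 0.0]), np.array([-offset / 2, 0.0])
    t = np.arange(n_samples) / n_samples            # one temporal cycle
    lum_on = np.cos(kvec @ p_on - 2 * np.pi * t)     # luminance at the ON subfield
    lum_off = np.cos(kvec @ p_off - 2 * np.pi * t)   # luminance at the OFF subfield
    current = np.maximum(lum_on, 0.0) + np.maximum(-lum_off, 0.0)  # ON likes light, OFF likes dark
    q = current.mean()
    f1 = 2 * np.abs(np.fft.fft(current)[1]) / n_samples
    return q, f1
```

Sweeping `theta` from 0 to 90 degrees, Q stays at about 2/pi (roughly 0.64), while F1 falls from 1 to essentially 0: at 90 degrees both subfields see the same luminance, so ON and OFF alternate and their sum is flat at the grating frequency.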

Figure 4 Separation of ON and OFF thalamic subfields predicts preferred orientation of thalamic excitation. (a–d) Example recording of EPSC_Thal in which both the ON and OFF receptive fields and the responses to drifting gratings at various orientations were obtained in the same cell. (a) EPSC_Thal in response to black and white squares. Data are averages of five trials per location. (b) Contour plot of the OFF and ON receptive field maps for the cell shown in a. Each contour represents two z scores. Filled magenta and green circles mark the peaks of the OFF and ON receptive fields, respectively. Dashed black line connects the OFF and ON peaks to define the ON-OFF axis. The preferred orientation predicted from the ON-OFF axis, RF_Pref, is indicated by the small grating. (c) EPSC_Thal in response to drifting gratings of various orientations (average of three trials per direction). The gray rectangle indicates the visual stimulus (1.7 s) and the blue bars represent LED illumination (2.6 s). (d) Orientation tuning curves of F1_Thal (blue) and Q_Thal (gray) in polar coordinates for the responses shown in c. The blue line indicates the preferred orientation of F1_Thal (Grating_Pref) and the black dashed line corresponds to RF_Pref. (e) Data presented as in b and d for three additional cells. Tuning curves on polar coordinates in d and e are normalized to peak response. Outer circle represents peak value. (f) RF_Pref plotted against Grating_Pref (n = 8 cells, 7 mice). The black line represents unity. The dashed lines denote the region in which the difference between RF_Pref and Grating_Pref is less than 30 degrees. The distributions of Grating_Pref (n = 42 cells, 33 mice) and RF_Pref (n = 13 cells, 12 mice) across the population of cells in which either value was measured are shown along the top and right, respectively. (g) Absolute difference between RF_Pref and Grating_Pref (∆Pref Ori) (n = 8 cells, 7 mice). Error bar represents ± s.e.m.
(h) Diagram of how orientation tuning of F1_Thal can arise from spatially offset OFF and ON thalamic excitatory input (t1 = time 1, t2 = time 2). The area of the blue shaded region corresponds to Q_Thal. The difference between the peak and the trough of EPSC_Thal corresponds to F1_Thal.

Then, to get the cortical component, they simply subtract the thalamic component from the total. The cortical F1 modulation is tuned like the thalamic component, but the cortical charge Q coming in is now also aligned with the preferred orientation. This suggests that cortical neurons with similar preferences wire together more strongly.

Figure 6. Tuning of non-thalamic excitatory F1 modulation. (a) Example cell. Top, EPSC_Sub in response to drifting gratings of various orientations. The gray rectangle represents visual stimulus (1.5 s) and the blue bar represents LED illumination (2.6 s). Bottom, F1 modulation of EPSC_Sub. Shown are the cycle average (black) and best-fitting sinusoid (green) at the grating temporal frequency (2 Hz). (b) Orientation tuning curves of Q_Sub (dotted curve) and F1_Sub (solid curve) for the example cell shown in a. (c) Population tuning curve of Q_Sub (dotted curve) and F1_Sub (solid curve). Left, population tuning curves in which Q_Sub and F1_Sub tuning curves for each cell were equally shifted so that the preferred direction of Q_Sub occurred at 0 degrees (Q_Sub reference). Right, population tuning curves in which Q_Sub and F1_Sub tuning curves for each cell were independently shifted so that preferred direction of Q_Sub and F1_Sub both occurred at 0 degrees (self reference). (d) OSI of F1_Sub was plotted against OSI of Q_Sub for all neurons. (e) Distribution of absolute differences in preferred orientation (∆Pref Ori) between Q_Sub and F1_Sub. The dark curve represents all cells (n = 42). The gray curve represents cells in the top 50th percentile of F1_Sub OSI (n = 21). (f, g) Data are presented as in d and e for DSI and absolute differences in preferred direction (∆Pref Dir). Filled green markers in d and f denote the OSI and DSI values of the example cell. Data in c–g are from n = 42 cells from 33 mice. Error bars represent ± s.e.m.

And they show that the cortical component is closely tuned with the thalamic component, possibly with a 40 ms offset, or about a 30 degree phase delay (at the 2 Hz grating frequency, 40 ms is 0.08 of a cycle, about 29 degrees).

Thalamus provided about 30% of the excitatory charge to these cortical neurons.

Monday, August 19, 2013

Connectomic reconstruction of the inner plexiform layer in the mouse retina

Helmstaedter, M., Briggman, K. L., Turaga, S.C., Jain, V., Seung, H. S., Denk, W. (2013). Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature 500

So this is Kevin Briggman's retina data, but with extensive annotations from 224 undergrads (who traced the skeletons), followed by an algorithm that fills in the volume and links it to the skeletons.

They first classify the reconstructed cells into types, mainly based on the layer-branching pattern.

They further show that some cells can be classified based on their connectivity patterns. CBC5 (type 5 cone bipolar cells) had been seen before and was thought to contain several true classes of bipolar cells. They show that a connectivity feature separates CBC5A from the rest, and that CBC5A is a "true type" because it tiles the retina.

Finally they look at the circuitry of some of the cells.

Monday, August 5, 2013

The BRAIN grid

Ok, so this is an extension of the famous multiple-levels figure from Churchland and Sejnowski. I like the idea of it being 7x7, but I've changed the spatial levels slightly. The original has 7, but I've consolidated the original "Systems, Maps, Networks" into "Maps and Circuits", and then added "Compartments" as a level that sits between neurons and synapses. Compartments is meant to capture information about sub-cellular structure, but I could also merge compartments with synapses (both are sub-cellular structures) if a finer separation at the larger scales is more important.

Here are just some general thoughts about the temporal scales:
Milliseconds: Spikes, instant state of the brain
Seconds: Plasticity, working memory
Hours: Long-term storage
Days: Sleep, consolidation
Lifetime: Development

Ok, so the goal is to put everything we know about neuroscience into these 49 squares and link them all together with computational theory. Here are just some ideas:

At the genetic/molecular level, we are mainly focused on the computational machine that is the DNA-protein network. Experience and the environment feed all the way down to the genetic level.

Milliseconds: Channels, Chemical Reactions
Seconds: Feedback signals,
Hours: Protein synthesis
Days: Gene activation, sleep- vs. wake-dependent genes
Lifetime: Genetic programs of development
Evolution: DNA evolutionary dynamics

I like synapses as their own level because they are so important and so varied.
Milliseconds: Potentials, vesicle release, Macro-channel dynamics, gap-junctions
Seconds: Calcium, excitability, g-protein receptors, neuromodulators, STP
Hours: Hormones, neuromodulators, LTP, synaptogenesis
Days: Consolidation, stabilization, synaptogenesis, elimination
Years: regeneration
Lifetime: circuit formation, development
Evolution: evolution of synapse

The sub-cellular compartments of a neuron are extremely important
Milliseconds: Spike-initiation zone, Multi-layer perceptron integration, Shunting Inhibition
Seconds: Calcium spikes, neuromodulators, bursting
Hours: Dendritic plasticity, Apical-basal interactions in learning
Lifetime: Layers
Evolution: Simple 1-compartment neuron to many many compartmental pyramidal cells

Neurons I like in the middle. This is the foundation of the brain.
Milliseconds: Spikes
Seconds: Up-Down states, modulators
Days: Consolidation
Years: Neurogenesis
Lifetime: Types, Layers

The basic building blocks of computation
Milliseconds: Firing-rate space, instantaneous state
Seconds: Gamma, working memory, neuromodulation, evidence accumulation
Hours: memory, learning
Days: sleep consolidation,
Years: learning
Lifetime: Cortical development
Evolution: 3-layer to 6-layer, basic building blocks

Maps for large networks. I like maps because of how well a map corresponds to a brain region (visual areas have retinotopic maps, auditory areas have tonotopic maps). A map processes one modality of information.
Milliseconds: Sharp-wave ripples, Attention
Seconds: Theta (within map), Alpha (between maps), decisions
Hours: Memory
Days: Consolidation
Lifetime: Within map, between map development
Evolution: Cortex, sensory/motor systems,

The whole brain network.
Milliseconds: qualia, consciousness
Seconds: Attention,
Days: Sleep-wake cycle
Lifetime: large-scale development
Evolution: The evolution of the brain

So, a lot of ideas span across several spots in the grid -- like neuromodulators.
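The grid is essentially a lookup table keyed by (spatial level, time scale). A small sketch of that data structure, with a reverse index to find ideas that span several squares, like neuromodulators. Only a few squares are transcribed, and the level names are my shorthand for the levels above, not a fixed scheme.

```python
from collections import defaultdict

# A few squares transcribed from the lists above. Keys are (spatial level, time scale);
# the level names are shorthand labels (hypothetical), not an official scheme.
grid = {
    ("Synapses", "Seconds"): ["Calcium", "excitability", "g-protein receptors", "neuromodulators", "STP"],
    ("Synapses", "Hours"): ["Hormones", "neuromodulators", "LTP", "synaptogenesis"],
    ("Compartments", "Seconds"): ["Calcium spikes", "neuromodulators", "bursting"],
    ("Neurons", "Milliseconds"): ["Spikes"],
    ("Circuits", "Seconds"): ["Gamma", "working memory", "neuromodulation", "evidence accumulation"],
}

def concept_index(grid):
    """Reverse index: concept (lower-cased) -> set of grid squares it appears in."""
    index = defaultdict(set)
    for square, concepts in grid.items():
        for concept in concepts:
            index[concept.lower()].add(square)
    return index

def spanning_concepts(grid, min_squares=2):
    """Concepts that show up in at least `min_squares` squares of the grid."""
    return {c: squares for c, squares in concept_index(grid).items() if len(squares) >= min_squares}
```

Exact string matching misses near-synonyms such as "neuromodulation" vs. "neuromodulators", so a fuller version would probably need a controlled vocabulary for the concepts.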

Monday, July 29, 2013

Motor Cortex Feedback Influences Sensory Processing by Modulating Network State

Zagha, E., Casale, A.E., Sachdev, R.N.S., McGinley, M.J., McCormick, D.A. (2013). Motor Cortex Feedback Influences Sensory Processing by Modulating Network State. Neuron

Neuromodulators are classically recognized as modulating network state, but they act slowly and are spatially distributed. Glutamate may also play a role in modulation. They analyze the motor and sensory cortical areas corresponding to a specific whisker.

Non-whisking: low-frequency rhythms in M1 and S1. Highly coherent.
Whisking: increased gamma in M1 and S1 - "activated state". Could see "activated state" even when no whisking or other obvious behavior.

Inactivation of M1 with muscimol reduced whisking, slowed network activity, reversed the phase offset of the coherence, and generally shifted S1 activity to lower frequencies: low-frequency power increased and high-frequency power decreased.

ChR2 activation of M1 (not sure which neurons; AAV virus with Emx1-Cre expression) decreases delta power in S1 and increases activity. In anesthetized animals, graded activation of M1 decreases delta and increases gamma, very rapidly (within tens of milliseconds). Laminar recordings show that slow oscillations were eliminated in all layers, and that spiking increased primarily in infragranular neurons.

Now they do some laminar analysis. Stimulating the whisker gives current sinks in layers 2/3, 4 and 5 and current sources in layers 1 and 6 of S1. Activating M1 gives almost the opposite pattern: sinks in layers 5, 6 and 1; sources in 2/3.
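The sinks and sources come from current source density (CSD) analysis of the laminar LFP. The standard estimate (assumed here; the post doesn't say which variant they used) is the negative second spatial difference of the LFP across equally spaced contacts, with conductivity treated as constant. The function name and the toy profile are illustrative.

```python
import numpy as np

def csd(lfp, spacing_mm):
    """Second-spatial-difference CSD estimate.

    lfp: (n_channels, n_samples) in mV, channels ordered by depth with equal spacing.
    Returns (n_channels - 2, n_samples) in mV/mm^2; conductivity is assumed constant and
    left out, matching the mV/mm^2 units in the figure caption. Negative values = sinks."""
    second_diff = lfp[:-2] - 2 * lfp[1:-1] + lfp[2:]
    return -second_diff / spacing_mm ** 2
```

A local negative deflection in the LFP, e.g. on the middle of five contacts, comes out as a sink (negative CSD) at that depth flanked by sources above and below, which is the kind of pattern the CSD plots show layer by layer.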

Figure 5. Evidence for Involvement of the Corticocortical Feedback Pathway
(A and B) CSD plots of average S1 responses from an example experiment. Brief (5 ms) deflections of the principal whisker (A) evoked onset current sinks in layers IV, II/III, and V and current sources in layers I and VI. Brief (5 ms) vM1 stimuli (B) evoked onset current sinks in layers V, VI, and I and current sources in layers II/III. Stimulus durations are depicted by the colored boxes in the bottom left of each plot. Color scales represent ±10 mV/mm² for whisker stimuli and ±5 mV/mm² for vM1 stimuli.
(C) Synaptic responses from layer V S1 neurons in vitro, evoked by stimulating axons and terminals of vM1 neurons in S1. The 2 ms light pulses are indicated by blue dots below traces. Responses from a regular spiking (RS) neuron, consisting of a short latency EPSP at rest (top) and an EPSP-IPSP sequence (middle) when depolarized to just below spike threshold. Bottom: EPSP from a fast spiking (FS) neuron at rest.
(D) Population data, quantifying connection probabilities (left), and response amplitudes (right) from vM1 inputs onto regular spiking and fast spiking neurons in S1.
(E) In vivo S1 response to stimulation of vM1 axons in S1. Limiting direct stimulation to the corticocortical vM1 axons was sufficient to evoke S1 activation. Error bars represent SE. See also Figure S3.

They applied CNQX on the cortical surface. A low concentration blocked only the L1 projection of M1; a high concentration blocked both the L1 and L5 projections. L5 neurons were still activated at the low concentration.

They next blocked the thalamus with muscimol: M1 activation of S1 doesn't need the thalamus.

Suppressing M1 during stimulation shifts the S1 response to a biphasic activation: an initial stimulus-driven burst, followed by silence and an LFP rebound, followed by another, slower burst. M1 activation during stimulation reduces the variability of responses in S1.

In general it seems that the feedback pathway puts S1 in the "up state", which is also useful because more sensory information can be processed in the up state. There are probably up and down states for different layers. The M1 feedback to the different layers is probably for different purposes: S1 needs both regulation based on M1 activity and the information itself (S1 needs to know the motor state for correct processing).

Friday, July 19, 2013


Looking at some of Schmidhuber's papers. Here are a few:

Ciresan, D., Meier, U., Masci, J., Schmidhuber, J. (2012). Multi-column deep neural network for traffic sign classification. Neural Networks.

Ciresan, D.C., Meier, U., Gambardella, L., Schmidhuber, J. (2012). Deep, Big, Simple Neural Nets for Handwritten Digit Recognition. Neural Computation.

Schmidhuber, J. (2009). Ultimate Cognition à la Gödel. Cogn Comput.

Here's the website of the first author, Ciresan: There are some interesting things there.

Going to go through the first one. This won the final phase of the "German traffic sign recognition benchmark" with a better-than-human recognition rate of 99.46%.

The Deep Neural Network (DNN) is a hierarchical neural network that alternates convolution layers (i.e. receptive fields/simple cells) with max-pooling layers (i.e. complex cells). They cite this paper in reference to the implementation of their machine: with GPU acceleration it takes days instead of months to learn the signs.

Here's the basic architecture:

The convolution layer is basically just a receptive field for each neuron: the same receptive field shape is spread out over the whole pixel space. Each unit computes a weighted sum, adds a bias and applies a non-linear activation function. Next is a max-pooling layer, which outputs the maximum activation over non-overlapping rectangular regions. The last layers are fully connected and form a 1D feature vector. A softmax output layer gives the probability of the image belonging to each class.
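A toy forward pass makes this layer sequence concrete. This is a minimal NumPy sketch under my own assumptions (one conv + pool stage, tanh activation, random untrained weights, made-up sizes); the real networks stack several such stages and are trained, which this does not do. As in most CNN code, the "convolution" here is really a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel, bias):
    """One feature map: the same kernel (shared receptive field) applied at every position."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel) + bias
    return np.tanh(out)  # non-linear activation (tanh is an assumption)

def max_pool(fmap, size):
    """Maximum over non-overlapping size x size regions (leftover edges are dropped)."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    trimmed = fmap[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

def forward(image, kernels, biases, fc_weights, fc_bias):
    """Conv -> max-pool -> fully connected -> softmax class probabilities."""
    maps = [max_pool(conv2d_valid(image, k, b), 2) for k, b in zip(kernels, biases)]
    features = np.concatenate([m.ravel() for m in maps])  # 1D feature vector
    return softmax(fc_weights @ features + fc_bias)

rng = np.random.default_rng(0)
image = rng.random((8, 8))                 # stand-in for a small grayscale patch
kernels = rng.standard_normal((2, 3, 3))   # two 3x3 receptive fields
probs = forward(image, kernels, np.zeros(2), rng.standard_normal((4, 18)), np.zeros(4))
```

With an 8x8 image and 3x3 kernels each map is 6x6 before pooling and 3x3 after, so two maps give an 18-long feature vector feeding four output classes.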

Each instance of the training set is distorted by affine transforms (translation, rotation, scaling) to get more samples. This helps prevent overfitting and builds in the desired invariances.

The output is essentially the average of several DNN columns. The columns are trained on the same inputs, or on inputs preprocessed in different ways. Preprocessing uses contrast normalization, histeq, adaptive histeq, and imadjust.
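Averaging the columns' softmax outputs is not the same as a majority vote. A small sketch with invented probabilities: two of the three columns favor class 0, but the averaged distribution picks class 1 because the remaining column is more confident.

```python
import numpy as np

def committee_average(column_probs):
    """Average the class probabilities produced by several columns.

    column_probs: (n_columns, n_classes), each row a softmax output.
    Returns the averaged distribution and the index of the winning class.
    """
    avg = np.mean(np.asarray(column_probs, dtype=float), axis=0)
    return avg, int(np.argmax(avg))

# Invented outputs of three columns for one image (three classes)
columns = [[0.6, 0.3, 0.1],
           [0.2, 0.7, 0.1],
           [0.5, 0.4, 0.1]]
avg, winner = committee_average(columns)
votes = [int(np.argmax(c)) for c in columns]   # a majority vote would say class 0
```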

Their hardware: "We use a system with a Core i7-950 (3.33 GHz), 24 GB DDR3, and four graphics cards of type GTX 580."

Looking at his other stuff, he has some good ideas. One big one is "Flat Minimum Search", where the idea is to look for a "flat" local minimum, one where the error stays approximately constant for nearby weight parameters. This adds a simplicity bias and thus results in better generalization. He also talks about the idea of detectors being independent (his "predictability minimization"): predictors try to guess what each detector does based on what all the other detectors are doing, and the detectors in turn try to do something different from everybody else.
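To make the "flat minimum" intuition concrete, here is a crude flatness probe: the average loss increase when the weights are jittered by small Gaussian noise. This is my own illustration, not Hochreiter and Schmidhuber's actual Flat Minimum Search algorithm; the two quadratic losses and the noise radius are arbitrary assumptions.

```python
import numpy as np

def sharpness(loss_fn, weights, radius=0.01, n_samples=500, seed=0):
    """Mean loss increase under Gaussian weight jitter of scale `radius`."""
    rng = np.random.default_rng(seed)
    base = loss_fn(weights)
    increases = [loss_fn(weights + radius * rng.standard_normal(weights.shape)) - base
                 for _ in range(n_samples)]
    return float(np.mean(increases))

def sharp_loss(w):
    return 50.0 * np.sum(w**2)   # high curvature around the minimum at w = 0

def flat_loss(w):
    return 0.5 * np.sum(w**2)    # same minimum, 100x lower curvature

w_star = np.zeros(2)
s_sharp = sharpness(sharp_loss, w_star)
s_flat = sharpness(flat_loss, w_star)
```

For a quadratic loss c·Σw² the expected increase is c·r²·d (radius r, d weights), so the sharp minimum scores about 100 times higher even though both minima have the same zero loss.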

Wednesday, July 17, 2013

Jeremy Biane and Motor Cortex

Just went to Jeremy's defense. It was good. He used retrograde markers to label L5 spinal-cord-projecting pyramidal cells that projected either to C4 or to C8, which control proximal and distal forelimb muscles respectively. After training on a distal muscle task, he recorded from these cells to test whether connectivity within and between the C4 and C8 populations changed after learning the motor skill.

Since it was primarily a distal forelimb behavior, only C8-projecting neurons showed differences in their structure and connectivity after training. They formed more connections (from about 2% connectivity to 6%), but the connections were weaker on average (which may just be because there are so many more new connections). No changes in C4<->C4 or C4<->C8 connectivity.

It's particularly interesting because motor cortex has basically no layer 4. In fact he said that M1 has layers 5a and 5b, but no layer 4. Layer 5 seems to be a major "output" layer of cortex, and perhaps it is the layer that can set up these arbitrary patterns (like the heteroclinic channel layer, or the CPG layer). It seems that layer 5 projects mainly up the hierarchy: L5 from V1 goes to higher-order thalamus, while L6 sends feedback to the LGN.

Monday, July 8, 2013

Characterization, Stability and Convergence of Hierarchical Clustering Methods

Carlsson, G., Memoli, F. (2010). Characterization, Stability and Convergence of Hierarchical Clustering Methods. Journal of Machine Learning Research 11: 1425-1470.

Kleinberg, 2002: there exists no clustering algorithm that satisfies scale invariance, richness and consistency. The natural question is: what about hierarchical clustering?

Going to skip some math and notation; it's mostly formalities, as this is a math paper.

A hierarchical clustering method is a map that assigns a dendrogram to a finite metric space. HC methods operate on a metric space, where the set of points is denoted X and the distances between points are denoted D. D is the distance matrix, of size |X| x |X|.

Yeah, so this goes into some deep theory about HC methods, showing that HC is equivalent to a map from a metric space to an ultrametric space. A metric satisfies the triangle inequality d(x,z) ≤ d(x,y) + d(y,z); an ultrametric satisfies the stronger inequality u(x,z) ≤ max(u(x,y), u(y,z)), so every triangle is isosceles, with its two longest sides equal. Not a lot of practicality in this paper.
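As I read the paper, single linkage is the method their conditions single out, and its dendrogram is exactly an ultrametric: u(x, y) is the height at which x and y first end up in the same cluster. A minimal sketch (function names are mine) that computes this ultrametric as a "minimax chain" distance and checks the ultrametric inequality on four points on a line.

```python
import numpy as np

def single_linkage_ultrametric(D):
    """Ultrametric produced by single-linkage hierarchical clustering.

    u(x, y) is the smallest achievable 'largest hop' over all chains of points
    from x to y, which equals the dendrogram height at which x and y first merge.
    Computed with a Floyd-Warshall-style min-max relaxation, O(n^3).
    """
    U = np.array(D, dtype=float)
    for k in range(U.shape[0]):
        # allow chains that pass through point k
        U = np.minimum(U, np.maximum(U[:, [k]], U[[k], :]))
    return U

def is_ultrametric(M, tol=1e-12):
    """Check the strong triangle inequality m(i, k) <= max(m(i, j), m(j, k))."""
    n = M.shape[0]
    return all(M[i, k] <= max(M[i, j], M[j, k]) + tol
               for i in range(n) for j in range(n) for k in range(n))

# Four points on a line at positions 0, 1, 3 and 7
x = np.array([0.0, 1.0, 3.0, 7.0])
D = np.abs(x[:, None] - x[None, :])
U = single_linkage_ultrametric(D)
```

The original line distances fail the check (the points at 0 and 7 are 7 apart, more than max(3, 4) via the point at 3), while the single-linkage distances take only the values 1, 2 and 4, exactly the merge heights of the dendrogram.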

They also talk about similarities between dendrograms.

Monday, July 1, 2013

What can different brains do with reward?

Murray, E.A., Wise, S.P., Rhodes, S.E.V. (2011). What can different brains do with reward? Neurobiology of Sensation and Reward. ed. Gottfried, J.A. Boca Raton (FL): CRC Press; 10

This looks like a good perspective of brain evolution in the context of reward-based learning/problem solving.

Animals evolved as reward seekers. The chapter takes an evolutionary view of reward from 3 main clades: early vertebrates, early mammals, and primates.

The opening section talks about the history of brain-evolution science: many pitfalls, controversy and disagreement, and lots of people who were just completely wrong.

Some definitions:
Homology: A structure or behavior is homologous to another if two or more descendant species have inherited it from their most recent common ancestor.

Analogy: a statement about function, not ancestry. A structure or behavior is analogous to another if it subserves the same function. The classic example involves wings: insects, birds, and bats all have wings.

Homoplasy: something similar that has evolved in different lineages through parallel or convergent evolution.

"Invertebrates" is an arbitrary grouping. Protostomes are the group that includes insects, mollusks and segmented worms; they separated from the deuterostomes about 600 MYA. Protostomes and vertebrates have both evolved tremendously since that common ancestor. Vertebrates evolved within the deuterostomes.

Three cladograms, arranged from top to bottom. The middle and bottom cladograms each develop one of the lineages from the cladogram above, as shown by the connecting arrows. The circled letters, A, B, and C, reference common ancestors referred to in the text. Beneath the names of selected clades, which are outlined in boxes, shared derived traits appear in italics. Abbreviation: DA, dopamine.

The telencephalon and the early dopamine system were the most important evolutionary developments. The early telencephalon included an olfactory bulb and a homologue of piriform cortex. It also contained a homologue of the basal ganglia and probably of the amygdala and hippocampus.

Mammals use a mixture of old and new features to deal with reward. New: the neocortex. Hippocampus and piriform cortex are still allocortex, and there are some transitional cortical areas. Rodents have a similar prefrontal architecture, but it is not completely homologous to that of primates.

Primates have two main types of frontal cortex: agranular cortex (no layer 4) and areas with a subtle layer 4, collectively called granular prefrontal cortex (PFg). PFg is a primate innovation, as all primates have it. Rodents have only the agranular parts.

Invertebrates can deal with reward too: associative (Pavlovian) learning seems to predate the protostome/deuterostome split. Instrumental conditioning has also been shown in invertebrates.

Dopamine system: involved in classical conditioning, e.g. as an error signal. Dopamine could be acting across several timescales to influence reward-based behavior.
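A minimal sketch of what "error signal" means here, using the Rescorla-Wagner rule as a stand-in. That choice is mine, not the chapter's (temporal-difference models are the more common description of dopamine responses), and the learning rate is arbitrary. The prediction error is large and positive early in acquisition, shrinks as the reward becomes expected, and turns negative when an expected reward is omitted.

```python
def rescorla_wagner(rewards, alpha=0.1, v0=0.0):
    """Trial-by-trial value learning driven by the prediction error delta = r - V."""
    v = v0
    values, errors = [], []
    for r in rewards:
        delta = r - v          # prediction error (the "dopamine-like" teaching signal)
        v += alpha * delta     # move the expectation toward the outcome
        values.append(v)
        errors.append(delta)
    return values, errors

# 100 rewarded trials (acquisition), then 50 unrewarded trials (extinction)
values, errors = rescorla_wagner([1.0] * 100 + [0.0] * 50)
```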

Basal ganglia: its function is confounded by its role in both reward and movement processing, but it seems to regulate movement by reward, computing the cost of the energy requirements. "Bradykinesia represents an implicit decision not to move fast because of a shift in the cost/benefit ratio of the energy expenditure needed to move at normal speed."

Amygdala: reinforcer devaluation, i.e. you stop eating a food once you're satiated on it. Amygdala lesions remove this ability. Lots of other roles: delay signals, affective signals, and controlling quicker behavioral changes.

Hippocampus: Spatial computations. Large place fields may involve recognition of contexts and could be important for reward processing.

Neocortex in mammals has allowed for even more control over reward. Agranular frontal cortex is homologous in rodents and primates, although differences have accumulated over tens of millions of years of separate evolution. Mammals have improved "executive function", mediated by agranular frontal cortex, including a top-down modulatory function that biases competition among the different brain systems engaged in, and competing for, control of behavior.

Mammals have: anterior cingulate (AC), infralimbic (IL), prelimbic (PL), agranular orbital frontal (OFa) and agranular insular (Ia) cortex. These different parts afford mammals greater flexibility in their reward-seeking behavior. Ia gets many visceral signals: Ia functions in interoception (pain, itch, temperature, metabolic state, and signals from the lungs, heart, baroreceptors and digestive tract).

Several dissociable memory systems combine to guide reward-seeking behavior. Nonmammalian vertebrates (birds, reptiles) appear to have problems overriding their innate behavioral responses.

Anterior cingulate: biases behavioral control toward one among multiple competing stimuli. Weighs costs against benefits: is the reward worth the effort? AC lets the animal weigh more behavioral options (it can present the reward system with more possible behavioral choices).

Prelimbic cortex: involved in regulating goal-directed behaviors in cases where they compete with habitual stimulus-responses. Helps encode the response-outcome associations (but not execution of them).

Infralimbic cortex: seems to be the opposite of PL, promoting behavioral control by S-R associations. Plays a role in extinction learning: it biases behavior toward more recent, newly learned rules (that a stimulus no longer gives reward) rather than the older, more strongly associated rules.

Orbitofrontal cortex: comparing rodent and primate OFC is problematic. Rodents have no homologue of granular orbitofrontal cortex (PFo); only the agranular part (OFa) has a homologue. Neural activity in OFa reflects reward expectation, especially the sensory-specific properties of the reward. OFa lesions impair the ability to make decisions on the basis of reward expectations. The OFa contributes more to learning the associations between CSs and the sensory aspects of reward (e.g. taste). It doesn't compute the biological value per se.

Agranular Insular Cortex: Relates sensory properties of the reward to the instrumental motivations, playing a complementary role to OFa. This means that Ia and OFa likely store the sensory related properties of rewards such that they can be used during recall to help evaluate different reward-based decisions.

Primates have PFg, the granular parts of frontal cortex, and also extra sensory areas like IT. Granular orbital frontal cortex (PFo) gets strong projections from IT and other posterior sensory areas, including auditory cortex. PFo is one of the earliest sites of convergence of visual information with visceral inputs, so primates can link visceral, olfactory and gustatory inputs with high-order visual stimuli. PFo represents high-level details and conjunctions of sensory features for rewards, the magnitude of reward, the probability of reward and the effort required to obtain it. It computes reward in a common currency to pit risk vs. reward in decisions.

Primates can learn rules and strategies for rewards instead of just stimulus and action relations to outcomes. They can dissociate the emotional value of a reward from a value-less reward signal.

Humans can do longer term learning, and "mental time-travel" which help them with reward processing. Further they can talk about reward and have secondary rewards -- i.e. I don't want to want to smoke.

Thursday, June 27, 2013

Generating Coherent Patterns of Activity from Chaotic Neural Networks

Sussillo, D., Abbott, L.F. (2009). Generating Coherent Patterns of Activity from Chaotic Neural Networks. Neuron 63 544-557.

I wanted to get into the Larry Abbott stuff as it seemed to pertain to the local bending model I was envisioning. This paper covers the idea of using chaotic random networks as the basis for learning arbitrary patterns. Because the network activity is high dimensional, the pattern is easy to read out with a simple linear combination. They expand upon previous work, make the model more complex, and incorporate "FORCE" learning.

The idea of FORCE learning is that you start with fast learning (pretty easy because it's just a linear readout), but you don't immediately try to get the error all the way to zero. You keep a small amount of error that then decreases slowly. The reason is that otherwise the networks tend to destabilize and ruin the output. Part of the key to keeping the network in line is that the output feeds back into the network, and the chaos gets overcome by the pattern. With no input the network tends to be chaotic, but when it is given a periodic stimulus to learn, the network loses its chaos.
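A minimal FORCE sketch, assuming the setup described in the paper: a random recurrent rate network of tanh units, a linear readout z fed back into the network, and recursive least squares (RLS) updates of the readout weights only. The network size, gain g = 1.5, time step, target sine and RLS regularizer alpha are my own choices, and the network is smaller and dense for speed. In this scheme the error is small from the first updates, and the weight changes shrink as training goes on.

```python
import numpy as np

def force_train(n=300, g=1.5, dt=0.1, period=60.0, n_periods=8, alpha=1.0, seed=0):
    """Train the readout of a chaotic rate network on a sine wave with FORCE/RLS.

    Dynamics (tau = 1): dx/dt = -x + J r + w_f z,  r = tanh(x),  z = w . r
    Only the readout w is trained; J and w_f stay fixed.
    Returns the trained readout and the pre-update error at every step.
    """
    rng = np.random.default_rng(seed)
    J = g * rng.standard_normal((n, n)) / np.sqrt(n)  # random recurrent weights
    w_f = rng.uniform(-1.0, 1.0, n)                   # fixed feedback weights
    w = np.zeros(n)                                   # trainable readout
    P = np.eye(n) / alpha                             # running inverse correlation of r
    x = 0.5 * rng.standard_normal(n)
    r = np.tanh(x)
    z = 0.0
    steps = int(round(n_periods * period / dt))
    errors = np.empty(steps)
    for i in range(steps):
        # Euler step of the network, with the readout fed back in
        x += dt * (-x + J @ r + w_f * z)
        r = np.tanh(x)
        z = w @ r
        target = np.sin(2.0 * np.pi * (i + 1) * dt / period)
        e = z - target                                # error before this update
        errors[i] = e
        # Recursive least squares: update P, then move w against the error
        Pr = P @ r
        k = Pr / (1.0 + r @ Pr)
        P -= np.outer(k, Pr)
        w -= e * k
    return w, errors

w, errors = force_train()
steps_per_period = int(round(60.0 / 0.1))
```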

From the examples, it's quite good at learning arbitrary patterns. They do an interesting PCA analysis of the network after it has learned a particular pattern. The network is initialized with the same random connectivity and then taught the same pattern, but the readout weights are initialized at different random strengths. Once the pattern is learned, the major components (the first few PCs) converge to the same point, regardless of where the readout weights were initialized. The later components end up in different places.

The learning performs best right on the verge of chaos. By tuning a network parameter (essentially the average synaptic strength), they can vary the amount of chaos in the network. It performs poorly at learning tasks when the network is not in the chaotic regime, but it also cannot work when the network is too chaotic. The learned pattern has to be able to overcome the intrinsic network chaos, and the network needs some chaos so that it can learn the pattern.

Finally they show a network that can do interesting input-output transformations. They add some control inputs, which connect to the network randomly, and train the outputs to produce the appropriate pattern when the corresponding control inputs come on.

(A) Network with control inputs used to produce multiple output patterns (synapses and readout weights that are modifiable in red).
(B) Five outputs (one cycle of each periodic function made from three sinusoids is shown) generated by a single network and selected by static control inputs.
(C) A network with four outputs and eight inputs used to produce a 4-bit memory (modifiable synapses and readout weights in red).
(D) Red traces are the four outputs, with green traces showing their target values. Purple traces show the eight inputs, divided into ON and OFF pairs associated with the output trace above them. The upper input in each pair turns the corresponding output on (sets it to +1). The lower input of each pair turns the output off (sets it to -1). After learning, the network has implemented a 4-bit memory, with each output responding only to its two inputs while ignoring the other inputs.

The 4-bit memory is a pretty cool example. For each bit there is an ON and an OFF input pulse, which sets the state of the corresponding readout unit. So the network has 16 (2^4) fixed points, corresponding to all possible bit combinations.

Then they have a much more complex example: the network implementing walking- and running-like behaviors. They took motion capture data and trained the network to mimic the joint angles of 95 different joints, and the network was able to learn both walking and running.

The synaptic interactions are linear: it's a firing-rate model whose output goes through a tanh nonlinearity.

Wednesday, June 26, 2013

Leech mechanosensory model - Fractional derivative

OK, now I've incorporated the T cells. I thought a simple linear filter would give rates like in the animal, but it wasn't quite right: the T cells didn't fire on the off phase of the steps. So instead of the linear filters, I implemented fractional derivatives as the "filters". Now the P cell filter is the 0.3-order fractional derivative of the sensory signal, and the T cell filter is the 1.6-order fractional derivative.
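For reference, one common discrete definition is the Grünwald-Letnikov fractional derivative: the signal is convolved with the weights (-1)^k binom(order, k) and divided by dt^order. I'm assuming that definition here; the function names and the press-and-release stimulus are hypothetical. Order 1 reduces to a backward difference and order 0 to the identity, which makes it easy to sanity-check.

```python
import numpy as np

def gl_weights(order, n):
    """Grünwald-Letnikov weights w_k = (-1)^k * binom(order, k), via a recursion."""
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (1.0 - (order + 1.0) / k)
    return w

def fractional_derivative(signal, order, dt):
    """Order-`order` derivative of a uniformly sampled signal (full memory from t = 0)."""
    signal = np.asarray(signal, dtype=float)
    w = gl_weights(order, len(signal))
    # out[i] = sum_k w[k] * signal[i - k] / dt**order, i.e. a causal convolution
    return np.convolve(signal, w)[: len(signal)] / dt**order

dt = 0.01
t = np.arange(0, 2, dt)
step = ((t >= 0.5) & (t < 1.5)).astype(float)   # hypothetical press-and-release stimulus
p_like = fractional_derivative(step, 0.3, dt)   # sustained response with adaptation
t_like = fractional_derivative(step, 1.6, dt)   # dominated by transients at onset and offset
```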
Figure 1: Sensory signal, P and T cell currents based on fractional derivative of signal.
Figure 2: receptive fields of P and T cells.
Figure 3: P cell responses to sensory signal.
Figure 4: T cell responses to stimulus

Leech mechanosensory model - sensory representation

Ok, I'm going to spend a few days just working on this new model of the leech sensory system.

It starts with touch represented as a set of "pixels" going around the skin, where the value of each pixel is the instantaneous pressure on the skin. Here is what the touch stimuli look like:
Figure 1: The upper panel is the actual touch stimulus. Here there are just 10 pixels that go around the circumference of the body. We are going to transform this touch stimulus into a train of P cell spikes by incorporating both the receptive field and a temporal filter. The second panel is the touch stimulus convolved with the P cell temporal filter. This is just a very stupid filter right now that is supposed to fit the P cell a little, but is nowhere near accurate. The final panel is the actual currents that will be injected into each P cell (one row for each of the 4 P cells).
Figure 2: Here the P cells are just single-compartment IF neurons, and these are their responses to the touch stimulus shown in Figure 1. The first panel is the voltages, with a threshold of -58 mV and a reset of -62 mV. The second panel is the spike times of the 4 P cells, and the final panel is the ISI for all 4 P cells.
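The IF model itself is only a few lines. A minimal sketch of a leaky integrate-and-fire neuron using the threshold (-58 mV) and reset (-62 mV) above; the resting potential, membrane time constant, input resistance and time step are assumptions, not the values in my model.

```python
import numpy as np

def integrate_and_fire(current, dt=0.1, tau_m=20.0, e_rest=-65.0, r_m=10.0,
                       v_thresh=-58.0, v_reset=-62.0):
    """Single-compartment leaky integrate-and-fire neuron, Euler integration.

    current: injected current at each time step (nA); dt, tau_m in ms;
    r_m in MOhm, so r_m * current is in mV. Returns (voltage trace, spike times in ms).
    """
    v = np.empty(len(current))
    v_now = e_rest
    spikes = []
    for i, i_inj in enumerate(current):
        v_now += dt * (-(v_now - e_rest) + r_m * i_inj) / tau_m
        if v_now >= v_thresh:
            spikes.append(i * dt)
            v_now = v_reset          # reset after a spike
        v[i] = v_now
    return v, np.array(spikes)

# 200 ms of constant 1 nA drive (steady state -55 mV, above threshold) vs. no drive
v_on, spikes_on = integrate_and_fire(np.full(2000, 1.0))
v_off, spikes_off = integrate_and_fire(np.zeros(2000))
isi_on = np.diff(spikes_on)    # inter-spike intervals, as in the ISI panel
```

With constant drive every ISI runs from reset to threshold, so the ISIs are all equal, close to tau_m·ln(7/3) ≈ 17 ms for these parameters.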
Figure 3: This is the P cell temporal filter. 

Tuesday, June 25, 2013

Multi-subject dictionary learning to segment an atlas of brain spontaneous activity

Varoquaux, G., Gramfort, A., Pedregosa, F., Michel, V., Thirion, B. (2012). Multi-subject dictionary learning to segment an atlas of brain spontaneous activity. Information Processing in Medical Imaging 6801.

Ok so a lot of the registration algorithms for fMRI might be relevant for my VSD data. This paper was recommended to me by Stephan Gerhard (from NS&B).

They take an approach that combines ICA with dictionary learning.

Start with a linear signal decomposition. Y is the data time series (n time points x p voxels). V describes functional processes or structured measurement artifacts with k spatial latent variables (p x k). U describes the time series of the latent variables (n x k). Given these shapes, Y = U V^T (plus noise).
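A quick way to check the shapes in Y = U V^T is a truncated SVD (i.e. PCA) on synthetic data. This is only a baseline I'm substituting for illustration, not the paper's multi-subject dictionary learning, which adds priors (e.g. sparsity) so that the spatial maps in V come out interpretable; the data sizes and noise level are arbitrary.

```python
import numpy as np

def low_rank_decomposition(Y, k):
    """Rank-k factorization Y ~ U @ V.T via truncated SVD.

    Y: (n_timepoints, p_voxels). Returns U (n x k latent time courses)
    and V (p x k spatial maps).
    """
    left, s, right_t = np.linalg.svd(Y, full_matrices=False)
    U = left[:, :k] * s[:k]      # fold the singular values into the time courses
    V = right_t[:k].T
    return U, V

# Synthetic data: n=200 time points, p=500 voxels, k=3 latent processes
rng = np.random.default_rng(0)
U_true = rng.standard_normal((200, 3))
V_true = rng.standard_normal((500, 3))
Y = U_true @ V_true.T + 0.01 * rng.standard_normal((200, 500))
U, V = low_rank_decomposition(Y, k=3)
rel_err = np.linalg.norm(Y - U @ V.T) / np.linalg.norm(Y)
```

The SVD reconstructs Y almost perfectly, but its U and V are only determined up to a rotation, which is why extra priors are needed to get meaningful maps.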

So they are setting up the problem to create a hierarchical generative model of the data, and they have an optimization procedure to find U and V.

The algorithms are given in the text as pseudo-code. Also see Varoquaux's website: and maybe some related code:

There's also a package, NiPy, that has a lot of neuro-imaging algorithms. There may be some relevant algorithms there for my data. And also:

The rest of the paper is validation, but there are some other good references. I need to dive into the code to start getting some real insights into the applicability of this to my data.

Friday, June 14, 2013

Arbitrary heteroclinic orbits

Yeah, so after hearing about crawling a bunch of times up here at MBL, it seems clear that heteroclinic-like orbits are more closely related to how CPGs work. This is in contrast to the Shilnikov style of doing CPGs, which relies on intrinsic dynamics and pacemaker neurons, effectively making CPGs out of coupled oscillators. Heteroclinic orbits don't require coupled oscillators and have the nice property of being able to pause at certain points in the cycle, like real CPGs. Plus, if you looked at a single CPG neuron that was part of a heteroclinic orbit, you wouldn't see any intrinsic oscillatory dynamics. This seems to be the case most of the time for neurons.

So the question is how to make arbitrary types of heteroclinic orbits with neural-like models. There would have to be some population that begins to turn on and starts off as an attractor through some kind of recurrent feedback. But once the population has turned on enough, it acts like a source and the dynamics flow toward the next population. Each population would be a fixed point (a saddle): as you approach it from one direction it's attractive, and then it sends you off in another direction. But how to build this type of system is the question.
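One standard way to get this "attract, then hand off" behavior is winnerless competition in a generalized Lotka-Volterra network (the idea behind Rabinovich and colleagues' stable heteroclinic channels). A minimal three-population sketch, assuming May-Leonard-style asymmetric inhibition; the coupling values and the small constant drive eps are arbitrary choices. Each "winner" state is a saddle that is attracting from most directions but unstable toward the next population, so activity cycles 0 -> 1 -> 2 -> 0 without any single unit being an oscillator.

```python
import numpy as np

def winnerless_competition(alpha=0.5, beta=1.7, eps=1e-4, dt=0.01, t_max=300.0):
    """Three populations with asymmetric (May-Leonard style) competition.

    dx_i/dt = x_i * (1 - sum_j rho[i, j] * x_j) + eps
    Population i inhibits its successor weakly (alpha < 1) and its predecessor
    strongly (beta > 1), so each 'winner' state is a saddle that hands activity
    on to the next population. eps is a small constant drive that keeps the
    suppressed populations from decaying all the way to zero.
    """
    n = 3
    rho = np.eye(n)
    for i in range(n):
        rho[(i + 1) % n, i] = alpha   # effect of i on i+1: weak
        rho[(i - 1) % n, i] = beta    # effect of i on i-1: strong
    x = np.array([0.9, 0.05, 0.05])
    steps = int(round(t_max / dt))
    traj = np.empty((steps, n))
    for s in range(steps):
        x = x + dt * (x * (1.0 - rho @ x) + eps)   # Euler step
        traj[s] = x
    return traj

def dominance_sequence(traj):
    """Which population is largest, with consecutive repeats collapsed."""
    winners = np.argmax(traj, axis=1)
    keep = np.concatenate(([True], winners[1:] != winners[:-1]))
    return winners[keep].tolist()

traj = winnerless_competition()
seq = dominance_sequence(traj)
```

Without eps the dwell times near each saddle would keep growing (the classic heteroclinic slowing); the small drive turns the cycle into a regular rhythm with long dwells, which is the "pause at certain points" property.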

Wednesday, June 12, 2013

Dan Geschwind

He came and gave a talk today at MBL. It was about genetics -- well really the "transcriptome", as reading all of the RNA is apparently much easier than the genome or the proteome. It was very interesting and the analysis he was doing was quite remarkable.

Most people look at up- or down-regulation of RNAs and try to correlate that with phenotype. He was instead looking at how different RNAs were correlated with each other, and then relating that to phenotype. Based on the correlations between RNA expression levels he could build a distance matrix and then do graph-based analysis. He was showing how different RNAs cluster into "modules" and that the modules were organized in reasonable ways: many of them were related to cell types and other pretty obvious differences. He had some cool ways of clustering and dimensional analysis that could be relevant to my thesis work.
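A stripped-down version of that pipeline, to remind myself how it goes: correlate every pair of genes across samples, turn the correlations into distances, link close genes, and read off modules as connected components. The hard threshold is a simplification I'm assuming for illustration (real co-expression pipelines typically use soft thresholds and hierarchical clustering), and the expression data are synthetic.

```python
import numpy as np

def coexpression_modules(expr, threshold=0.6):
    """Group genes into modules of strongly co-expressed genes.

    expr: array (n_samples, n_genes). Distance is d = 1 - |corr|; genes with
    d < 1 - threshold are linked, and the modules are the connected
    components of that graph (lists of gene indices).
    """
    corr = np.corrcoef(expr, rowvar=False)
    dist = 1.0 - np.abs(corr)
    linked = dist < (1.0 - threshold)
    n = linked.shape[0]
    label = np.full(n, -1)
    modules = []
    for start in range(n):
        if label[start] >= 0:
            continue
        label[start] = len(modules)
        stack, members = [start], []
        while stack:                       # depth-first walk of one component
            g = stack.pop()
            members.append(int(g))
            for nb in np.flatnonzero(linked[g]):
                if label[nb] < 0:
                    label[nb] = len(modules)
                    stack.append(nb)
        modules.append(sorted(members))
    return modules

# Synthetic data: 50 samples, 20 genes driven by 2 hidden programs (10 genes each)
rng = np.random.default_rng(1)
drivers = rng.standard_normal((50, 2))
expr = np.repeat(drivers, 10, axis=1) + 0.3 * rng.standard_normal((50, 20))
modules = coexpression_modules(expr)
```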

Thursday, June 6, 2013

Neural pointers

So how does info get routed into working or hippocampal memory? Pointers! Working and hippocampal memory do not encode the information in a memory; rather, they point to the place in PTV where the information is stored. These pointers are stored in the sequence of spikes during each theta cycle. When frontal cortex wants to remember something, it burst-activates the neurons, which leads to some plasticity. This plasticity is bidirectional. The PTV neurons that are currently active reinforce each other (so that the memory can be recalled more easily, à la a Hopfield net/local minimum), and they also reinforce with a hippocampal sequence. The hippocampal sequence also reinforces with all of PTV, so that when the hippocampal sequence gets activated, all of the info in PTV is reactivated as well.
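The "Hopfield net/local minimum" analogy can be made concrete. A minimal sketch of a classic Hopfield network, with Hebbian weights and asynchronous updates, that completes a stored pattern from a corrupted cue. The network size, number of patterns and amount of corruption are arbitrary assumptions, and this only illustrates attractor recall; it is not a model of PTV.

```python
import numpy as np

def hebbian_weights(patterns):
    """Hebbian (outer-product) weights for +/-1 patterns, no self-connections."""
    p = np.asarray(patterns, dtype=float)
    W = p.T @ p / p.shape[1]
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, cue, max_sweeps=20):
    """Asynchronous sign updates until no unit changes (a local energy minimum)."""
    s = np.array(cue, dtype=float)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            new = 1.0 if W[i] @ s >= 0 else -1.0
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s

rng = np.random.default_rng(0)
patterns = rng.choice([-1.0, 1.0], size=(3, 100))   # three stored "memories"
W = hebbian_weights(patterns)
cue = patterns[0].copy()
cue[:10] *= -1                                       # partial, corrupted cue
restored = recall(W, cue)
```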

Let's think about working memory a little more. I vaguely remember some talks/papers I've read about how working memory would be implemented by the brain. There is a region in frontal cortex where a slow oscillation similar to theta dominates. During this slow oscillation, about 7 neural pointers can be stored (in the human brain). Each pointer is reinforced with some frontal burst feedback, so that each pointer can reactivate the part of PTV that has all of the information. You can imagine the working memory structure maintaining the 7 pointers in its slow oscillation through some sort of recurrent activation and plasticity, or through that STP mechanism. This is how working memory is maintained over longer time-scales. Then the different pointers stored in working memory can be reactivated by another frontal area (another burst signal or something), which brings back the information stored in PTV.

Hippocampus is basically the same, but the pointers are built over a slow timescale (during each theta cycle), and then a memory over a larger time-scale can be recalled more quickly by activating a sharp-wave ripple.

Wednesday, June 5, 2013

PTV and Frontal

So we are starting to build a picture of what the PTV (Parietal-Temporal-Visual) cortex is doing and how it fits in with the rest of the brain. It seems that PTV is encoding/representing/building the current and recent state of the world. When you open your eyes and see the world around you, recognize the objects, hear the sounds, perceive the 3D structure, etc., all of this information is being represented in PTV. There is a tremendous amount of information continually being encoded by this circuit, and this information is the immediate state of the mind you are experiencing.

Now, this is too much information to make sense of all at once, and this is where frontal cortex comes into play. Frontal cortex essentially feeds back to all of PTV (via layer 1). Frontal cortex enhances the activity in subsets of PTV and can route those subsets into working or long-term memory. To encode a long-term memory, only the parts of PTV that are strongly modulated by top-down attentional feedback make it. Essentially, attentional feedback activates the neurons and gets them to burst. This doesn't really change the information anywhere in PTV, but enhances it. The burst is a signal that carries the same info about the sensory environment but tells other structures to operate differently. So to work something into, say, hippocampal memory (I guess this is more like medium-term), frontal cortex burst-synchronizes every region of the brain it wants to remember. The entire hierarchy can be encoded in the local cortical circuits, and the hippocampus plays the role of the pointer, relating the cortical information being saved to a time-place pointer in the hippocampus. When the memory is recalled, the hippocampus can be reactivated, which then reactivates the cortical areas.

PTV is creating a hierarchical model of the world from all of the information from the senses. This model is actually non-temporal (ugh, confusing; in this case I'm referring to temporal as in time): it has temporal structure, but it is time-invariant. Your phone stays your phone, etc. Frontal cortex is more in charge of time. Frontal cortex is building a hierarchy to predict the future at different temporal scales. Frontal cortex pulls information about the current state of the world from temporal cortex and routes it through working and long-term memory. Frontal routes PTV into working memory, and can reactivate hippocampal memory and route it into working memory to make better predictions about the future.

Monday, May 20, 2013

Tuesday, May 7, 2013

Tirin Moore

Just went to a talk by Tirin Moore from Stanford. He studies visual attention and was doing physiology work in FEF and V4. He had a lot of interesting data, but it was kind of all over the place.

His model had neurons with D1 receptors in layers 2/3 of FEF, and these were the ones that fed back to V4. He showed through antidromic AP annihilation (collision) that it was mainly the persistent-firing cells that fed back to V4. He showed that modulating D1 and D2 receptors in FEF had the same effect on behavior: biasing saccades toward the RFs of the manipulated FEF site (although it was a D1 antagonist and a D2 agonist, which he said didn't matter because they have "U"-shaped properties). D1 changes, however, also had effects on the properties of V4 neurons, typically looking like enhanced attention. D2 had no real effect on V4, but had the behavioral result. D2 seemed to be more involved in the motor part of the saccade, and D1 in the attentional part.

It seemed like a clear case of modulatory-type feedback projections. The neurons were enhanced by FEF feedback, but required feedforward stimulation. The fact that the feedback came from the persistent-firing neurons made sense to me: they keep the modulation up in preparation for the stimulus to appear. However, he said that the selectivity was narrowed, which goes against the kind of gain-control modulation that I think of. This may just be because you're altering FEF in an unnatural fashion, and that causes selectivity changes. And he was measuring V4 selectivity with oriented bars.

Thursday, April 25, 2013

PCA ICA course

I'm also finally done teaching the PCA/ICA course in the student comp class. I really like how serious the class is; I guess classes are a lot better when you have all the best kids and they're motivated (and also you aren't really responsible). I really enjoyed it, because it forced me to learn ICA pretty thoroughly. Now I feel a lot more confident about it, and I will seriously be able to do some stuff with my data with ICA.

I like a lot of the ideas in the homework, but it can definitely be better. Honestly it is pretty easy right now, but some parts are confusing. But I really enjoyed people's answers and I thought that it was tricky in the right ways. It made me happy to see their answers.

The ICA part was hard because I was a jerk and did it in Python, and the Mac kids had problems. But I think it's a really good exercise and there's a lot more to explore. I had them just add noise and see what happened, and I made algorithms to match up the ICs and print out the cost. I wanted them to come up with some more things, but nothing too fancy. They could also change the mixing matrix so that they had more or fewer microphones, to actually see what happens.

I realize now that I had no idea how to use fastica for the image analysis. The problem was that it is not like PCA at all, and I just assumed that it was. It is interesting because the ICs are only determined up to scale (and sign). So they come out in no particular order and there's nothing that says how important they are; there are no eigenvalues. I don't quite understand why they don't get sorted by whatever it is they're optimizing, their non-Gaussianity. Fastica is supposedly doing some optimization, so why doesn't each component have an optimization value that you could sort by?
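One way to get that sorting: score each component by a non-Gaussianity measure after the fact and sort by it. A minimal sketch using excess kurtosis as the score (my assumption; FastICA's usual log-cosh negentropy approximation would give a different but usually similar ranking). Because kurtosis is computed on standardized signals, it ignores the arbitrary scale ICA leaves on each component. The "components" here are just synthetic signals with known kurtosis, not real ICA output.

```python
import numpy as np

def excess_kurtosis(s):
    """Fourth standardized moment minus 3 (0 for a Gaussian); scale-invariant."""
    z = (s - s.mean()) / s.std()
    return float(np.mean(z**4) - 3.0)

def rank_by_nongaussianity(components):
    """Order components (rows) by |excess kurtosis|, most non-Gaussian first."""
    scores = np.array([abs(excess_kurtosis(c)) for c in components])
    order = np.argsort(scores)[::-1]
    return order, scores[order]

# Stand-in "components": Gaussian, Laplace (excess kurtosis 3), uniform (-1.2)
rng = np.random.default_rng(0)
comps = np.vstack([rng.standard_normal(20000),
                   rng.laplace(size=20000),
                   rng.uniform(-1.0, 1.0, 20000)])
order, scores = rank_by_nongaussianity(comps)
```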

The other useful thing would be to find the mixing matrix. It wouldn't really be that hard: it's basically fitting the microphones with the ICs, but maybe there are better and worse ways... I just did a linear regression to see which ICs matched up with the original components. I guess if you just matched them to the data, wouldn't it come out to an exact mixing matrix?
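On that last question: if the ICs are an exact, invertible unmixing of the data (S = W X), then regressing the observed signals on the ICs recovers W^-1 exactly, i.e. the true mixing matrix up to the same permutation and scaling as the ICs. A minimal NumPy sketch with a hypothetical noise-free mixture of 3 sources into 3 microphones; with noise, or more microphones than sources, the regression is only a least-squares estimate.

```python
import numpy as np

def estimate_mixing_matrix(X, S):
    """Least-squares mixing matrix A such that X ~ A @ S.

    X: observed signals (n_mics, T); S: components (k, T). Returns A (n_mics, k).
    """
    A_t, *_ = np.linalg.lstsq(S.T, X.T, rcond=None)
    return A_t.T

rng = np.random.default_rng(0)
S = rng.laplace(size=(3, 2000))          # hypothetical sources
A = rng.standard_normal((3, 3))          # hypothetical 3-microphone mixing
X = A @ S

# ICA would return the sources permuted and rescaled: S_ic = W @ X
W = np.diag([2.0, -0.5, 3.0])[[2, 0, 1]] @ np.linalg.inv(A)
S_ic = W @ X
A_hat = estimate_mixing_matrix(X, S_ic)
```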

Yi Zuo

Hosted Yi Zuo this week. She was the seminar speaker and talked about how dendritic spines form during learning tasks. I took away a few extra things from her talk. One was that spines stopped forming as much once the behavior had plateaued, kind of like the perceptron learning rule (once you get it right, stop changing). The other was that spine growth was present shortly after training, but there was no spine elimination. This was only true if the imaging was done immediately after training; imaging a day later showed increased spine elimination compensating for the growth. It must have something to do with sleep.

I actually had a really good time talking with her and learning about her life. I was very worried beforehand that there would be a language barrier and that it would be hard to communicate. But she was very talkative, maybe even too talkative, and I was happy because there weren't many awkward silences. I couldn't believe how energetic she was. I was exhausted and I had done nothing for most of the day, while she had talked to so many people: Mayford, Jim Connor, lunch (Jeremy Biane, Tom Gillespie, ...), Zheng, Scanziani, Komiyama, the talk, and dinner (Andy, Sam, Adam, Jeremy, Heidi, Kathleen). And then Jeremy gave her a laptop talk about his data. Damn, she was a trooper.

Monday, April 22, 2013

Functional mapping of single spines in cortical neurons in vivo

Chen, X., Leischner, U., Rochefort, N.L., Nelken, I., Konnerth, A. (2011). Functional mapping of single spines in cortical neurons in vivo. Nature 475.

Two-photon (LOTOS) imaging of individual spines in auditory cortex. They presented pure tones at different frequencies and sound intensities. Spines show a variety of tuning properties, but no clear clustering of similarly tuned spines on dendrites is observed.

Figure 3 | Frequency tuning and heterogeneous distribution of individual active spines.
a, Upper panel: two-photon image of a dendritic segment of a layer 2/3 neuron (average of 6,250 frames). Lower panel: calcium responses (average of five trials) from two spines (S) marked by red arrowheads in the upper panel, during 11 pure tones (from 2 kHz to 40 kHz at 0 dB attenuation). Two neighbouring spines indicated by blue arrowheads did not respond to any of the 11 pure tones. b, Upper panel: frequency tuning curve of the narrowly tuned spine S1 shown in a. Data points are the mean values of response amplitudes from five trials. Lower panel: average tuning curve normalized to the highest amplitude (n = 38 spines, 10 neurons). c, Upper panel: frequency tuning curve of the widely tuned spine S2 shown in a. Lower panel: average tuning curve, normalized to the highest amplitude (n = 46 spines, 10 neurons). Error bars in b and c show s.e.m. d, Distribution of frequency tuning widths (ΔFrequency) of pure-tone-activated spines (n = 84 spines, 10 neurons). e, Heterogeneous distribution of pure-tone-activated spines along dendrites. Cartoons of dendritic segments from four neurons, with numbers indicating the effective frequencies for each active spine. Narrowly tuned and widely tuned spines are indicated by red and blue dots, respectively. The neurons correspond to, respectively, neuron 25, neuron 27, neuron 29 and neuron 30 in Supplementary Table 1. f, Plot of the most effective frequency of a given spine versus the most effective frequency of its nearest active spine (see Methods). Dots along the red line correspond to pairs of spines that had the same most-effective frequency (n = 69 pairs, 24 dendrites, 10 neurons). g, Plot of the distance between neighbouring active spines versus the difference between their respective most-effective frequencies. For each pair of spines, the reference spine (blue circle) was defined as the left spine and the test spine (red circle) was defined as the neighbouring active spine on the right. The measurements were performed sequentially from left to right in each dendrite (n = 51 pairs, 24 dendrites, 10 neurons). Dots along the red line correspond to spine pairs that had the same most-effective frequency. Numbers on the right indicate distance ranges between pairs of spines with a difference between their most effective frequencies of 0–1, 1–2, 2–3 and 3–4 octaves.

Wednesday, April 17, 2013


The motivation behind developing independent component analysis (ICA) can be illustrated by a classic problem -- the "Cocktail Party" problem. The basic idea is that you are at a big party and you hear the voices of many different people talking. Your ears pick up the sum of all of these people's voices, and your brain has the task of separating out each of the voices. Each ear receives a slightly different signal -- say your left ear is closer to speaker 1 and your right ear is closer to speaker 2. The left ear would then receive a stronger signal from speaker 1 and a weaker signal from speaker 2. Mathematically, each ear is just receiving a weighted sum of all of the signals:
x = M*s
So x is what your ears pick up, s holds the voices of the speakers at the party, and M describes how the signals from each of the speakers get mixed together to make the observed signals. M is called the "Mixing Matrix".
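A small simulation of x = M*s makes the setup concrete. This is a sketch with made-up signals (a sine and a square wave standing in for two voices) and a made-up 2x2 mixing matrix; ICA would only get to see x and would have to recover s from it.

```python
import numpy as np

t = np.linspace(0, 8, 2000)
# Two hypothetical "speakers": a sine wave and a square wave, one per row.
s = np.vstack([np.sin(2 * t), np.sign(np.sin(3 * t))])

# Hypothetical mixing matrix: row i says how strongly ear i hears each speaker.
M = np.array([[0.8, 0.3],    # left ear: closer to speaker 1
              [0.2, 0.9]])   # right ear: closer to speaker 2

x = M @ s   # each row of x is what one ear records
print(x.shape)
```

Giving M more rows than columns corresponds to having more microphones than speakers.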

The assumption behind the problem we are trying to solve is that the signals we observe are actually mixtures of signals that are independent -- the speakers talking at the party are not influencing each other. So how do we pull the independent signals out of the observed data in x? And what exactly do we mean by independence?

There are some strict probabilistic definitions of independence -- namely p(a,b) = p(a)*p(b). These p's describe the full probability distributions of the signals, and independence means that the probability of observing a and b together is equal to the product of the probabilities of observing each of them separately. If this holds across the entire probability distribution, then the signals are independent. However, this definition is almost useless for any realistic application, because it would require accurately estimating the complete marginal distributions and the complete joint distribution of the inputs.
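For discrete variables with known distributions, the definition can be checked exactly. This toy uses two fair dice and exact fractions: the two dice factorize, while a die and the sum of both dice do not. The helper `factorizes` is hypothetical; with real recordings you would have to estimate every one of these probabilities, which is the impractical part.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely (die1, die2) pairs
p = Fraction(1, len(outcomes))

def factorizes(f, g):
    """True if p(a, b) == p(a) * p(b) for a = f(d1, d2) and b = g(d1, d2)."""
    joint, pa, pb = {}, {}, {}
    for d1, d2 in outcomes:
        a, b = f(d1, d2), g(d1, d2)
        joint[a, b] = joint.get((a, b), 0) + p
        pa[a] = pa.get(a, 0) + p
        pb[b] = pb.get(b, 0) + p
    # Must hold for every combination of values, including pairs that never occur.
    return all(joint.get((a, b), 0) == pa[a] * pb[b] for a in pa for b in pb)

independent = factorizes(lambda d1, d2: d1, lambda d1, d2: d2)       # the two dice
dependent = factorizes(lambda d1, d2: d1, lambda d1, d2: d1 + d2)    # a die and the sum
print(independent, dependent)
```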

Correlation is something that you can easily calculate in practice, but zero correlation does not necessarily imply independence. Independence, however, does imply zero correlation. A sine wave and a cosine wave of the same frequency plotted against each other make a circle -- data in the shape of a circle are uncorrelated. However, these data are obviously not independent: knowing one coordinate restricts the other to just two possible values.
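The circle example in a few lines of NumPy, as a sketch: cos and sin sampled over a full period have (numerically) zero correlation, yet their squares are perfectly anticorrelated because cos^2 + sin^2 = 1. Correlation only picks up linear relationships.

```python
import numpy as np

theta = np.linspace(0, 2 * np.pi, 10_000, endpoint=False)
a, b = np.cos(theta), np.sin(theta)   # points on a circle

r_ab = np.corrcoef(a, b)[0, 1]            # about 0: uncorrelated
r_sq = np.corrcoef(a ** 2, b ** 2)[0, 1]  # -1: a**2 = 1 - b**2, so clearly dependent
print(r_ab, r_sq)
```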

A big intuition about how to determine independence comes from the Central Limit Theorem, one of the most fundamental ideas in probability and statistics. The Central Limit Theorem states that the (suitably normalized) sum of many independent random variables converges towards a Gaussian. It doesn't matter what the probability distributions of the individual variables are -- they can be uniform, binary, etc. (as long as they have finite variance) -- adding many independent elements together produces a Normal distribution. A nice example is rolling dice. If you roll one die, there is a uniform probability that it will land on any of the numbers 1-6. If you roll two dice, the sum no longer has a uniform distribution -- the sum is most likely to be 7, and least likely to be 2 or 12. As you add more and more dice, the distribution gets closer and closer to a bell curve -- a Normal/Gaussian distribution. The Central Limit Theorem is why Normal distributions are so, well, normal -- we see them all over the place in the real world because independent effects, when summed together, take this shape.
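The dice example can be computed exactly rather than simulated. This sketch builds the distribution of the sum of n dice by repeated convolution and tracks its excess kurtosis (0 for a Gaussian). For a sum of n independent, identically distributed variables, the excess kurtosis is that of one variable divided by n, so for dice it goes -222/175, -111/175, ... towards 0.

```python
from fractions import Fraction

def dice_sum_distribution(n_dice):
    """Exact distribution of the sum of n fair six-sided dice, by repeated convolution."""
    dist = {0: Fraction(1)}
    for _ in range(n_dice):
        new = {}
        for total, p in dist.items():
            for face in range(1, 7):
                new[total + face] = new.get(total + face, 0) + p / 6
        dist = new
    return dist

def excess_kurtosis(dist):
    """Fourth central moment over variance squared, minus the Gaussian value of 3."""
    mean = sum(k * p for k, p in dist.items())
    var = sum((k - mean) ** 2 * p for k, p in dist.items())
    m4 = sum((k - mean) ** 4 * p for k, p in dist.items())
    return m4 / var ** 2 - 3

two = dice_sum_distribution(2)
print(max(two, key=two.get), two[7], two[2], two[12])   # 7 1/6 1/36 1/36
for n in (1, 2, 5, 20):
    print(n, float(excess_kurtosis(dice_sum_distribution(n))))
```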

For ICA, we assume that several independent components are being mixed together to create our observed signals. The Central Limit Theorem suggests that a mixture of independent components tends to be more Normal than the original components themselves. Thus the goal of ICA is to find a projection of the mixed signals that is the LEAST Gaussian, i.e. the most non-gaussian. ICA thus becomes an optimization problem: find the projections of the data that give us the most non-gaussian components.
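In two dimensions the optimization can be done by brute force, which makes the idea easy to see. This sketch (my own illustration, not FastICA) whitens the mixtures so that unmixing reduces to a rotation, then scans rotation angles and keeps the projection with the largest |kurtosis|. The sources, mixing matrix and angle grid are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
S = np.vstack([rng.uniform(-1, 1, n), rng.laplace(size=n)])   # independent, non-Gaussian sources
M = np.array([[1.0, 0.5], [0.3, 1.0]])                          # hypothetical mixing matrix
X = M @ S

# Whitening: centre, then rotate and rescale so the covariance becomes the identity.
Xc = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(Xc))
Z = np.diag(d ** -0.5) @ E.T @ Xc

def kurt(y):
    """Excess kurtosis of a zero-mean signal: E(y^4) - 3 (E(y^2))^2."""
    return np.mean(y ** 4) - 3 * np.mean(y ** 2) ** 2

# After whitening, every candidate projection is a unit vector (cos a, sin a);
# angles in [0, pi) cover all directions up to sign.
angles = np.linspace(0, np.pi, 360, endpoint=False)
scores = [abs(kurt(np.cos(a) * Z[0] + np.sin(a) * Z[1])) for a in angles]
best = angles[int(np.argmax(scores))]
y = np.cos(best) * Z[0] + np.sin(best) * Z[1]
print(np.degrees(best), kurt(y))
```

Here the winning projection should line up with the Laplace source, the more non-Gaussian of the two (excess kurtosis of about 3, versus -1.2 for the uniform source).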

In order to solve the optimization problem, we must define what we mean by non-gaussian -- how do we measure non-gaussianity? There are several possible ways to estimate the non-gaussianity of data, and each has different trade-offs: estimators can be optimal but require a lot of data, or they can be efficient but biased. The field has not settled on the "best" way of estimating non-gaussianity, but there are a few ways of doing it that stem from probability theory (e.g. higher-order moments) and from information theory (e.g. negentropy). In reality these different metrics are quite similar, and probability theory and information theory are mathematically linked -- several studies have shown that estimators derived from probability theory are identical to some estimators derived through information theory. There is in fact an underlying mathematical structure that links both fields.
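As one example of an information-theoretic estimator, here is a sketch of the log cosh approximation to negentropy used in the FastICA literature: J(y) is proportional to (E[G(y)] - E[G(nu)])^2, with G(u) = log cosh(u) and nu a standard Gaussian. It comes out close to zero for Gaussian data and noticeably larger for both the sub-Gaussian (uniform) and super-Gaussian (Laplace) examples. The Gaussian reference value is computed numerically here rather than hard-coded.

```python
import numpy as np

def log_cosh(u):
    # log(cosh(u)) computed stably as log((e^u + e^-u) / 2).
    return np.logaddexp(u, -u) - np.log(2.0)

# E[G(nu)] for a standard Gaussian nu, by a fine Riemann sum over [-10, 10].
u = np.linspace(-10, 10, 20_001)
gauss_pdf = np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)
EG_GAUSS = np.sum(log_cosh(u) * gauss_pdf) * (u[1] - u[0])

def negentropy_logcosh(y):
    """Approximate negentropy (up to a constant factor) of y after standardizing it."""
    y = (y - y.mean()) / y.std()
    return (np.mean(log_cosh(y)) - EG_GAUSS) ** 2

rng = np.random.default_rng(0)
n = 100_000
samples = {"gaussian": rng.standard_normal(n),
           "uniform": rng.uniform(-1, 1, n),
           "laplace": rng.laplace(size=n)}
J = {name: negentropy_logcosh(y) for name, y in samples.items()}
print(J)
```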

One of the most prominent and conceptually simple ways of measuring non-gaussianity is the "Kurtosis". The kurtosis is related to the 4th-order moment of a probability distribution (the first moment is the mean, the second is related to the variance, and the 3rd to the skewness). Kurtosis is often described as the peakiness of the distribution, although it mostly reflects how heavy the tails are. There are infinitely many higher moments, all defined by raising the data to different powers and taking the expectation (the nth moment is E(x^n), where E is the Expectation operator). Kurtosis is technically the "excess" of the 4th moment over that of a Gaussian distribution. For zero-mean data, a Gaussian distribution has a 4th moment of 3*sigma^4 (where sigma is the standard deviation), so the kurtosis subtracts out the 4th moment of a Gaussian with the same variance as the data -- K = E(x^4) - 3(E(x^2))^2. For data normalized to unit variance, this reduces to K = E(x^4) - 3.
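A minimal sketch of both forms of the kurtosis on sampled data: the general K = E(x^4) - 3(E(x^2))^2 for centered data, and the unit-variance version E(x^4) - 3. In theory these give about 0 for a Gaussian, -1.2 for a uniform distribution and +3 for a Laplace distribution. The unnormalized version also scales with sigma^4, which is one reason it is usually applied to standardized data.

```python
import numpy as np

def kurt(x):
    """K = E(x^4) - 3 (E(x^2))^2 after centering x (depends on the scale of x)."""
    x = x - x.mean()
    return np.mean(x ** 4) - 3 * np.mean(x ** 2) ** 2

def kurt_unit_variance(x):
    """K = E(x^4) - 3 after scaling x to zero mean and unit variance."""
    x = (x - x.mean()) / x.std()
    return np.mean(x ** 4) - 3

rng = np.random.default_rng(0)
n = 200_000
gauss = rng.normal(0.0, 2.0, n)                     # sigma = 2, so E(x^4) should be near 3 * 2^4 = 48
uniform = rng.uniform(-np.sqrt(3), np.sqrt(3), n)   # unit variance, sub-Gaussian
laplace = rng.laplace(0.0, 1 / np.sqrt(2), n)       # unit variance, super-Gaussian

for name, x in [("gaussian", gauss), ("uniform", uniform), ("laplace", laplace)]:
    print(name, round(kurt_unit_variance(x), 2), round(kurt(x), 2))
```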

Monday, April 15, 2013


I'm teaching ICA in the comp-neuro student course, so I'm studying up on it today. I'm just going to try to make an outline of the important points to cover. This tutorial is really good and will serve as the basis.

Here are the main points I want to cover:

  1. Motivation
    1. The cocktail party problem
    2. Definition -- matrices and data
    3. Ambiguities
    4. Illustration
  2. What is independence?
    1. Central limit theorem
    2. Nongaussian is independent
    3. Estimators of nongaussianity
      1. Kurtosis
      2. Negentropy
      3. Others
  3. Preprocessing for ICA
    1. Centering
    2. Whitening
    3. PCA
  4. Fast ICA
  5. Neuroscience Applications
    1. EEG
    2. Imaging
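For the preprocessing and FastICA items in the outline, a compact sketch might look like the following. It assumes the symmetric variant (all components estimated at once) with the tanh nonlinearity from the FastICA literature; the function names, convergence test and test signals are my own choices, and a real analysis would more likely use a library implementation such as scikit-learn's `sklearn.decomposition.FastICA`.

```python
import numpy as np

def center_and_whiten(X):
    """Remove each row's mean, then whiten using the eigendecomposition (PCA) of the covariance."""
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(X))
    return np.diag(d ** -0.5) @ E.T @ X

def sym_decorrelate(W):
    """W <- (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    d, U = np.linalg.eigh(W @ W.T)
    return U @ np.diag(d ** -0.5) @ U.T @ W

def fast_ica(X, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA with g = tanh (the log cosh contrast); returns estimated sources."""
    Z = center_and_whiten(X)
    n, m = Z.shape
    W = sym_decorrelate(np.random.default_rng(seed).standard_normal((n, n)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point step for every row w at once: w <- E[z g(w.z)] - E[g'(w.z)] w.
        W_new = sym_decorrelate(G @ Z.T / m - np.diag((1 - G ** 2).mean(axis=1)) @ W)
        # Converged when each new row points along the old one (up to sign).
        done = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1)) < tol
        W = W_new
        if done:
            break
    return W @ Z

t = np.linspace(0, 8 * np.pi, 8000, endpoint=False)
S = np.vstack([np.sin(2 * t), np.sign(np.sin(3 * t))])   # two hypothetical "speakers"
X = np.array([[1.0, 0.5], [0.4, 1.0]]) @ S                 # two "microphones"
S_hat = fast_ica(X)
print(np.round(np.abs(np.corrcoef(S, S_hat)[:2, 2:]), 3))
```

The recovered rows come back in arbitrary order and sign, so the printed |correlation| matrix should have one entry near 1 in each row and column rather than along the diagonal.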

Friday, April 12, 2013

Nonlinear Population Codes

Shamir, M., Sompolinsky, H. (2004) Nonlinear Population Codes. Neural Computation 16: 1105-1136.

Not exactly what I thought this was about. They craft a population code in which they can vary the noise correlations of the neurons, something they argue is biologically relevant. This puts more information about the stimulus into the higher-order statistics of the firing rates -- i.e. there's more information in the covariances than in the means alone.

So they make a 2-layer feedforward neural network with a quadratic non-linearity in the hidden layer. With this network they can apparently extract the stimulus information much closer to optimally. The quadratic non-linearity is important for conveying the information in the second-order statistics, and somehow decorrelates the neural signals. The output layer is then just a standard linear readout, the same as the typical way of reading out population codes.
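A toy version of the point about second-order statistics, not Shamir and Sompolinsky's actual model: two hypothetical stimuli leave the mean rates of two neurons unchanged and only flip the sign of their noise correlation. The mean of any linear readout is then the same for both stimuli, whereas a quadratic feature like r1*r2 separates them. The correlation values and trial counts are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 20_000

def responses(rho):
    """Trial-by-trial rates of two neurons: zero-mean, unit-variance noise with correlation rho."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return rng.multivariate_normal([0.0, 0.0], cov, size=n_trials)

r_a = responses(+0.5)   # hypothetical stimulus A: positive noise correlation
r_b = responses(-0.5)   # hypothetical stimulus B: negative noise correlation

w = np.array([1.0, 1.0])                  # an arbitrary linear readout w . r
lin_a, lin_b = (r_a @ w).mean(), (r_b @ w).mean()
quad_a = (r_a[:, 0] * r_a[:, 1]).mean()   # quadratic feature r1 * r2, averaged over trials
quad_b = (r_b[:, 0] * r_b[:, 1]).mean()
print(lin_a, lin_b)    # both near 0 for any w, since the mean rates are identical
print(quad_a, quad_b)  # near +0.5 and -0.5
```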

So maybe worth mentioning something like: non-linearities in neurons' I/O functions could be important for extracting information from the higher-order statistics of neuronal firing (Shamir & Sompolinsky, 2004) or for matching the statistics of the input probability distribution (cite needed). I remember some paper, I think we read it in Tanya's course, that discussed the idea that a sigmoid non-linearity is informationally optimal if the input has a Gaussian distribution. The most probable inputs are in the middle of the Gaussian, which falls on the steepest part of the sigmoid, and the steeper the output, the more information about the input is available.
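The sigmoid argument can be checked numerically under simple assumptions: Gaussian input, a logistic output unit, and an output read with a fixed resolution (20 equal-width levels here, an arbitrary choice). Output entropy is highest when the sigmoid's gain matches the input distribution (a logistic with a gain of about 1.7 roughly matches the standard Gaussian CDF) and drops when the sigmoid is too shallow or too steep. This is a sketch of the histogram-equalization idea, not a reconstruction of any particular paper.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)   # Gaussian-distributed input

def output_entropy(gain, n_bins=20):
    """Entropy (bits) of a logistic unit's output, read out with n_bins equal-width levels."""
    y = 1.0 / (1.0 + np.exp(-gain * x))
    counts, _ = np.histogram(y, bins=n_bins, range=(0.0, 1.0))
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

entropies = {gain: output_entropy(gain) for gain in (0.3, 1.7, 10.0)}
for gain, h in entropies.items():
    print(gain, round(h, 2))   # the maximum possible is log2(20), about 4.32 bits
```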

Thursday, April 11, 2013

Noise and Shunting

Ulrich, 2003; (noise)
Dynamic clamp in L5 pyr somatosensory cortex. Shunting scales the voltage multiplicatively, but not the spiking. He adds a GABA blocker and shows that it has an additive effect.

Chance, Abbott, Reyes 2002; (noise)
Dynamic clamp of rat somatosensory cortex slices, L5 pyr. Random incoming excitatory and inhibitory synaptic inputs; increasing the "rates" of the inputs causes more shunting and more variance in the injected current. They inject driving current on top of the noisy current; more noise means the gain is decreased. Shunting inhibition without noise leads to a shift in the firing-rate curve. They make a model, but it's not a simulation -- they have an analytic description of the firing rate as a function of the noise. The gain effect in the model goes away for high firing rates. Background excitation and inhibition must be balanced.

Mitchell & Silver, 2003; (noise)
Dynamic clamp of cerebellar granule cells. If the synaptic excitation is frequency dependent, you get the gain effect; step currents give an additive effect. Their IF model gets step shunting inhibition plus synaptic excitation, and shows both a gain and an offset effect. The gain effect requires that the variance of the excitation increases with the level of excitation.

Capaday 2002; (inh feedback or neuromodulators)
Two-compartment HH-like conductance model of a motoneuron (dendrite and soma). He concludes that the mixture of excitatory and inhibitory conductances does not matter for gain; it's not synaptic currents per se, but ligand action on receptors capable of activating intrinsic conductances. Inhibitory feedback can also be used to produce gain changes (circuit mechanism).

Gabbiani & Knopfel 1994;
Granule cell model -- multi-compartment, but "compact" (which effectively makes it a single compartment; I guess they just have a very low intracellular resistance). Shunting inhibition is additive and not multiplicative. This is just Holt & Koch before they described it.
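A back-of-the-envelope check of the "shunting is additive, not multiplicative" point, assuming a deterministic leaky integrate-and-fire neuron with no noise, no refractory period, and a shunting conductance that reverses at rest (so it just adds to the leak). For large inputs the LIF rate is approximately I/(C*V_th) - g/(2C), where g is the total conductance, so the shunt shifts the f-I curve down and raises the threshold current but leaves the slope almost unchanged. The parameter values are arbitrary, and this sketch does not capture the noise-dependent gain changes in Chance et al. 2002 or Doiron et al. 2000.

```python
import numpy as np

def lif_rate(I, g_leak=0.1, g_shunt=0.0, C=1.0, v_th=1.0):
    """Firing rate of a deterministic LIF neuron driven by a constant current I.

    Voltages are measured from rest, and the neuron resets to rest after each spike
    (no refractory period). The shunting conductance reverses at rest, so it simply
    adds to the leak. Units are arbitrary.
    """
    g = g_leak + g_shunt
    v_inf = I / g                  # steady-state voltage if there were no spiking
    if v_inf <= v_th:
        return 0.0                 # below rheobase: no spikes
    tau = C / g
    return 1.0 / (tau * np.log(v_inf / (v_inf - v_th)))

for I in (0.5, 2.0, 4.0, 6.0):
    print(I, round(lif_rate(I), 3), round(lif_rate(I, g_shunt=0.2), 3))

# Slope of the f-I curve at high input, with and without the shunt.
slope_ctrl = lif_rate(6.0) - lif_rate(5.0)
slope_shunt = lif_rate(6.0, g_shunt=0.2) - lif_rate(5.0, g_shunt=0.2)
print(slope_ctrl, slope_shunt)   # nearly equal: the shunt shifts the curve rather than scaling it
```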

Doiron et al. 2000; (noise)
Complex conductance model of a pyramidal cell in electric fish, plus an LIF model. Subthreshold, shunting is divisive; in the spiking regime it is additive. They get gain changes, but only if there is stochasticity in the inhibition, and only for low firing rates. The gain change is only there if the variance of the inhibition increases with the mean conductance.

Abbott & Chance 2005;
"(It is important to note that, despite comments in the literature to the contrary, divisive inhibition
of neuronal responses cannot arise from so called shunting inhibition. As has been shown both theoretically (Gabbiani et al., 1994; Holt and Koch, 1997) and experimentally (Chance et al., 2002),"
Review based on Chance 2002. They suggest that the circuit could be doing a driving-and-balancing act to produce the gain changes, but this requires changes in the noise variance.

Salinas & Abbott 1996; (circuit)
Recurrent network for gain control.

Hahnloser et al, 2000; (circuit)
Analog circuit, feedback recurrent excitation and inhibition sets up a gain modulation. Can change the inhibition and alter the gain.

Ayaz & Chance 2009; (noise, circuit)
Normalization pool and modulatory pool: the normalization pool is driven by stimuli within the RF, and the modulatory pool by stimuli in the surround. Inhibition increases proportionally to the sum of both pools. When the pools increase the noisy synaptic input, you get a gain effect.

Prescott & De Koninck 2003; (noise, dendritic saturation)
Similar to Chance 2002, need noise but also dendritic saturation apparently helps. Multi-compartmental model of L5 pyramidal cell. Many of the other papers talked about how the gain effect goes away for higher inputs, the saturation prevents that.

Burkitt, Meffin, Grayden 2003; (noise, circuit)
LIF model. Two regimes: linear (excitation and the extra leak from background noise cancel out) and non-linear (leakiness dominates, resulting in diminished gain).