Shamir, M., Sompolinsky, H. (2004) Nonlinear Population Codes. Neural Computation 16: 1105-1136.
Not exactly what I thought this was about. They are crafting a population code such that they can vary the noise correlations of the neurons, something they argue is biologically relevant. This sets up more information about the stimulus in the higher-order statistics of the firing rates -- i.e. there's more information if you look at the covariances than just at the mean.
So they make a 2-layer FF neural network that has a quadratic non-linearity in the hidden layer. Then they can apparently get the stimulus information much closer to optimal through the network. This quadratic non-linearity is important for conveying the information in the second-order statistics, and somehow decorrelates the neural signals. This makes a standard linear readout in the output layer -- the typical way of reading out population codes -- work well.
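A toy sketch of the idea (my own construction, not the paper's model): two neurons with identical mean rates for both stimuli, so only the noise correlation carries the stimulus. A linear readout of the raw rates sees nothing, but a quadratic "hidden unit" (the product of the two rates) turns the covariance difference into a mean difference that a linear readout can use.

```python
import numpy as np

rng = np.random.default_rng(0)

def responses(s, n=5000):
    # Both stimuli give the same mean rates; only the noise
    # correlation differs (+0.8 vs -0.8), so the stimulus info
    # lives in the second-order statistics, not the mean.
    rho = 0.8 if s == 1 else -0.8
    cov = [[1.0, rho], [rho, 1.0]]
    return rng.multivariate_normal([5.0, 5.0], cov, size=n)

r0, r1 = responses(0), responses(1)

# Linear readout on raw rates: the means are identical, so the
# average linear response cannot separate the stimuli.
w = np.array([1.0, 1.0])
lin_gap = abs((r1 @ w).mean() - (r0 @ w).mean())

# Quadratic hidden unit (product of rates): E[r1*r2] = mu1*mu2 + rho,
# so the correlation shows up as a mean shift of about 2*rho = 1.6
# that a linear readout downstream can pick up.
quad_gap = abs((r1[:, 0] * r1[:, 1]).mean() - (r0[:, 0] * r0[:, 1]).mean())

print(lin_gap, quad_gap)
```

The specific means, correlations, and the product-of-rates hidden unit are all illustrative choices, not taken from Shamir & Sompolinsky's network.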
So maybe worth mentioning something like -- non-linearities in neurons' IO functions could be important for extracting information from the higher-order statistics of neuronal firing (Shamir & Sompolinsky, 2004), or for matching the statistics of the input probability distribution (cite needed). I remember some paper, I think we read it in Tanya's course, that discussed the idea that a sigmoid non-linearity is informationally optimal if the input has a Gaussian distribution. The most probable inputs fall in the middle of the Gaussian, which is the steepest part of the sigmoid, and the steeper the slope at an input, the more output range -- and hence information -- is devoted to it.
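That matching idea can be checked numerically (a sketch I put together, not from any of the papers above): if the nonlinearity is the CDF of the input distribution -- which for a Gaussian input is sigmoid-shaped -- then the outputs come out uniformly distributed, which maximizes output entropy. The steepest part of the CDF sits exactly at the most probable inputs.

```python
import math
import random

random.seed(1)

def gauss_cdf(x, mu=0.0, sigma=1.0):
    # Gaussian CDF: a sigmoid whose slope is the Gaussian density,
    # i.e. steepest where inputs are most probable.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))

samples = [random.gauss(0, 1) for _ in range(100_000)]
outputs = [gauss_cdf(x) for x in samples]

# Bin the outputs into deciles; a uniform distribution over [0, 1]
# puts roughly 10% of the mass in each bin.
counts = [0] * 10
for y in outputs:
    counts[min(int(y * 10), 9)] += 1
fracs = [c / len(outputs) for c in counts]
print(fracs)
```

Any other monotone nonlinearity (e.g. a linear squash) would pile outputs up unevenly and waste output range on improbable inputs.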