Coding principles of edge detectors in V1 - Barlow: redundancy reduction. Each feature detector should be as "statistically independent" from the others as possible.
Difference between "decorrelated" filters (i.e. PCA) and "independent" features (i.e. ICA). ICA is equivalent to the redundancy reduction problem. There are many solutions to the decorrelation problem - PCA is the orthogonal solution. With stationary image statistics (pictures, not videos) the PCA filters are global Fourier filters. ZCA is another decorrelation solution, but the "polar" opposite of PCA: PCA is ordered according to the amplitude spectrum, and ZCA is ordered according to the phase spectrum.
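A minimal sketch of the PCA vs. ZCA distinction (random data stands in for real image patches; the eigendecomposition-based whitening below is a standard construction, not taken from the source): both transforms produce decorrelated outputs with identity covariance, and they differ only by a final orthogonal rotation.

```python
import numpy as np

rng = np.random.default_rng(0)
patches = rng.standard_normal((5000, 64))       # stand-in for mean-subtracted image patches
patches -= patches.mean(axis=0)

# Eigendecomposition of the data covariance
cov = patches.T @ patches / len(patches)
eigvals, eigvecs = np.linalg.eigh(cov)
scale = np.diag(1.0 / np.sqrt(eigvals + 1e-8))  # whitening scale

W_pca = scale @ eigvecs.T                       # PCA whitening: rotate, then scale
W_zca = eigvecs @ scale @ eigvecs.T             # ZCA whitening: rotate back afterwards

# Both filter sets decorrelate the data (identity output covariance)
for W in (W_pca, W_zca):
    y = patches @ W.T
    assert np.allclose(y.T @ y / len(y), np.eye(64), atol=1e-2)
```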
ICA has a stricter requirement - that the outputs are not just decorrelated, but also statistically independent. ICA filters are semi-local. ICA can't be calculated directly; instead, use an algorithm called "infomax" - maximize the joint entropy of the outputs by stochastic gradient ascent.
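A rough sketch of an infomax-style update in the spirit of Bell and Sejnowski, assuming whitened input, a logistic nonlinearity, and the natural-gradient form of the rule (the function name and hyperparameters here are illustrative, not from the source):

```python
import numpy as np

def infomax_ica(X, lr=0.01, epochs=50, batch=128, seed=0):
    """X: (n_samples, n_dims) whitened data. Returns an unmixing matrix W."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.eye(d)
    for _ in range(epochs):
        for _ in range(n // batch):
            x = X[rng.integers(0, n, batch)]      # stochastic mini-batch
            u = x @ W.T                           # filter outputs
            y = 1.0 / (1.0 + np.exp(-u))          # logistic squashing
            # natural-gradient infomax step: ascend the joint entropy of y
            grad = (batch * np.eye(d) + (1.0 - 2.0 * y).T @ u) @ W / batch
            W += lr * grad
    return W
```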
Independence vs. sparseness. There are slight differences, but both lead to similar results. Sparseness is measured by the kurtosis of the filter outputs. High kurtosis means a peakier middle and bigger tails - a normal distribution has a kurtosis of 3 (although some people subtract 3 and call the normal 0, in which case kurtosis can be negative).
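A quick numerical check of the kurtosis convention (a sketch; `kurt` here is the Pearson definition E[x^4] / E[x^2]^2, for which the normal is 3 and a heavier-tailed distribution like the Laplace is higher):

```python
import numpy as np

def kurt(x):
    """Pearson kurtosis of a sample: E[x^4] / E[x^2]^2 (normal = 3)."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2

rng = np.random.default_rng(0)
print(kurt(rng.standard_normal(100_000)))   # ~3.0 for a Gaussian
print(kurt(rng.laplace(size=100_000)))      # ~6.0: peakier middle, heavier tails
```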
Classical Hebbian-style learning is correlation-based, and learning rules of that type would only yield decorrelation-style filter sets. Learning rules for independence should be studied.
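To illustrate the point about correlation-based rules, a minimal sketch using Oja's normalized Hebbian update (my choice of example, not from the source): the weight vector converges to the leading eigenvector of the data covariance, i.e. a PCA/decorrelation-style filter rather than an independent one.

```python
import numpy as np

rng = np.random.default_rng(0)
C = np.array([[3.0, 1.0], [1.0, 1.0]])              # data covariance
X = rng.multivariate_normal([0.0, 0.0], C, size=20_000)

w = rng.standard_normal(2)
w /= np.linalg.norm(w)
lr = 0.01
for x in X:
    y = w @ x                                         # Hebbian output
    w += lr * y * (x - y * w)                         # Oja's normalized Hebbian step

# w ends up aligned (up to sign) with the top eigenvector of C - the first PC
top_pc = np.linalg.eigh(C)[1][:, -1]
print(w, top_pc)
```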