Looking at some of Schmidhuber's papers. Here are a few:
Ciresan, D., Meier, U., Masci, J., Schmidhuber, J. (2012). Multi-column deep neural network for traffic sign classification. Neural Networks.
Ciresan, D.C., Meier, U., Gambardella, L., Schmidhuber, J. (2010). Deep, Big, Simple Neural Nets for Handwritten Digit Recognition. Neural Computation.
Schmidhuber, J. (2009). Ultimate Cognition à la Gödel. Cognitive Computation.
Here's the first author's (Ciresan's) website: http://www.idsia.ch/~ciresan/. There are some interesting things there.
Going to go through the first one. It won the final phase of the "German Traffic Sign Recognition Benchmark" with a better-than-human recognition rate of 99.46%.
The Deep Neural Network (DNN) is a hierarchical neural network that alternates convolution layers (i.e. receptive fields/simple cells) with max-pooling layers (i.e. complex cells). They cite an earlier paper for the implementation of their machine -- with GPU acceleration, training takes days instead of months to learn the signs.
Here's the basic architecture:
The convolution layer is basically a receptive field for each neuron -- the same receptive field shape is tiled over the whole pixel space. Each unit computes a weighted sum, adds a bias, and applies a non-linear activation function. Next is a max-pooling layer, which outputs the maximum activation over non-overlapping rectangular regions. The last layers are fully connected and form a 1D feature vector. A softmax at the output gives the probability of the image belonging to each class.
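Here's a minimal sketch of that conv / max-pool / fully-connected / softmax stack, assuming PyTorch. The map counts and kernel sizes below are illustrative placeholders rather than the paper's exact configuration (the only figure I'm sure of is the 43 GTSRB classes).

```python
import torch
import torch.nn as nn

class DNNColumn(nn.Module):
    """One column: alternating convolution and max-pooling, then fully connected layers."""

    def __init__(self, num_classes=43):              # GTSRB has 43 sign classes
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 100, kernel_size=7),         # convolution: local receptive fields ("simple cells")
            nn.Tanh(),
            nn.MaxPool2d(2),                          # max over non-overlapping 2x2 regions ("complex cells")
            nn.Conv2d(100, 150, kernel_size=4),
            nn.Tanh(),
            nn.MaxPool2d(2),
            nn.Conv2d(150, 250, kernel_size=4),
            nn.Tanh(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                             # flatten the feature maps into a 1D feature vector
            nn.Linear(250 * 3 * 3, 300),              # sizes assume 48x48 input images
            nn.Tanh(),
            nn.Linear(300, num_classes),
        )

    def forward(self, x):
        logits = self.classifier(self.features(x))
        return torch.softmax(logits, dim=1)           # per-class probabilities
```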
Each instance of the training set is distorted by affine transforms to get more samples: translation, rotation and scaling. This helps prevent overfitting and builds in the desired invariances.
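A rough sketch of such an affine distortion, assuming numpy/scipy and a single-channel 2D image (apply per channel for RGB). The distortion ranges are my own placeholders, not the paper's exact percentages.

```python
import numpy as np
from scipy.ndimage import affine_transform

def distort(image, rng):
    """Apply a random translation, rotation and scaling to a 2D image."""
    h, w = image.shape
    angle = np.deg2rad(rng.uniform(-5, 5))            # rotation up to +/- 5 degrees
    scale = rng.uniform(0.9, 1.1)                     # scaling up to +/- 10%
    t = rng.uniform(-0.1, 0.1, size=2) * [h, w]       # translation up to +/- 10% of the image size
    # combined rotation + scaling matrix, applied about the image centre
    m = scale * np.array([[np.cos(angle), -np.sin(angle)],
                          [np.sin(angle),  np.cos(angle)]])
    centre = np.array([h, w]) / 2.0
    offset = centre - m @ centre + t
    return affine_transform(image, m, offset=offset, mode="nearest")

# usage: distorted = distort(original, np.random.default_rng(0))
```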
The output is essentially the average of several DNN columns. The various columns are trained on the same inputs, or on inputs preprocessed in different ways. Preprocessing uses contrast normalization, histogram equalization (histeq), adaptive histogram equalization (adapthisteq), and imadjust.
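A sketch of that multi-column averaging, assuming scikit-image rough equivalents of the MATLAB preprocessing functions (equalize_hist for histeq, equalize_adapthist for adapthisteq, rescale_intensity for imadjust) and assuming each trained column exposes a hypothetical predict_proba method.

```python
import numpy as np
from skimage import exposure

PREPROCESSORS = {
    "raw": lambda im: im,
    "contrast_norm": lambda im: (im - im.mean()) / (im.std() + 1e-8),  # one simple form of contrast normalization
    "histeq": exposure.equalize_hist,
    "adapthisteq": exposure.equalize_adapthist,
    "imadjust": exposure.rescale_intensity,
}

def mcdnn_predict(columns, image):
    """columns: list of (preprocessing_name, model) pairs, one model per column.
    Each column sees its own preprocessing variant; the per-class probabilities
    are averaged and the most probable class wins."""
    probs = [model.predict_proba(PREPROCESSORS[name](image)[None])
             for name, model in columns]
    return int(np.mean(probs, axis=0).argmax())
```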
From the paper: "We use a system with a Core i7-950 (3.33 GHz), 24 GB DDR3, and four graphics cards of type GTX 580."
Looking at his other stuff, he has some good ideas. One big one is "Flat Minima Search" (http://www.idsia.ch/~juergen/fm/), where the idea is to look for a "flat" local minimum: one where the error stays approximately constant for nearby weight values. This adds a simplicity bias and thus gives better generalization. He also talks about making detectors independent: predictors try to guess what a detector does based on what all the other detectors are doing, and each detector then tries to do something different from everybody else.
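To make the flatness intuition concrete, here's a rough probe that perturbs trained weights with small noise and checks how much the error moves. This is just the measurement idea, not Flat Minima Search itself (which is a dedicated algorithm described at the link above); it assumes numpy, a weight vector, and a loss(weights) callable.

```python
import numpy as np

def flatness_probe(loss, weights, radius=0.01, trials=20, rng=None):
    """Average increase in loss under random relative weight perturbations of
    size `radius`; smaller values indicate a flatter minimum."""
    rng = rng or np.random.default_rng(0)
    base = loss(weights)
    increases = []
    for _ in range(trials):
        noise = rng.normal(scale=radius, size=weights.shape) * np.abs(weights)
        increases.append(loss(weights + noise) - base)
    return float(np.mean(increases))
```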