The present – deep learning for computer vision

Nowadays, deep neural networks (also called Deep Learning algorithms) have mostly replaced hand-crafted feature engineering. With Deep Learning, the computer does the feature engineering for you by finding the relevant patterns in the images themselves.

Convolutional neural networks in particular are used for anything to do with visual data, such as image classification or image regression. They're derived from the Neocognitron, a kind of neural network developed by Professor Fukushima in 1980 that is tolerant of shifts and deformations. For example, if you were to write the digit 6 a few pixels to the left or right, it would still be recognized as a 6.

In a convolutional neural network, convolutional layers are stacked one on top of another. The input to the first layer is, of course, the input image. That layer takes several learned features and runs them as filters over the image. The filtered versions of the image are fed into the next layer, which uses its own learned features to filter them further, and so on.

Of course, it's a bit more complex than that; there are other layers, such as pooling layers (which reduce the size of the images) and activation layers (which apply a non-linear function to each layer's output).
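
To make this concrete, here is a minimal sketch of such a stack, assuming the Keras API that ships with TensorFlow (the layer sizes and image shape are arbitrary, and this isn't the architecture of any particular system):

    # A minimal sketch of stacked convolutional, activation, and pooling layers.
    # Assumes TensorFlow/Keras is installed; the sizes below are arbitrary.
    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Input(shape=(64, 64, 1)),   # a small grayscale input image
        layers.Conv2D(16, (3, 3)),         # first set of learned filters
        layers.Activation("relu"),         # activation layer (non-linearity)
        layers.MaxPooling2D((2, 2)),       # pooling layer (shrinks the image)
        layers.Conv2D(32, (3, 3)),         # filters applied to the filtered images
        layers.Activation("relu"),
        layers.MaxPooling2D((2, 2)),
    ])
    model.summary()                        # shows how the image shrinks layer by layer

Each Conv2D layer holds the filters that the network learns during training; the output of one layer becomes the input of the next, exactly as described above.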

However, since the neural network learns the filters on its own, it's able to do some neat things; for example, when you train a neural network to classify, say, faces, it'll learn the following features:

  • The first layer will recognize very simple patterns, such as edges and lines.
  • The next few layers will recognize slightly more complex, but still simple, patterns such as curves and simple shapes.
  • The next few layers will learn more abstract patterns, such as combining those curves and shapes to reconstruct eyes, mouths, and noses.
  • The last few layers will learn how to combine those abstract patterns into full faces.
  • A feed-forward neural network will be used to classify the final features extracted by the convolutional neural network.

This is just an example; since the computer learns on its own, exactly what each layer picks up will vary. However, it does convey the general idea: with every layer, the network learns increasingly advanced and abstract patterns.
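
One way to see this hierarchy for yourself is to pull out the intermediate outputs (feature maps) of a network and look at what each layer responds to. The following sketch, again assuming Keras, shows the mechanics on a toy model; the layer names, shapes, and the face/not-a-face head are purely illustrative:

    # Sketch: extracting the feature maps of intermediate layers so you can
    # inspect what each one responds to. Assumes TensorFlow/Keras; toy model only.
    import numpy as np
    from tensorflow.keras import layers, models

    inputs = layers.Input(shape=(64, 64, 1))
    x = layers.Conv2D(16, (3, 3), activation="relu", name="early_conv")(inputs)  # edges, lines
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(32, (3, 3), activation="relu", name="later_conv")(x)       # curves, simple shapes
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Flatten()(x)
    x = layers.Dense(64, activation="relu")(x)            # feed-forward layers that classify
    outputs = layers.Dense(2, activation="softmax")(x)    # e.g. face vs. not a face
    model = models.Model(inputs, outputs)

    # A second model that exposes the outputs of the two convolutional layers.
    feature_maps = models.Model(
        inputs=model.input,
        outputs=[model.get_layer("early_conv").output,
                 model.get_layer("later_conv").output],
    )

    image = np.random.rand(1, 64, 64, 1).astype("float32")   # stand-in for a real photo
    early, later = feature_maps(image)
    print(early.shape, later.shape)   # (1, 62, 62, 16) and (1, 29, 29, 32)

On a trained network, plotting these feature maps is how you would see the edge-like patterns in the early layers and the more abstract ones deeper in.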

This opens up an interesting possibility: if you were to train a convolutional neural network to recognize cats and dogs, could you use the features it has already learned in order to train another neural network to recognize, say, lions and cheetahs? That's what transfer learning aims to achieve.

With transfer learning, since you're starting from a pre-trained neural network, you won't need as much data to train the new one. The simple patterns have already been learned; you only need to fine-tune the more abstract ones.
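
As a rough illustration of what this looks like in code (and not how Watson itself is implemented), here is a sketch using Keras and a publicly available pre-trained network, MobileNetV2 trained on ImageNet; the two-class lion/cheetah head and the training data are hypothetical:

    # Transfer-learning sketch: reuse a pre-trained convolutional base and train
    # only a new, small classification head. Assumes TensorFlow/Keras and an
    # internet connection to download the ImageNet weights.
    import tensorflow as tf
    from tensorflow.keras import layers, models

    base = tf.keras.applications.MobileNetV2(
        input_shape=(160, 160, 3),
        include_top=False,       # drop the original 1000-class ImageNet head
        weights="imagenet",
    )
    base.trainable = False       # keep the already-learned simple patterns frozen

    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(2, activation="softmax"),   # hypothetical lion vs. cheetah head
    ])

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_images, train_labels, epochs=5)   # train on the small dataset

Because the base is frozen, only the new head's weights are learned from the small dataset; if needed, the top few convolutional layers can be unfrozen afterwards to fine-tune the more abstract patterns.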

This is one of the techniques Watson Visual Recognition uses to classify images based on very little data.

Of course, there are other techniques that help Watson, such as data augmentation, but we won't go deeper into them in this book.