Can machines visually perceive the world around them?

The sense of sight, primitive as it was at first, evolved in animals around 700,000,000 years ago. For us humans it's the primary sense, and by some estimates around half of our mental processing power at any given time goes to visual perception. You may have noticed this in how visual your dreams are, too!

While it may seem very simple, our sense of sight is actually a lot more complex than it looks! Much of what we understand about the world around us comes to us through sight.

What's really interesting, however, is how we humans can recognize the objects around us. For example, if you were to look at a car, how does your brain know that you're looking at a car, and therefore create a conscious perception of a car?

It seems like a simple question, but the amount of learning and logic that goes into recognizing objects is truly remarkable. Even more noteworthy is that we can recognize objects from very little data! Say you were shown a lawnmower and a snowblower for the first time in your life. If you were then shown a different snowblower, you'd instantly recognize it as a snowblower, despite having seen only one before.

Machines have traditionally had a very hard time understanding visual data, that is, data presented in the form of images. This is due to the huge number of ways a single kind of thing can appear. For example, you could take a picture of a golden retriever in your backyard or a husky in a parking lot, and both images show a dog. To a computer, however, there is no immediately discernible similarity in the pixels of those two images that would tell it they both contain a dog, let alone the breed of each dog. A computer can simulate the paths of light rays, a technique called ray tracing, to render a near-photorealistic scene in near real time, which is enormously computationally expensive. Yet, without being taught, that same computer can't tell whether the photo you just took contains a cat, a dog, or a truck!
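To make the pixel problem concrete, here's a toy sketch in Python. It doesn't use real photos: the three tiny 6x6 "images" are made up for illustration, and the helper names `draw` and `mse` are hypothetical, not part of any library. The idea is simply that the same shape moved to a different spot can look more different, pixel by pixel, than a completely different shape in the same spot.

```python
# Toy illustration: raw pixel distance does not track "what the object is".
# All images here are synthetic 6x6 grids (0.0 = dark, 1.0 = bright).

def draw(size, bright_pixels):
    """Return a size x size image with the given (row, col) pixels lit."""
    img = [[0.0] * size for _ in range(size)]
    for row, col in bright_pixels:
        img[row][col] = 1.0
    return img

def mse(image_a, image_b):
    """Mean squared difference between two images of the same size."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(image_a, image_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += (pixel_a - pixel_b) ** 2
            count += 1
    return total / count

SIZE = 6

# The same object (a 2x2 square) in two different places...
square_top_left = draw(SIZE, [(0, 0), (0, 1), (1, 0), (1, 1)])
square_bottom_right = draw(SIZE, [(4, 4), (4, 5), (5, 4), (5, 5)])

# ...and a different object (a 1x4 bar) in the same place as the first square.
bar_top_left = draw(SIZE, [(0, 0), (0, 1), (0, 2), (0, 3)])

same_object = mse(square_top_left, square_bottom_right)
different_object = mse(square_top_left, bar_top_left)

print("square vs. moved square:", round(same_object, 3))
print("square vs. bar:         ", round(different_object, 3))
print("same object looks farther apart:", same_object > different_object)
```

In this toy case the two squares are farther apart in pixel terms than the square and the bar, even though the squares are the same "object". Real photos are far messier, with changes in lighting, angle, and background stacked on top of each other.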

This is where Machine Learning comes in. For image recognition, older techniques such as manual feature engineering have largely been replaced by newer techniques such as Deep Learning. As a programmer, you'd normally need to provide just two things: preprocessed training data and a neural network architecture to train and re-train. Luckily, Watson takes care of both the neural network and the preprocessing of the training data, which means that you only need to provide the raw training data.