
The learning machines

After decades of disappointment, artificial intelligence is finally beginning to deliver on its early promise, thanks to a powerful technique called deep learning.

Artificial intelligence. Source: Wikimedia / Gengiskanhg.

By Yoshua Bengio. The article is published with the approval of Scientific American Israel and the Ort Israel network, 11.08.2016

  • Artificial intelligence became a serious field of research in the mid-50s. At the time, researchers expected to reach the level of human intelligence within the time span of an academic career.
  • These hopes were dashed when it became clear that the algorithms and computing power of the time were simply not up to the task. Some skeptics even declared that the entire field was nothing more than baseless conceit.
  • The field has come back to life in recent years, as software modeled on the brain's neural networks has shown that the old promise of artificial intelligence may be realized after all.
  • Deep learning - a technique that uses complex neural networks - can learn abstract concepts, and already matches human performance in certain tasks.

In the 50s, computers began beating humans at checkers and proving mathematical theorems, causing great excitement. In the 60s, the hope grew that scientists would soon be able to imitate the human brain in hardware and software, and that "artificial intelligence" (AI) would be able to handle any task on the same level as humans. In 1967, Marvin Minsky of the Massachusetts Institute of Technology (who passed away in January 2016) declared that the challenge of artificial intelligence would be solved within a generation.

This optimism was, of course, ahead of its time. Software written to help doctors diagnose diseases, and computer systems inspired by the human brain and built to recognize the content of images, did not live up to expectations. In those early years, the algorithms were too simple and needed more data than was available at the time. The processing power of computers was also too modest to support the heavy calculations required to create anything approaching the complexity of human thought.

By the mid-2000s, the dream of machines with human-level intelligence had almost disappeared from the scientific community. The term "artificial intelligence" itself seemed to have left the realm of serious science. Scientists and writers described the dashed hopes of the period from the 70s to the mid-2000s as the "winter of artificial intelligence".

But in just one decade, everything changed. Starting in 2005, the outlook for artificial intelligence changed from end to end, as "deep learning", an approach to building intelligent machines inspired by neuroscience, began to find its footing. In recent years, deep learning has become a singular driving force of artificial intelligence research, and leading information-technology companies are investing billions of dollars in its development.

The term deep learning refers to the simulation of neural networks that gradually "learn" to recognize images, understand speech and even make decisions on their own. The technique relies on artificial neural networks, a core component of today's research in artificial intelligence. Artificial neural networks do not precisely imitate the way biological nerve cells (neurons) work, but are based on general mathematical principles that allow them to learn, from examples, how to recognize people or objects in images, or to translate between the world's most common languages.

Deep learning technology has revolutionized artificial intelligence research and revived old hopes for computer vision, speech understanding, natural language processing and robotics. Its first products in the field of speech understanding appeared in 2012 in a component of the Google search system called Google Now. Soon after, applications appeared to recognize the content of images, a feature now integrated into the search engine of Google Photos.

Anyone who has experienced frustration from using clunky automated phone menus will appreciate the dramatic benefits of an enhanced smartphone personal assistant. Those of us who remember how poor the recognition of objects in photos was just a few years ago, when software mistakenly identified, for example, inanimate objects as animals, can see how dizzying the progress in computer vision has been: today there are computers that are able, under certain conditions, to recognize a cat, a stone or a person in an image about as well as a human observer. In fact, artificial intelligence software has become part of the daily lives of millions of smartphone users. I myself hardly type messages any more - usually I just talk to my phone, and sometimes it even answers me back.

These advances have suddenly opened the door to further commercial exploitation of the technology, and the excitement is only growing. Companies compete fiercely for talented employees, and PhDs specializing in deep learning are a rare commodity in huge demand. Many professors who specialize in the field (some say most of them) have been drawn from academia into industry with generous research facilities and financial incentives.

Work on the challenges of deep learning has produced dizzying successes. When a neural network beat the world's leading Go player, Lee Se-dol, it made headlines. There are already applications that tap into areas of human expertise other than games: researchers recently developed a deep learning algorithm that can diagnose heart failure from magnetic resonance imaging (MRI) scans with the same accuracy as a cardiologist.

Intelligence, knowledge and learning

Why did artificial intelligence encounter so many obstacles in previous decades? The reason is that most of our knowledge about the world around us is not set down in written language as a collection of explicit steps, as is required to create a computer program of any kind. That is why we have not been able to program a computer to do, directly, many of the things that we humans do easily, such as understanding speech, images and language, or driving a car. Attempts to organize collections of facts into complex databases, to give the computer the appearance of intelligence, have met with only minimal success.

And this is where deep learning comes into play. It is part of a larger field called "machine learning", which is based on principles for training intelligent computer systems to the point where they can, in effect, teach themselves. One of these principles involves what a person or machine considers a "good" decision. In animals, evolution dictates decisions that lead to behavior that optimizes the chances of survival and reproduction. In human societies, a good decision may involve social interactions that enhance one's social status or sense of personal well-being. For machines, on the other hand, such as a self-driving car, the quality of decisions is measured by how well they match the decisions of competent human drivers.

In many contexts, it is not clear how to translate the knowledge needed to make a good decision into software code. A mouse, for example, knows its surroundings, and has an innate ability to sniff in the right place, move its legs, find food or mates, and avoid predators. No programmer could write a sequence of step-by-step instructions that would produce these behaviors. And yet that knowledge somehow resides in the rodent's brain.

Before they could create computers capable of training themselves, computer scientists had to answer basic questions, such as "How do humans acquire knowledge?" Some of our knowledge is innate, but most comes from experience. The things we know intuitively cannot be made into a clear sequence of steps that a computer can run, but they can often be learned through examples and practice. Since the 50s, researchers have searched for and tried to precisely articulate general principles that allow animals and humans, and even machines, for that matter, to acquire knowledge through experience. The purpose of machine learning is to establish processes - learning algorithms - that will allow the machine to learn from examples presented to it.

The science of machine learning is essentially experimental, because there is no universal learning algorithm that would allow a computer to learn every task equally well. Every learning algorithm needs to be tested on specific tasks and data from a particular domain, whether the task is identifying a sunset in an image or translating from English to Urdu. There is no way to prove that a particular algorithm will be better than all others in every situation.

Artificial intelligence researchers have formulated a formal mathematical description of this principle - a theorem known as "no free lunch" - which demonstrates that there is no single algorithm capable of dealing with every learning situation in the real world. However, human behavior seems to contradict this theorem. On the surface, at least, our brains possess fairly general learning abilities, which allow us to perform a wide variety of tasks that evolution did not train our ancestors to perform, such as playing chess, building bridges, or researching artificial intelligence.

These abilities imply that human intelligence makes use of general assumptions about the world, and these may inspire the creation of machines with similarly general intelligence. For this very reason, researchers developing artificial neural networks have adopted the brain as a rough model for designing intelligent systems.

The main computing units of the brain are nerve cells called neurons. Each neuron sends signals to other neurons across tiny gaps between the cells, called synapses. The tendency of a neuron to send a signal across this gap, and the strength of the signal it sends, are called its "synaptic strength". When a neuron "learns", its synaptic strength increases, so that when it receives electrical stimulation, it is more likely to send signals to its neighbors.

Research in neuroscience has influenced the creation of artificial neural networks, which simulate the activity of neurons in software or hardware. The first researchers in this subfield of artificial intelligence, called "connectionism", hypothesized that neural networks could learn to perform complex tasks by gradually changing the connections between the neurons. These changes would lead the patterns of neural activity to eventually represent the content of the input, which could be, for example, an image or a segment of speech. As the networks receive more examples, the learning process continues to change the synaptic strengths of the neurons and achieve a more accurate representation of the input - images of a sunset, for example.
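
To make the connectionist picture more concrete, here is a rough sketch (mine, not the article's) of a single software "neuron" in Python: the numeric weights stand in for synaptic strengths, and "learning" would mean gradually changing them.

```python
import numpy as np

# A software neuron in the connectionist spirit: inputs arrive over weighted
# connections (the weights play the role of synaptic strengths), and the
# output is a nonlinear function of their weighted sum.
def neuron(inputs, weights, bias):
    return 1.0 / (1.0 + np.exp(-(np.dot(weights, inputs) + bias)))  # sigmoid

inputs  = np.array([0.5, -1.0, 2.0])   # signals arriving from other neurons
weights = np.array([0.8,  0.1, -0.4])  # synaptic strengths (learned in practice)
print(neuron(inputs, weights, bias=0.0))

# "Learning" means gradually changing these weights; a slightly stronger
# connection makes the neuron respond a little more strongly to the same input.
weights[0] += 0.05
print(neuron(inputs, weights, bias=0.0))
```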

Sunset lessons

The current generation of neural networks extends the pioneering developments of the connectionists. The networks gradually change the numerical value of each synaptic connection, a value that represents the strength of the connection, and therefore the likelihood that the neuron will send a signal to another neuron. The deep learning algorithm changes these values in tiny increments every time the network "views" a new image. The values slowly and steadily approach the point at which the neural network can better guess the content of the image.

Today, learning algorithms require a lot of human involvement to achieve optimal results. Most of these algorithms work through supervised learning, where each example in the training phase is accompanied by a human-made label that defines the content of the example. For example, a picture of a sunset will be accompanied by the label "sunset". In this case, the goal of the supervised learning algorithm is to receive an image as input, and to produce as output the name of the main object in the photograph. The mathematical process that turns an input into an output is called a "function". The numerical values, or synaptic strengths, that create this function are in effect a solution to the learning task.
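
As an illustration only (the article describes no code), a minimal supervised learning sketch might look like the following, using invented toy features and the scikit-learn library. The point is simply that labeled examples are used to fit a function from inputs to labels, which can then be applied to an input it never saw.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy "images" summarized by two invented features: overall redness and blueness.
# The labels are human-made: 1 = "sunset", 0 = "not a sunset".
X_train = np.array([[0.9, 0.3], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
y_train = np.array([1, 1, 0, 0])

# Supervised learning: fit a function that maps inputs to labels.
model = LogisticRegression().fit(X_train, y_train)

# The real goal is generalization: predicting the label of an example
# that was never seen during training.
x_new = np.array([[0.85, 0.25]])
print(model.predict(x_new))  # expected: [1], i.e. "sunset"
```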

Learning the correct answers through memorization is an easy task, but a rather worthless one. We want to teach the algorithm what a sunset is, so that it can detect any sunset, in any image, even if the algorithm did not "see" that image during training. The ability to detect any sunset whatsoever - in other words, to generalize the learning beyond the specific examples - is the main goal of any machine learning algorithm. In fact, the training quality of any network is measured using examples that the network has not seen before. The difficulty in generalizing well to new examples stems from the fact that there are almost infinitely many possible variations that can fit any category, for example the category "sunset".

"The great comeback of artificial intelligence, after a long hibernation, teaches us a lesson in the sociology of science and emphasizes the need to promote ideas that challenge the technological status quo."

In order to generalize successfully from examples, the learning algorithm used in deep learning needs more than the examples themselves. It also relies on assumptions about the data, and about what counts as a possible solution to the problem. A typical assumption built into the software might be, for example, that if two inputs to a function are similar to each other, the outputs should not differ much either: changing a few pixels in an image of a cat will usually not turn the cat into a dog.

One of the types of neural networks that incorporate such assumptions about images is called a convolutional neural network, and it has become a key technology in the rise of artificial intelligence. The convolutional neural networks used in deep learning include many layers of neurons, arranged in such a way that the output is less sensitive to changes in the main object of the image - for example, if its position shifts slightly. A well-trained network will be able to recognize a certain person's face even if it is seen from different angles in different photographs. The structure of convolutional networks draws inspiration from the multi-layered structure of the visual cortex, the part of the brain that receives input from the eyes. The many layers of virtual neurons in a convolutional neural network are what make it "deep", and better suited to learning about the world around it.
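
A minimal sketch of the convolutional idea, written in plain Python/NumPy for illustration (the article ties it to no particular library): one small filter is shared across every position of the image, and a pooling step makes the strongest response independent of where the pattern sits.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide one shared filter over the image (valid convolution, no padding)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

kernel = np.array([[1.0, 1.0],
                   [1.0, 1.0]])       # a filter that responds to a bright 2x2 blob

img_a = np.zeros((6, 6)); img_a[1:3, 1:3] = 1.0   # blob near the top-left
img_b = np.zeros((6, 6)); img_b[3:5, 3:5] = 1.0   # the same blob, shifted

# Max-pooling over the whole feature map: the response is identical no matter
# where the blob sits, which is the insensitivity to position described above.
print(convolve2d(img_a, kernel).max(), convolve2d(img_b, kernel).max())  # 4.0 4.0
```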

To the depth of things

On a practical level, the developments that made deep learning possible resulted from certain innovations that appeared about ten years ago, when interest in artificial intelligence and neural networks was at its lowest point in decades. A Canadian organization funded by the government and by private donors, the Canadian Institute for Advanced Research (CIFAR), helped rekindle interest when it sponsored a research program led by Geoffrey Hinton of the University of Toronto. Also taking part in the program were Yann LeCun of New York University, Andrew Ng of Stanford University, Bruno Olshausen of the University of California, Berkeley, myself (Yoshua Bengio), and others. In those days, the negative attitude toward this direction of research made it difficult to publish articles, and it was hard even to convince students to work in the field. Nevertheless, we were convinced that it was important to move forward.

The skepticism at that time regarding neural networks stemmed, in part, from the belief that training them is futile, because of the challenges involved in optimizing their behavior. Optimization is a branch of mathematics that seeks the arrangement of a set of parameters that best achieves a mathematical objective. In the case of the networks, these parameters are the synaptic weights, and they represent the strength of the signal that passes from one neuron to another.

The goal is to produce predictions with a minimal number of errors. When the relationship between the parameters and the goal is simple - more precisely, when the goal is a convex function of the parameters - the parameters can be adjusted gradually. The tuning process continues until the parameters get as close as possible to the values that give the best result, known as the global minimum. This means that the network's average prediction error will be as small as possible.

However, in general, it is not easy to train a neural network, and a process called non-convex optimization is required. This type of optimization poses a much greater challenge, and many researchers believed it could not be met at all. The learning algorithm may get stuck in what is called a local minimum, where it cannot reduce the prediction error further through small changes to the parameters.
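
Here is a small illustrative sketch (my own, with an invented one-dimensional "error surface") of what getting stuck in a local minimum looks like: gradient descent makes small parameter changes downhill, and the starting point determines which valley it ends up in.

```python
# A one-dimensional non-convex "error surface" with a shallow local minimum
# near x ~ 2.1 and a deeper global minimum near x ~ -2.35 (values illustrative).
def loss(x):
    return 0.1 * x**4 - x**2 + 0.5 * x

def grad(x):
    return 0.4 * x**3 - 2.0 * x + 0.5

def gradient_descent(x, lr=0.01, steps=2000):
    for _ in range(steps):
        x -= lr * grad(x)   # a small parameter change against the gradient
    return x

# Starting on different sides of the hill ends in different minima.
print(round(gradient_descent(2.0), 2))    # stuck in the shallow local minimum (~2.1)
print(round(gradient_descent(-2.0), 2))   # reaches the deeper global minimum (~-2.35)
```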

The myth that neural networks are hard to train because of local minima was disproved only in 2015. In our research we discovered that when the network is large enough, the local minimum problem is greatly reduced. In fact, most of the local minimum points correspond to error values very close to the optimal value at the global minimum.

The theoretical problem of optimization may be solvable, but in practice, attempts to build networks with more than two or three layers often failed. Starting in 2005, research supported by CIFAR achieved breakthroughs that overcame these problems as well. In 2006 we were able to train deeper networks using a technique that proceeds one layer at a time.

Then, in 2011, we discovered a better method to train even deeper networks—that is, with additional layers of virtual neurons—by changing the calculations performed by each of these processing units. This change made them act more like biological neurons. We also found that deliberately adding random noise to the signals transmitted between neurons during training (again, similar to what happens in the brain) improved the network's ability to learn to recognize an image or sound.
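
The article does not name the exact mechanisms, so the following is only a hedged sketch of the general idea: a rectified-linear activation stands in for the changed unit computation, and random noise is added to a layer's activations during training only, while the test-time computation stays deterministic. Both choices here are my own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_layer(x, w, training=True, noise_std=0.1):
    """One layer's activations, with random noise injected only during training."""
    h = np.maximum(0.0, w @ x)                             # rectified linear activation
    if training:
        h = h + rng.normal(0.0, noise_std, size=h.shape)   # perturb the signals
    return h

w = rng.normal(size=(4, 3)) * 0.5
x = np.array([0.2, -0.7, 1.3])

print(noisy_layer(x, w, training=True))   # slightly different on every call
print(noisy_layer(x, w, training=False))  # deterministic at test time
```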

Two essential factors contributed to the success of deep learning techniques. The first is a tenfold increase in calculation speed, thanks to graphics processing units originally designed for video games. With their help, it was possible to train larger networks in reasonable periods of time. The second factor that promoted deep learning was the availability of huge collections of labeled data, with the help of which the learning algorithm is able to identify correct answers - for example, the answer "cat" when examining a photo in which a cat is only one of the photographed objects.

Another reason for the recent success of deep learning is its ability to learn to perform a series of calculations that build or analyze, step by step, an image, voice recording or other data. The deeper the network, the greater the number of these steps. Many of the visual or voice recognition tasks that artificial intelligence excels at today require a deep network with many layers. In fact, in recent theoretical and practical studies, we have shown that some of these mathematical operations cannot be performed efficiently at all without sufficiently deep networks.

Each layer in a deep neural network processes its input and produces an output that is sent to the next layer in line. The deeper the layer, the more abstract the concepts the network represents, and the more distant they are from the raw input [see figure]. Experiments have shown that artificial neurons in the deep layers of the network tend to correspond to abstract semantic concepts, such as "desk". The recognition of the desk in the image may emerge from processing by neurons in a deep layer, even if the concept "desk" is not included in the category labels on which the network was trained. Moreover, the concept of the desk itself may be only an intermediate step toward an even more abstract concept at a deeper layer, which might, for example, classify the image as "office".

Beyond pattern recognition

To date, artificial neural networks have been characterized mainly by their ability to recognize patterns in static images. However, there is another type of neural network whose influence is growing: networks that analyze events that unfold over time. Such networks, called recurrent networks, have shown an ability to correctly perform sequences of calculations, typically for analyzing speech, video and other sequential data. Sequential data is made up of units, such as phonemes or whole words, that come one after another. The recurrent networks process such data in a way similar to how the brain works: the signals passing between the neurons change continuously as new data arrives from the senses. This internal neuronal state changes according to the input reaching the brain from the environment, before the brain issues a sequence of commands that produce a series of body movements designed to achieve a certain goal.

The recurrent networks are able to predict what the next word in a sentence will be, and this information can be used to create new sequences of words one after the other. The networks are also able to perform more sophisticated tasks: after "reading" all the words in the sentence, the network will be able to guess the meaning of the entire sentence. Another recurrent network can use the semantic processing of the first network to translate the sentence into another language.
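
As a rough, untrained sketch of the mechanics (toy vocabulary and random weights, invented purely for illustration): a recurrent network keeps an internal state that is updated word by word, and that state is then used to score a guess for the next word.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "down"]              # a toy vocabulary
dim, hidden = len(vocab), 8

# Untrained weights; the point is only to show how a recurrent step works.
W_in  = rng.normal(size=(hidden, dim)) * 0.1       # input -> hidden
W_rec = rng.normal(size=(hidden, hidden)) * 0.1    # hidden -> hidden (the recurrence)
W_out = rng.normal(size=(dim, hidden)) * 0.1       # hidden -> next-word scores

def one_hot(word):
    v = np.zeros(dim)
    v[vocab.index(word)] = 1.0
    return v

h = np.zeros(hidden)                               # internal state, updated word by word
for word in ["the", "cat", "sat"]:
    h = np.tanh(W_in @ one_hot(word) + W_rec @ h)

scores = W_out @ h
probs = np.exp(scores) / np.exp(scores).sum()      # softmax over the next word
print(vocab[int(np.argmax(probs))])                # the network's (untrained) guess
```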

Research in the field of recurrent neural networks experienced its own delays, between the late 90s and the early 2000s. My theoretical studies have shown that it would be difficult for these networks to learn how to retrieve information from the distant past, that is, the first elements of a sequence that has been processed: imagine that you are trying to accurately recite the words from the beginning of a book you have just finished reading. However, some recent developments have reduced this problem and allowed such networks to learn to store information so that it will be preserved over time. The neural network can use the computer's temporary memory to process multiple and separate pieces of information, such as ideas expressed in different sentences in a document.

The great comeback of deep neural networks, at the end of the long artificial intelligence winter, is not just a technological victory. It also teaches us a lesson in the sociology of science, and especially emphasizes the need to support ideas that challenge the technological status quo, and to encourage diverse research, which can also advance fields that have lost their luster for a while.

This article is part of a special review on artificial intelligence by Scientific American Israel. See also:

Is there anything to fear from robots that are smarter than us?

The truth about driverless cars

Good to know

Machine learning: Smart networks that get even smarter

The connections between neurons in the cerebral cortex have inspired the creation of algorithms that mimic these complex connections. An artificial neural network can be trained to recognize faces by first exposing it to countless images. After the network "learns" how to distinguish faces in general (as opposed to hands, for example) and then to recognize specific faces, it uses this knowledge to recognize faces it has seen before, even if in a new image they appear at a slightly different angle than the one it was trained on.

To recognize a face, the network begins by analyzing the individual pixels of an image presented to it in its input layer. In the next layer, it picks out geometric shapes that characterize a particular face. Intermediate layers detect eyes, a mouth and other features, before a higher layer detects the face as a whole. The output layer provides a "guess": whether the face belongs to Yoel, Kobi or Lior.

About the author

Yoshua Bengio is a professor of computer science at the University of Montreal, and one of the pioneers of the deep learning methods that led to the resurgence of the field of artificial intelligence.
