The field of artificial intelligence has had a resurgence, as it has begun to assimilate what we know about how children learn
- How do small children know what they know? This question has occupied philosophers and psychologists for many years - and now also computer scientists.
- Experts in the field of artificial intelligence are studying the reasoning ability of toddlers to develop new ways to teach machines about the world.
- Two competing machine learning strategies, each striving to mimic what children do naturally, have begun to change the face of artificial intelligence as a field of knowledge.
If you spend a lot of time with children, you probably wonder how young humans are able to learn so much, so quickly. Philosophers have also wrestled with this question, from Plato to the present day, but they have never found a satisfactory answer. Auggie, my five-year-old grandson, has already learned about plants, animals and clocks, and of course dinosaurs and spaceships. He is also able to understand what other people want, and how they think and feel. He is able to use this knowledge to categorize the things he sees and hears and to make new predictions. For example, not long ago he stated that a recently discovered kind of titanosaur, on display at the American Museum of Natural History in New York, is a herbivore, and hence not really that scary.
However, all that reaches Auggie from the environment is a stream of photons that hit his retina, and vibrations in the air that hit his eardrum. The neural computer behind his blue eyes somehow manages to start from the limited information of the senses and reach predictions about herbivorous titanosaurs. The eternal question is: Can computers do the same?
In the last fifteen years or so, computer scientists and psychologists have tried to find an answer. Children produce a great deal of knowledge from the little input that comes from teachers and parents. Despite the huge advances in artificial intelligence, even the most powerful computers are still unable to learn at the same level as five-year-old children.
It will take decades for computer scientists to understand exactly how children's brains work, and to create a digital version that can operate as efficiently. In the meantime, they are starting to develop artificial intelligence that incorporates some of what we do know about how humans learn.
This Side Up
After the initial wave of enthusiasm in the 1950s and 60s, attempts to create artificial intelligence stalled for decades. In recent years, however, incredible progress has been made, especially in the field of machine learning, and artificial intelligence has become one of the hottest developments in technology. At the same time, many utopian or apocalyptic prophecies have emerged about the meaning of these achievements. They have been presented, quite literally, as heralding either eternal life or the end of the world, and many words have been written about both possibilities.
I believe that progress in artificial intelligence stirs such strong emotions because of our deep fear of the near-human. The idea that something could bridge the gap between the human and the artificial has troubled us since time immemorial, from the golem of the Middle Ages, through Frankenstein's monster, to Ava, the fatally attractive robot of the movie "Ex Machina".
But do computers really learn as well as humans? To what extent do the stormy discussions indicate a revolutionary change, or are they nothing more than empty talk? It's hard to follow the exact details of how computers learn to recognize, say, a cat, a spoken word, or a Japanese character, but on closer inspection, the basic ideas of machine learning aren't as mysterious as they might seem at first glance.
One approach tries to solve the problem by starting with the same stream of photons and air vibrations that Auggie, and the rest of us, perceive. These arrive at the computer as pixels in a digital image and as recorded sound samples. The computer tries to extract a series of patterns from the digital data that can locate and identify whole objects in the world. This approach, called "bottom-up", originates in the ideas of philosophers such as David Hume and John Stuart Mill, and of psychologists such as Ivan Pavlov and B. F. Skinner, among others.
In the 1980s, scientists came up with an ingenious and attractive way to apply "bottom-up" methods so that computers could search for meaningful patterns in data. Connectionist systems, or "neural networks", take inspiration from the way nerve cells convert the light patterns striking our retina into representations of the world around us. A computerized neural network performs a similar operation: it uses interconnected processing units, analogous to biological cells, to convert pixels in one layer of the network into increasingly abstract representations, such as a nose or an entire face, as the data is processed in higher and higher layers. A rough sketch of this layered idea appears below.
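The following is a minimal sketch of that layered transformation, not any production system: the layer sizes, the random weights and the use of NumPy are all illustrative assumptions, and the network is untrained, so its outputs are meaningless until a learning phase adjusts the weights.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

# Illustrative layer sizes: a 28x28 grayscale image flattened to 784 pixels,
# passed through layers that would form increasingly abstract features.
rng = np.random.default_rng(0)
weights = [
    rng.normal(scale=0.01, size=(784, 128)),  # pixels -> edge-like features
    rng.normal(scale=0.01, size=(128, 32)),   # edges -> parts (e.g., a "nose")
    rng.normal(scale=0.01, size=(32, 2)),     # parts -> whole-object scores (cat / not cat)
]

def forward(pixels):
    """Propagate a flattened image through the layers."""
    activation = pixels
    for w in weights[:-1]:
        activation = relu(activation @ w)
    return activation @ weights[-1]  # raw scores; training would adjust the weights

image = rng.random(784)   # stand-in for real pixel data
print(forward(image))     # untrained, hence arbitrary, scores
```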
The idea of neural networks has been revived recently thanks to a new technique called "deep learning", a technology that Google, Facebook and other tech giants have commercialized. The increasing power of computers, the exponential growth in computing resources described by "Moore's Law", also plays a role in the success of these systems. Another trend contributing to the rapid development is the accumulation of huge data collections. With enhanced processing capabilities and more information to process, connectionist systems are able to learn far more effectively than we might have previously thought.
Over the years, the AI community's preference in machine learning has oscillated between bottom-up solutions and alternative top-down approaches. "Top-down" approaches leverage what the system already knows to aid in learning new things. Plato, and other rationalist philosophers such as René Descartes, believed that learning is done "from the top down", and this greatly influenced artificial intelligence at its inception. In the first decade of the 21st century, such methods were reborn in the form of probabilistic, or Bayesian, models.
Similar to the way scientists work, "top-down" systems also begin learning by creating abstract and broad hypotheses about the world. The system predicts what the data will look like if its hypotheses are correct, and corrects them according to whether the predictions come true or fail.
Nigeria, Viagra and Spam
Bottom-up approaches are easier to understand, so let's start with them. Imagine trying to get your computer to separate important messages arriving in your email inbox from spam. You may notice that most spam messages have obvious characteristics: a long list of recipients, a source address in Nigeria or Bulgaria, references to multi-million dollar prizes, or perhaps a mention of Viagra. However, really useful messages can be similar, and you don't want to miss out on being promoted at work or winning an academic award.
If you compare a large enough number of spam messages with other kinds of e-mail, you might notice that only spam has certain telltale combinations of features: if a message comes from Nigeria and also promises a million-dollar reward, it is fake. There may also be subtler, higher-level patterns that distinguish spam from useful messages: spelling errors, for example, or suspicious IP addresses. If you can identify these patterns, you can filter out the spam without fear of missing the real message telling you that the Viagra you ordered has been shipped.
"Bottom-up" machine learning is able to extract the relevant cues for performing such tasks. For this, the neural network has to go through its own learning phase. It examines millions of examples from vast databases, with each such example pre-labeled as spam or legitimate e-mail. The computer extracts a collection of identifying characteristics that separate the spam from everything else.
Similarly, the neural network may sift through images on the Internet labeled "cat," "house," "stegosaurus," and more. By extracting features common to each type of image, such as the pattern that differentiates all cats from all dogs, the network is able to recognize new images of cats, even if it has not been exposed to those particular images before.
One of the "bottom-up" methods, called "unsupervised learning", is still in its infancy but can identify patterns in data that carry no labels at all. It simply looks for clusters of features that identify some object: noses and eyes, for example, always go together with faces, which distinguishes them from the trees and mountains in the background. In advanced deep-learning networks, identifying an object is carried out through a division of labor, with the task split among the different layers of the network.
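One common way to find such clusters without labels is k-means clustering; the sketch below is only an illustration of the idea, assuming scikit-learn and using made-up two-dimensional points in place of real image features.

```python
# Unsupervised learning sketch: group unlabeled feature vectors into clusters.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
faces = rng.normal(loc=[5, 5], scale=0.5, size=(20, 2))    # "nose" and "eye" features co-occur
scenery = rng.normal(loc=[0, 0], scale=0.5, size=(20, 2))  # "tree" and "mountain" features co-occur
data = np.vstack([faces, scenery])

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)
print(clusters)  # the algorithm separates the two groups without ever seeing a label
```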
An article in the journal Nature from 2015 showed how far "bottom-up" methods have come. Researchers at DeepMind, a company owned by Google, used a combination of two "bottom-up" approaches, deep learning and reinforcement learning, to teach a computer to play Atari 2600 video games. At first, the computer knew nothing about the games and guessed randomly which moves would be best, while receiving constant feedback on its performance. Deep learning helped the system recognize the features on the screen, and reinforcement learning rewarded it for breaking previous scoring records. After a number of games the computer reached a high level of success, in some cases playing better than experienced human players. In other games, however, ones that humans learn just as easily, it failed completely.
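The heart of the reinforcement-learning half of that result can be sketched in its simplest, tabular form; DeepMind's actual system replaced the table below with a deep network reading raw screen pixels, and the five-state "game" here is purely a made-up stand-in.

```python
# Simplified Q-learning: the agent starts by guessing moves at random and is
# nudged toward whatever raises its score, exactly the feedback loop described above.
import random

n_states, n_actions = 5, 2
q_table = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.9, 0.2   # learning rate, discount, exploration rate

def step(state, action):
    """Toy environment: action 1 earns a point and advances; action 0 retreats."""
    if action == 1:
        return min(state + 1, n_states - 1), 1.0
    return max(state - 1, 0), 0.0

for episode in range(500):
    state = 0
    for _ in range(20):
        if random.random() < epsilon:
            action = random.randrange(n_actions)                 # explore
        else:
            action = q_table[state].index(max(q_table[state]))   # exploit what was learned
        next_state, reward = step(state, action)
        best_next = max(q_table[next_state])
        q_table[state][action] += alpha * (reward + gamma * best_next - q_table[state][action])
        state = next_state

print(q_table)  # after training, the score-earning action dominates in every state
```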
The ability to teach artificial intelligence using large collections of data—millions of Instagram photos, e-mails, or audio recordings—has led to solutions to problems that previously seemed incredibly difficult, such as image recognition and speech recognition. Even so, it's worth remembering that my grandson has no trouble at all identifying an animal or responding to something said to him, based on much more limited data and training. Problems that a five-year-old can easily solve are still very challenging for computers, and much more complicated than learning to play chess.
Computers learning to recognize a furry, whiskered face often need millions of examples to categorize objects that we can classify from just a few. After serious training, the computer may be able to recognize a picture of a cat it has never seen before, but it does so in a way quite different from human generalization. This different way of "thinking" leads to errors: some images of cats are not classified as cats, while blurry blobs that would not confuse any human viewer might be.
All the Way Down
The second approach to machine learning, which is changing the face of artificial intelligence in recent years, works in the opposite direction: from top to bottom. This approach assumes that we can absorb abstract information from tangible data because we already know a lot, and especially because the brain is already capable of understanding basic abstract concepts. Like scientists, we can use these concepts to formulate hypotheses about the world and predict what data (events) should look like if those hypotheses are true. This is the opposite of trying to extract patterns from the raw data, as is done in "bottom-up" artificial intelligence.
The best way to demonstrate this idea is to look at the spam plague through a real case I was involved in. I received an e-mail from the editor of a journal with an odd name. He referred specifically to one of my articles and suggested that I write an article for his journal. No Nigeria, no Viagra, no millions of dollars: this message had none of the common hallmarks of spam. But thanks to what I already knew, and to some abstract thinking about how spam is produced, I was able to see that the message was suspicious.
For starters, I knew that spammers try to get money out of people by appealing to human greed—and academics can covet publicity as ordinary people might covet a million-dollar prize or improved performance in the bedroom. I also knew that legitimate "open access" journals had begun to cover their costs by charging authors instead of subscribers. In addition, my article had nothing to do with the name of the journal. Based on all this information, I made a logical hypothesis, according to which the message was intended to entice academics to pay for the "publication" of an article in a fake journal. I could come to this conclusion based on just one example, and I could also check it by looking up information about the editor in a web search engine.
A computer scientist would call my thought process a "generative model", one that can represent abstract concepts, such as greed and deception. The same model can also describe the process used to generate a hypothesis: the reasoning that led to the conclusion that the message might be an e-mail fraud attempt. The model lets me explain how this form of spam works, and also imagine other types of spam, including ones I have never seen or heard of. When I received the message from the journal, the model allowed me to work backward, step by step, and understand why there was no doubt it was spam.
Generative models were essential in the first wave of artificial intelligence and cognitive science, in the 1950s and 60s. However, they also have limitations. First, most patterns in data can be explained, in principle, by many different hypotheses; in my case, the message might have been legitimate, however unlikely that seems. Therefore, generative models must incorporate notions of probability, and this is one of the most important recent developments in the field. Second, it is often not clear where the basic concepts that make up the models come from. Thinkers such as Descartes and Noam Chomsky claimed that we are born with these concepts already in place, but are we really born knowing how greed and deception lead to fraud?
Bayesian models, an excellent example of a modern "top-down" approach, try to answer both questions. These models, named after the 18th-century statistician and philosopher Thomas Bayes, combine generative models with probability theory in a technique called "Bayesian inference". A probabilistic generative model can tell us how likely it is that we will see a certain pattern in the data if a certain hypothesis is true: if the e-mail message is a scam, it is likely to appeal to the greed of its readers. But of course a message can appeal to greed even if it is not spam. A Bayesian model combines the knowledge we already have about potential hypotheses with the data in front of our eyes, so that we can calculate, quite precisely, the likelihood that the message is legitimate or spam.
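The arithmetic behind that combination is short enough to show directly. The numbers below are invented purely to illustrate the mechanics of Bayes' rule on the journal e-mail, not estimates from any real data.

```python
# Bayes' rule: combine a prior belief about how common such scams are with how
# likely each hypothesis is to produce a message that appeals to greed.
prior_scam = 0.3              # assumed prior probability that such an e-mail is a scam
p_greed_given_scam = 0.9      # scams almost always appeal to greed
p_greed_given_legit = 0.1     # legitimate invitations rarely do

evidence = prior_scam * p_greed_given_scam + (1 - prior_scam) * p_greed_given_legit
posterior_scam = prior_scam * p_greed_given_scam / evidence
print(f"P(scam | appeals to greed) = {posterior_scam:.2f}")  # roughly 0.79
```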
This "top-down" approach is more consistent with what we know about how children learn than a "bottom-up" approach. Therefore, for the past 15 years, my colleagues and I have used Bayesian models in our work on child development. Our labs and others have used these techniques to understand how children learn cause and effect relationships, to predict how and when young people will develop new beliefs about the world, and when they will change the beliefs they already have.
Bayesian models also serve as a great way to teach machines to learn like humans. Joshua B. Tenenbaum of the Massachusetts Institute of Technology (with whom I sometimes work), Brenden M. Lake of New York University and their colleagues published a study in 2015 in the journal Science in which they designed an artificial intelligence system that could recognize unfamiliar handwritten characters, a task that is easy for humans but especially hard for computers.
Think about your own powers of recognition. Even if you have never seen a character in a Japanese scroll, you can probably tell whether it is the same as or different from a character in another scroll. You can also draw it, and even design a fake Japanese character, and understand that it is different from characters in Korean or Russian. That is exactly what Tenenbaum and his team managed to get their software to do.
In a "bottom-up" approach, the computer receives thousands of samples and uses the patterns it finds in them to recognize new characters. In contrast, the Bayesian program gave the machine a general model of how characters are drawn: a stroke can go to the right or to the left, for example, and after the software finishes one stroke it starts the next.
When the software saw a particular character, it was able to infer the sequence of strokes needed to create it and then produced a similar set of strokes itself. It did this in much the same way that I inferred the sequence of steps that led to the spam message I received from the journal. Instead of weighing whether a message was likely to come from a marketing scam, Tenenbaum's model weighed whether the target character was likely to come from a particular sequence of strokes. This top-down program worked far better than deep learning applied to the exact same data, and it closely matched the performance of humans.
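A toy version of that "work backward from the drawing to the strokes" inference is sketched below. Lake and Tenenbaum's actual model reasons over real pen trajectories with a much richer generative process; here a character is reduced to a short list of stroke directions, and the character names, directions and slip probability are all made-up assumptions.

```python
# Recognition as inference: ask which known stroke program most likely produced
# a noisy new drawing, mirroring the backward reasoning about the spam message.
strokes = ["left", "right", "up", "down"]

known_characters = {
    "char_A": ["down", "down", "right"],
    "char_B": ["left", "up", "left"],
}

def likelihood(observed, program, p_correct=0.8):
    """Probability of the observed strokes if this program generated them, allowing slips."""
    if len(observed) != len(program):
        return 0.0
    p = 1.0
    for seen, intended in zip(observed, program):
        p *= p_correct if seen == intended else (1 - p_correct) / (len(strokes) - 1)
    return p

observed = ["down", "left", "right"]  # a slightly sloppy new drawing
scores = {name: likelihood(observed, prog) for name, prog in known_characters.items()}
print(max(scores, key=scores.get), scores)  # infers the most plausible source character
```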
A Perfect Match
These two leading approaches to machine learning—bottom-up and top-down—have complementary advantages and disadvantages. In the "bottom-up" approach, the computer does not need to understand anything about cats in advance, but it does need a lot of data.
The Bayesian system can learn from just a few examples and generalizes better. However, this "top-down" approach requires considerable preparatory work to formulate the right set of hypotheses, and the designers of both kinds of systems run into similar obstacles. Both work only on narrow, well-defined problems, such as recognizing written characters or cats, or playing Atari games.
Children do not seem to face such constraints. Developmental psychologists have found that young children somehow combine the advantages of both approaches and take them much further. Auggie can learn from just one or two examples, like a "top-down" system, but he can also somehow extract brand-new concepts from the data itself, like a "bottom-up" system, concepts he did not possess to begin with.
Auggie can, in fact, do much more. He recognizes cats immediately and differentiates between letters, but he also manages to reach creative and surprising new conclusions that go far beyond his personal experience or general knowledge. Recently he explained that if an adult wants to become a child again, he should avoid eating healthy vegetables, because they make children grow and become adults. We have almost no idea how such creative inference takes place.
When we hear claims that artificial intelligence is an existential threat, we should remember the still mysterious powers of the human mind. Artificial intelligence and machine learning may sound scary, and in some ways they are: the military is investigating ways to use these systems to control weapons. But natural stupidity can do far more harm than artificial intelligence, and we humans will need to be much smarter than we have been so far to regulate the new technologies properly. Moore's Law is a powerful force: even if advances in computing come from quantitative growth in data and processing power rather than from revolutions in our understanding of the brain, they can still have enormous practical consequences. Nevertheless, there is no need to fear that some new technological being is about to emerge into the world.