Is the criterion Turing proposed 75 years ago for telling humans and machines apart still relevant in the era of advanced chatbots?

Today's sophisticated computers began as calculating machines: collections of computational units working in coordination according to pre-written instructions, performing complex calculations quickly and efficiently. In the mid-19th century, the mathematician Ada Lovelace, considered the first programmer, translated and wrote notes on the initial design of the "Analytical Engine," a kind of mechanical computer. Lovelace argued that the capabilities of such a calculating machine would be limited to whatever we could instruct it to produce, and that it would not be able to invent anything on its own. The Analytical Engine was never completed, partly because of the limited technology of the time and partly because of funding difficulties.
Electronic computers were actually born during World War II. The young technology opened ways to tackle problems that until then had no practical solution because of the sheer volume of calculation they required. Researchers and thinkers of the time estimated that the limits of these machines' capabilities depended on how much memory or how many calculation units they had, and tried to understand the potential inherent in the innovation. Machines that replaced humans in complex computational tasks raised questions about the parallels between their operation and human thought.
In 1950, the mathematician and engineer Claude Shannon estimated the computational complexity of the game of chess and described how a computer could play a game perceived as a complex mental challenge. In the same year, the mathematician and computer scientist Alan Turing, one of the founders of artificial intelligence, published a seminal article, "Computing Machinery and Intelligence," in which he wrote: "I believe that at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted." Turing was only slightly off when he predicted that we would speak of artificial intelligence by the year 2000. Today, despite the dramatic developments of recent years in the performance of artificial intelligence, we have still not come to terms with joining the concepts "machine" and "thinking."
ENIAC, built in 1946, is considered the world's first general-purpose digital computer. The advent of computers opened the door to complex calculations that were previously out of reach | US Army, Science Photo Library
Thoughts on thinking
For centuries, humans have regarded thinking as a faculty that distinguishes our species from other animals. In his 1637 work "Discourse on the Method," the French philosopher and mathematician René Descartes argued that although animals are similar to us in structure, "none of the activities dependent on thought, which alone belong to us as humans, can be found in them" (Hebrew edition: Carmel Publishing, 2008, translated by Eran Dorfman). Animals, he claimed, could be imitated indistinguishably by machines with artificial limbs and a designed external appearance. According to Descartes, such mechanical beings, like animals, differ from human beings in the ability to think.

In the 1950s, artificial systems were also used in the opposite direction – as a means of investigating the mechanisms of thought. The neurophysiologist Grey Walter sought to examine what complex behaviors could emerge from a network of simple connections. He developed artificial "tortoises": robots shaped like turtle shells that moved and navigated through space using light and touch sensors. These sensing capabilities allowed them to steer around objects in their way and to reach an electric charging station when needed, much like today's robotic vacuum cleaners.
In the article in which Walter introduced the tortoises, he described how his robots behaved in front of a mirror: behavior that, had an animal displayed it, we would attribute to self-awareness. Although the wording was careful, even in it one could detect the tendency to compare machines to animals or humans.
Grey Walter's tortoises: building artificial systems that simulate thinking in order to learn about the mechanisms of thought
Shannon, the mathematician who was interested, among other things, in computers playing chess, developed mouse-like robots that solved mazes. The psychiatrist Ross Ashby, in his book "Design for a Brain," devised a system that could adapt to its environment through action and feedback. When Walter described Ashby's machine in his article, he argued that although it was man-made, it was impossible to know at any given moment exactly what state it was in without "killing" it and dissecting what he called its "nervous system." Humans tend by nature to anthropomorphize, that is, to attribute human qualities to non-human entities, whether animals or inanimate objects and machines. Hence we also tend to attribute intelligence to them.

Oral exam
On the borderline between robots that imitate actions requiring thought and computers whose computational abilities were developing rapidly, Turing opened his paper on computing machinery and intelligence with the provocative question "Can machines think?" In a 1949 paper, the British neurologist Geoffrey Jefferson suggested that we should not concede that a computer can think until it can write sonnets like Shakespeare, and Lovelace before him had likewise noted that the computer would not be creative. But Turing proposed a change of approach and asked instead: can we distinguish between a computer and a thinking being?
As a solution, he proposed a test he called "the imitation game," now commonly known as the "Turing test." The structure of the test is simple: an examiner converses with two examinees, one human and the other a machine, without knowing which is which. The examiner may put to both of them a series of questions of his choosing. He may, for example, ask, "Write me a sonnet." The possible response, "Count me out on this one. I never could write poetry," would be a plausible answer from both the human and the machine impersonating a human. At the end of the series of questions, the examiner must decide which of the examinees is the human and which is the machine. If the machine manages to fool the examiner and convince him that it is the human, it passes the test. All communication in the test is done by typing, since in this interface, unlike handwriting, there is no substantial gap in capabilities between human and machine.
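To make the protocol concrete, here is a minimal sketch in Python. Everything in it is hypothetical scaffolding invented for illustration – the interfaces for the examiner and the participants are assumptions, not anything Turing specified – but it captures the shape of the game he described.

```python
import random

def imitation_game(examiner, human, machine, num_questions=5):
    """Sketch of the imitation game with hypothetical interfaces:
    examiner.ask() returns the next typed question,
    examiner.guess(transcripts) returns the index it believes is the human,
    and each participant's answer(question) returns a typed reply."""
    # Hide the participants' identities behind a random ordering.
    participants = [human, machine]
    random.shuffle(participants)

    transcripts = [[], []]
    for _ in range(num_questions):
        question = examiner.ask()
        for i, participant in enumerate(participants):
            # All communication is typed text, so neither voice nor
            # handwriting betrays which participant is which.
            transcripts[i].append((question, participant.answer(question)))

    guess = examiner.guess(transcripts)    # which transcript seems human?
    return participants[guess] is machine  # True: the machine fooled the examiner
```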
The test illustrates that from Turing's perspective, the internal structure of the system to which we would like to attribute the ability to think is unimportant – the only thing that matters is the outcome. From this perspective, definitions of thinking, which will surely change as we learn more about the subject, are beside the point. It is enough to focus on our ability to distinguish between a being we agree possesses the ability to think and a being we are not sure is endowed with it. "The notion that the simulation and the thing itself are one and the same is the great legacy of the imitation game," emphasizes Prof. Ehud Lamm, a philosopher and historian of biology, in a conversation with the Davidson Institute website.
Turing’s proposed diagnostic mechanism is based on a conversation in natural language. Descartes also mentioned a similar test. “Machines will never be able to use words or other signs and connect them together as we do to communicate our thoughts to others,” he wrote in his essay. He also explained that it is conceivable that a machine could respond to targeted input such as touch and warn us if we hurt it, but it “could not connect words in different ways so as to reply logically to everything that is said in its presence.”
The Turing test has become synonymous with a test of the ability to express thought, but its very reliance on language may prove to be a major weakness.
The film "War Games", 1983. Even in popular culture, a conversation with a computer raises questions about intelligence.
Falling into a trap
Let's skip ahead to today. ChatGPT has been conversing with us for nearly two and a half years. We put knowledge questions to it in natural language instead of carefully formulating search-engine queries, turn to it for help with professional questions, and even use it for medical advice or emotional support – although it is not at all clear that this is a good idea. We give bots priority over Google searches thanks to the detailed answers they provide and the option of refining a search with follow-up questions that unfold like a conversation. We let bots answer questions that would usually demand a highly reliable professional response, won over by answers formulated empathetically, in accessible service language, and with endless patience.
It is no coincidence that the feeling of natural conversation, which is also what the test relies on, is particularly appealing to us. This tendency was observed in the early days of conversations with machines, even when their artificiality was plainly evident. In 1966, the interactive program ELIZA was developed at the Massachusetts Institute of Technology (MIT). It had a basic ability to recognize patterns in its input, so that certain words written by the user would trigger a pre-prepared response from a collection of responses simulating empathy and understanding. To an input such as "I am depressed today," the program would respond "Can you explain why you are depressed today?" or "I am sorry to hear that you are depressed."
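The mechanism can be sketched in a few lines of Python. This is a toy in the spirit of ELIZA, not Weizenbaum's actual script or rule set (which was considerably richer): keyword patterns trigger canned templates that echo the user's own words back.

```python
import random
import re

# Toy ELIZA-style rules: each pattern maps to response templates, and the
# matched fragment of the user's input is echoed back inside the template
# to simulate empathy and understanding.
RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE),
     ["Can you explain why you are {0}?",
      "I am sorry to hear that you are {0}."]),
    (re.compile(r"\bbecause (.+)", re.IGNORECASE),
     ["Is that the real reason?"]),
]
DEFAULT_REPLIES = ["Please tell me more.", "I see. Go on."]

def eliza_reply(user_input: str) -> str:
    for pattern, templates in RULES:
        match = pattern.search(user_input)
        if match:
            # Reflect the user's own words back in a canned template.
            return random.choice(templates).format(match.group(1).rstrip("."))
    return random.choice(DEFAULT_REPLIES)

print(eliza_reply("I am depressed today"))
# e.g. "Can you explain why you are depressed today?"
```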
This simulated empathy encouraged users to immerse themselves in the conversation and even share personal information they would not share with just anyone, even though the on-screen response was clearly not the product of real emotion. Joseph Weizenbaum, who created ELIZA, recounted that his secretary became so engrossed in a conversation with the program that she asked him to leave the room so the two of them could talk in private.
The phenomenon known as the "ELIZA effect" is the tendency of people to read signs of emotional understanding into strings of words generated by a computer. In controlled experiments in which humans spoke with scripted robots used for teaching and education, it was found that when such a robot emitted responses that ostensibly showed it recognized its interlocutor's emotions, the participants tended to conclude that the robot really perceived their emotions and could be influenced by them, compared with scripts in which the robot was programmed to respond with sentences expressing no perception of human emotion. It should also be remembered that humans tend by nature to anthropomorphize, attributing emotions or empathy to animals and inanimate objects, and that communication by voice or in writing blurs our judgment even further. Some people even find this blurring useful.
The ELIZA software was designed to output a response based on input keywords from a pre-prepared response database.
Economic interest
OpenAI, Google, Microsoft, and the other companies that produce the bots we have been talking to for two and a half years are for-profit companies. Just two months after ChatGPT's launch, a more advanced paid version was already on offer, and the company has since moved to a monthly subscription business model in which the basic, flawed versions are available to anyone for free, while the new and improved versions sit behind a paywall. These companies have an economic and competitive interest in producing bots that entice users to stay and keep paying.
If, to do that, a bot needs to emit more pleasant sentences and respond with inclusive, flattering answers, that is what it will be tuned to do. These bots are based on natural-language models trained on huge amounts of text. In practice, when an input – any sentence – arrives from the user, the model strings together a response word by word, each word a plausible continuation according to statistical estimates. Sometimes these models produce inconsistent and illogical outputs known as "hallucinations," and it appears that the developers compromise on the accuracy of answers in order to provide a decisive response. Ask yourself which conversation you would find more comfortable – with a hesitant, insecure bot or with a decisive one. The developers of these bots adjust the nature of the outputs to improve the user experience and keep users loyal, regular, and paying. "Rhetoric that uses concepts of intelligence serves the interests of high-tech companies," notes Lamm.
Turing could not have predicted the memory capacity and computational power that allow such models to digest significant chunks of the text available on the Internet – which he could not have predicted either. Training such a language model is estimated to involve billions or even trillions of words. The texts used for training come from all over the Internet, that is, from diverse writers, from different cultures, and from various historical periods. From such collections, the models learn statistics about plausible word combinations.
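The statistical idea can be illustrated with a toy bigram model – an assumption-laden miniature that is nothing like the neural networks inside modern chatbots, but it shows the basic move of continuing a text, word by word, with a statistically plausible next word.

```python
import random
from collections import Counter, defaultdict

# Count which word follows which in a tiny "corpus" (a bigram model).
corpus = "the cat sat on the mat the cat ate the fish".split()
follow_counts = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follow_counts[current_word][next_word] += 1

def generate(start: str, length: int = 5) -> str:
    """Extend `start` by repeatedly sampling a likely next word."""
    words = [start]
    for _ in range(length):
        counts = follow_counts.get(words[-1])
        if not counts:
            break  # no continuation was ever observed for this word
        candidates, weights = zip(*counts.items())
        words.append(random.choices(candidates, weights=weights)[0])
    return " ".join(words)

print(generate("the"))  # e.g. "the cat sat on the mat"
```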
Lamm points out that the eclectic collection the models learn from gives rise to the "discursive dilemma" paradox. When the corpus a model is built on consists of different and diverse sources, no statistical procedure can draw a uniform, consistent conclusion from it. Each speaker who contributes text to such a database has their own worldview and their own ideas, and it is impossible to draw conclusions from all of them together without running into contradictions. Such a system is, by its very structure, a rather confused interlocutor.
Online you can find games that supposedly let you play the Turing test. These sites let you hold an anonymous chat conversation and then try to decide whether it was with another user who logged in to play or with an artificial chatbot. Such feedback can also be used to gather information that helps companies gauge how believable their bots are and understand how to tune them to blur the stamp of artificiality noticeable in written conversations between them and real people.
ChatGPT itself occasionally presents two possible answers to the same question and asks users to rate which answer was more helpful. In this way, we too actively participate in biasing the verbal output, little by little, toward our comfort zone – and our comfort zone is human.
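As an illustration of this feedback loop – the mechanics below are invented for the example, since the companies' real data pipelines are not public – here is a sketch of collecting pairwise preferences that could later steer a model toward answers people rate as more helpful:

```python
from dataclasses import dataclass

@dataclass
class PreferenceRecord:
    """One user judgment between two candidate answers to the same prompt."""
    prompt: str
    answer_a: str
    answer_b: str
    preferred: str  # "a" or "b", as chosen by the user

preference_log: list[PreferenceRecord] = []

def collect_preference(prompt: str, answer_a: str, answer_b: str) -> None:
    # Show both candidates and log the user's choice as a training signal.
    print("A:", answer_a)
    print("B:", answer_b)
    choice = input("Which answer was more helpful? [a/b] ").strip().lower()
    if choice in ("a", "b"):
        preference_log.append(PreferenceRecord(prompt, answer_a, answer_b, choice))
```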
From personal experience, I can say that when I use such chats I sometimes express myself politely, with the mannerisms typical of conversation with humans, and sometimes the opposite – impatiently, when I don't get what I want. The conversation itself awakens our tendency to behave as if we were facing a thinking, understanding creature. As humans, we are drawn to whatever we perceive as intelligent. This does not mean it truly is intelligent, but the tendency to perceive it as such relaxes our judgment.

Getting into the machine's head
The neural network models on which conversational chatbots are based are vast and complex. It is difficult, and sometimes impossible, to explain why a network prefers to emit one output over another. Even the computer scientists who develop these systems cannot explain exactly what the network processes or whether it represents the data in a way similar to the concepts we humans use. The parallel between artificial neural networks and the brain is far from complete, but there is considerable overlap between the research approaches, and it is possible that studying one system will help us better understand the other.
The Turing test continues to resonate today, partly because it uses a diagnostic medium that feels convenient to us: written language. That convenience, as noted, compromises our objectivity. There are many more questions to ask about intelligence and about the capabilities of new technologies, which continue to advance at a dizzying pace. "We focus too much on the interface and on linguistic representation," argues Lamm. The test proposed 75 years ago is a good starting point for an important discussion, but as Turing himself noted, things are constantly changing and evolving. The test, too, deserves revision.