
Conversations with machines

Programs already talk to people and guide them to a restaurant, to the address they were looking for, or to an office meeting. But there is still a limit to what they can understand, and a misunderstanding can end in embarrassing, even dangerous, failure

Artificial brain: Wikipedia illustration. CC license (see link to the source at the bottom of the page)

Israel Benjamin, Galileo

A mother and her little son arrive at the clinic and are greeted by a computer screen. A female figure appears on the screen, representing a program that receives the patients. The figure's face seems to direct its gaze first to the mother and then to the child, and its lips move in time with the voice coming from the loudspeakers: "Welcome. I am in touch with the best doctors in the world. Are you here for yourself or for the child?"

This is a demonstration developed by Microsoft's research group of how reception at medical centers may work in the future (see here). It is worth noting that the software recognized that two people had approached it together, tracked the position of each person's head, and, based on the difference in height, decided that one of them was a child. To create a sense of interaction, the face changes its expression and the direction of its gaze according to what is being said.

The mother says "The child. He has diarrhea," and the character representing the software replies: "I'm sorry to hear that." It then turns its gaze to the child, finds out his name and age, verifies with the mother that the information has been understood correctly, and asks the child, "Have you felt tired lately?" The software continues with further questions relevant to the complaint, among them: Did stomach pain or fever appear? Has the child lost weight recently? The software chooses words appropriate to the person it is addressing: it asks the mother, "Did he complain of stomach aches?" and if she is not sure, it asks the child, "Does your stomach hurt, honey?" Finally it tells the mother that there is no reason to worry, and makes an appointment with the doctor for the next day.
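The intake dialogue above follows a pattern known as slot filling: the program keeps a list of the facts it still needs and phrases each question according to whom it is addressing. The Python sketch below is purely illustrative; the slot names and question wordings are assumptions, not Microsoft's implementation:

```python
from typing import Optional

# Facts the intake program still needs, in the order it asks about them.
INTAKE_SLOTS = ["name", "age", "stomach_pain", "fever", "weight_loss"]

# The same question, worded differently for the parent and for the child.
QUESTIONS = {
    ("stomach_pain", "parent"): "Did he complain of stomach aches?",
    ("stomach_pain", "child"): "Does your stomach hurt, honey?",
    ("fever", "parent"): "Has he had a fever?",
    ("fever", "child"): "Do you feel hot?",
}

def next_question(answers: dict, addressee: str) -> Optional[str]:
    """Return the question for the first unfilled slot, or None when done."""
    for slot in INTAKE_SLOTS:
        if slot not in answers:
            return QUESTIONS.get((slot, addressee), f"What is the {slot}?")
    return None

answers = {"name": "Dan", "age": 6}
print(next_question(answers, "child"))  # → Does your stomach hurt, honey?
```

The same state machine produces a differently worded conversation simply by switching the addressee, which is all the "personalization" in this toy version amounts to.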

The assistant that can predict when a phone call will end

This demonstration was developed by the group of Eric Horvitz, a scientist at Microsoft, to illustrate the potential of interaction between humans and machines that look and speak like humans. Near the entrance to Horvitz's office runs similar software that functions as a kind of personal assistant: when a visitor arrives, the software can tell him whether Horvitz is free, in a meeting, or talking on the phone. It can even predict when the phone call will end, using, among other things, the history of Horvitz's previous calls with the same person and his activity during the current call (is he using the computer while talking?).
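The call-end prediction can be thought of as combining features of past behavior with signals about the current call. The toy model below is an assumption for illustration only, not Horvitz's actual predictor, which would learn its weights from data:

```python
# A toy sketch of predicting when an ongoing call will end: start from the
# typical duration of past calls with the same person, then adjust it using
# a signal about the current call. The 0.8 discount is an invented number.

def predict_remaining_minutes(past_durations, elapsed, using_computer):
    """Estimate how many minutes of the current call remain.

    past_durations: durations (minutes) of earlier calls with this caller
    elapsed: minutes the current call has lasted so far
    using_computer: True if the callee is also working at the computer,
                    which in this toy model predicts a shorter call
    """
    typical = sum(past_durations) / len(past_durations)
    if using_computer:
        typical *= 0.8  # assumed discount for a multitasking callee
    return max(0.0, typical - elapsed)

print(predict_remaining_minutes([10, 14, 12], elapsed=9, using_computer=False))
# → 3.0
```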

The software may even strike up small talk with the visitor, reminding him when and in what context he last met Horvitz and asking whether he watched the city team's last hockey game. If the wait is long, it knows which conversations may be interrupted (Horvitz allowed the software to interrupt his conversations with Microsoft executives...) and when Horvitz needs full concentration, for example when he is talking with his research partners.

The woman who volunteered to lend her face to this software works at Microsoft, and sometimes runs into people who recognize her from the screen next to Horvitz's office. She says this usually amuses her, but when the same face was also used for a trivia game set up in the building's cafeteria, she felt uncomfortable when her virtual double beat her at answering the questions…

Combining facial expressions with changes in the tone and rate of speech

The software developed in Horvitz's lab demonstrates the combination of a visual representation of a talking face with natural-language interaction. The face adds credibility and tangibility to the conversation, and also opens a new channel of non-verbal communication through control of the computerized facial expressions and the direction of the gaze. Together with other advances in non-verbal communication, which include drawing conclusions from the tone of voice and facial expressions of the human participants, and adding human mannerisms to the computerized speech (varied prosody, that is, tone and rate of speech, and non-verbal utterances such as "ah" and "hmm"), such developments bring conversations with machines to a high level of success, at least in certain uses.

Studies have shown that the more human features the software included, the better its message was received. Recommendations on health issues were most effective when given by a robot physically present in the room, less effective when given by a video of that robot, and less effective still when given by a static photograph of the robot whose lips alone moved.

Even without these advances, the interaction technology in question already achieves impressive results. For several years now, this technology has allowed an airline customer to call and tell the automated agent: "I want two seats in economy class on a flight the day after tomorrow from Denver to Chicago," even if he chooses a different word order, hesitates, or repeats himself. (If some word is not understood, or the customer omits an important detail, the software guides him with its questions.)
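Tolerance for different word orders is possible because such systems extract each detail of the request independently, rather than matching one fixed sentence pattern. A minimal sketch of the idea, with made-up city lists and patterns:

```python
import re

CITIES = {"denver", "chicago", "boston"}
NUM_WORDS = {"one": 1, "two": 2, "three": 3}

def parse_flight_request(utterance: str) -> dict:
    """Fill request slots by scanning for each slot's own cue words,
    so the overall word order of the sentence does not matter."""
    text = utterance.lower()
    slots = {}
    m = re.search(r"\b(\d+|one|two|three) seats?\b", text)
    if m:
        g = m.group(1)
        slots["seats"] = NUM_WORDS[g] if g in NUM_WORDS else int(g)
    for m in re.finditer(r"\bfrom (\w+)", text):
        if m.group(1) in CITIES:
            slots["origin"] = m.group(1)
    for m in re.finditer(r"\bto (\w+)", text):
        if m.group(1) in CITIES:
            slots["destination"] = m.group(1)
    if "economy" in text:
        slots["cabin"] = "economy"
    return slots

request = ("I want two seats in economy class on a flight "
           "the day after tomorrow from Denver to Chicago")
print(parse_flight_request(request))
# → {'seats': 2, 'origin': 'denver', 'destination': 'chicago', 'cabin': 'economy'}
```

Reordering the sentence ("economy please, from Denver to Chicago, two seats") fills the same slots, which is exactly the robustness the airline system exhibits.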

Today such software no longer requires installation on large, fast computer servers: the new generation of the SYNC entertainment and communication system, which Ford installs in some of the cars it produces, can understand a wide range of commands: choosing a destination by saying things like "the nearest Italian restaurant" or "14 Sixth Avenue, New York," followed by "Take me there"; selecting music from the radio or from the content stored in the car by saying the name of a radio channel, a song, an artist, or an album; and queries such as "latest sports results" or "fuel prices." These capabilities are available in many languages: English (US and UK), French (European and Canadian), Spanish, Portuguese (European and Brazilian), German, Italian, Dutch, and Mandarin Chinese.

They even understand romance

Another example of software activated by voice queries is Siri, an application for iPhone devices. The Siri application does not decipher speech by itself: it records the request and sends it to the computer servers of Nuance (the company that developed the speech technology behind SYNC). These computers "translate" the speech into text, which is then sent to Siri's own servers; those perform the next step: an informed guess at the speaker's intent, and the carrying out of the request.
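The division of labor just described, one service transcribing and another interpreting, can be sketched schematically. Everything below (the function names, the keyword rules, the fake audio clip) is invented for illustration and does not reflect the real interfaces of Nuance or Siri:

```python
def transcribe(audio: bytes) -> str:
    """Stand-in for the remote speech-to-text service."""
    FAKE_ASR = {b"clip-1": "reserve two places at a romantic restaurant"}
    return FAKE_ASR.get(audio, "")

def guess_intent(text: str) -> str:
    """Stand-in for the second step: guessing intent, here by crude keywords."""
    if "reserve" in text or "book" in text:
        return "make_reservation"
    if "weather" in text:
        return "weather_query"
    return "unknown"

def handle_request(audio: bytes) -> str:
    """The pipeline: audio goes out for transcription, text comes back for intent."""
    return guess_intent(transcribe(audio))

print(handle_request(b"clip-1"))  # → make_reservation
```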

If the request is "Reserve two places at a romantic restaurant within walking distance of my house for tomorrow at eight in the evening," the software will turn to websites that list and rate restaurants, filter the restaurants by location (Siri knows where the user lives and what counts as "walking distance"), and look for restaurants whose description, or whose user reviews, contain the word "romantic." It will then check, for the restaurants that met these criteria, whether a reservation can be made for the desired day and time, by contacting websites that provide a reservation service.
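The filtering chain described above can be written as a few successive conditions. The data and the walking-distance threshold below are made up; a real system would query rating and reservation websites at each step:

```python
RESTAURANTS = [
    {"name": "Luna", "distance_km": 0.6, "tags": ["romantic", "italian"],
     "free_at": ["19:00", "20:00"]},
    {"name": "Grill 9", "distance_km": 0.4, "tags": ["steakhouse"],
     "free_at": ["20:00"]},
    {"name": "Vista", "distance_km": 3.5, "tags": ["romantic"],
     "free_at": ["20:00"]},
]

WALKING_DISTANCE_KM = 1.0  # what counts as "walking distance" is an assumption

def find_candidates(restaurants, tag, time):
    """Apply the three filters in turn: location, description, availability."""
    return [r["name"] for r in restaurants
            if r["distance_km"] <= WALKING_DISTANCE_KM
            and tag in r["tags"]
            and time in r["free_at"]]

print(find_candidates(RESTAURANTS, "romantic", "20:00"))  # → ['Luna']
```

Each condition in the list comprehension corresponds to one external service in the real pipeline, which is what makes the "cloud" division of labor in the next paragraph possible.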

This description shows how the work is divided among computers and software, each operated by a different company under a different business model, all taking advantage of the fact that they are available in a "cloud" of internet-based services.

Fragile software

Like people, the software described here is also capable of making mistakes. An article in the New York Times mentions a Siri user who asked to make a reservation at a certain Japanese restaurant; Siri misunderstood the name of the restaurant and directed him to an escort service specializing in Asian girls (the user swore to the newspaper's reporter that this was not his intention).

The consequences of a misunderstanding can also be more severe, especially when the system reaches its limits. Imagine a person calling the airline's computer system and asking to ship cargo from New York to London: a medium-sized dog. If the software does not know that this is not an ordinary request, it may consider only the size and weight of the "package," without addressing the environmental conditions a dog requires, the laws regulating the entry of animals into England, and so on. It is possible that even a human clerk would not know the required procedure, but he would certainly recognize that this is an unusual case that should be passed to the appropriate person at the airline. It is much harder for today's programs to know when they are crossing the limits of their ability and understanding.

The property of a system that works successfully within certain areas but fails without warning outside them is called "fragility" or "brittleness." This limitation becomes more dangerous the more we trust the system: when its computerized nature is clear and prominent, as in a telephone dialogue of the type "for orders press 1, for inquiries press 2," the human user develops low expectations and is careful not to exceed the software's limits (he does not, of course, know the exact limits, and will usually lower his expectations far more than necessary). The more natural and comfortable the interaction becomes, the easier it is to shift the burden of understanding and caution onto the system. In this respect, programs that understand and use changes of tone, expressions of emotion in speech, facial expressions, and so on may cause the person facing them to treat them as if they, too, were human.
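One common partial defense against brittleness is for the system to measure its own confidence in an interpretation and hand low-confidence cases to a human. The scoring below is a toy stand-in invented for illustration, not how any airline system actually works:

```python
# Cargo terms this toy system "understands". Anything it cannot match with
# enough confidence is escalated instead of processed automatically.
KNOWN_CARGO = {"package", "box", "crate", "pallet"}

def classify_cargo(description: str):
    """Return (category, confidence) for a cargo request.
    Confidence here is just the fraction of recognized words: a crude proxy."""
    words = set(description.lower().split())
    hits = words & KNOWN_CARGO
    confidence = len(hits) / max(1, len(words))
    category = next(iter(hits), "unknown")
    return category, confidence

def route_request(description: str, threshold: float = 0.2) -> str:
    """Act only when confident; otherwise pass the case to a person."""
    category, confidence = classify_cargo(description)
    if confidence < threshold:
        return "escalate_to_human"
    return f"process_as_{category}"

print(route_request("a medium-sized dog"))  # → escalate_to_human
print(route_request("one large box"))       # → process_as_box
```

The medium-sized dog from the earlier example scores zero recognized words and is escalated, which is precisely the behavior the human clerk provides for free.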

These problems explain why the software described at the beginning of this article, which serves as a medical receptionist, is only a demonstration: what will happen if the software errs when it schedules an examination for the next day, failing to recognize a condition that requires immediate treatment? If the software made a mistake that caused harm, who is legally responsible for the damages: the programmer? The doctor who provided the software's medical knowledge? The doctor who installed the software in his clinic?

The more convincing and successful the software, the more significant its contribution may be; unfortunately, the dangers associated with its failures may also be greater. To overcome such barriers, the software may need not only to apply logic but also, in some sense, to think. This, of course, is the greatest and most far-reaching challenge of the entire field of artificial intelligence.

Israel Binyamini works at ClickSoftware developing advanced optimization methods

The full article was published in Galileo magazine, October 2010

Link to image source
