
How to defend against malicious AI and maintain a happy married life

Experts want any "safe" artificial intelligence to be equipped with a "world model" in addition to its impressive brain: an internal theater in which it can run experiments to understand whether its actions could lead to harmful results.

A conversation between past and future: coffee, technology and robots. The figure was prepared using DALL-E.

Several years ago I discovered the secret - or at least, one of the secrets - to a long and happy married life.

This happened after endless fights, arguments and mutual accusations regarding an age-old question: how much honey to spread on each slice of challah?

My wife, may she live long, likes a lot of honey on every slice. I - already sweet enough on the inside - like the exact opposite: very little honey. The first time I tried to pamper the lady and serve her honey toast in bed on Saturday morning, I gave her too little.

"More honey." she emphasized to me. "Put more. I love more."

The next time I put on more. It still wasn't enough for her. The next time, even more - and she still wasn't satisfied. And that's where I got stuck. Every week, when I made the toast, I tried to put on more honey, but something stopped me. Something in my brain - somewhere between the frontal lobes and the ancient crocodile in me - intuitively refused to believe that I wasn't overdoing the honey. I would pour a certain amount, and I just couldn't believe I needed more.

My wife was frustrated. She didn't get all the honey she needed in bed. Our married life was on the rocks. I feared the end was near.

Then we found the solution.

"After you put a lot of honey on the slice," my wife taught me, "and think you've already put a lot, and are completely convinced of it - then you need to put more."

And so my wife found a way to bypass my brain's analysis mechanisms, using a simple and direct logical rule. She understood that my brain works in a certain way, analyzing things as it has learned to do over years of applying honey, and interpreting each new situation according to my own experience and tastes. So she set a simple condition: when I myself believe I have reached the limit, that is exactly when I must keep going. I listened, internalized - and managed to pamper her with toast dripping with honey from all sides. Our married life was saved.

And the same logic may also allow us to save the world from artificial intelligence.


When the best minds put their heads together

In the last year, some of the greatest and best minds in the field of artificial intelligence joined forces. These are people like Yoshua Bengio, Stuart Russell, Max Tegmark and Joshua Tenenbaum. In short, there is no artificial intelligence conference in the world that does not try to invite these people to give the keynote lecture. They have been researching and thinking about artificial intelligence for many years, and are widely respected in industry and academia. When they talk about artificial intelligence, others shut up and listen.

And now they have decided to stand up and speak, and to share their thoughts on the mechanism that will ensure the safety of artificial intelligence. Together with several other thinkers in the field, they wrote an article, published on arXiv, in which they proposed a new framework of thinking that would ensure that our artificial intelligence systems are safe and reliable.

And not surprisingly, they hit the mark. Or rather, they landed on my wife's method.

But before we get into the intricacies of the system they proposed, we need to explain why they are so bothered by artificial intelligence.


Dangers everywhere

The experts understand that artificial intelligence is about to become smarter than humans. This statement does not mean much by itself. My calculator is also smarter than me in some respects: it can perform mathematical calculations much faster than I can. But the new artificial intelligence, of which GPT is the freshest representative, is starting to get really "smart". That is, it can make decisions similar to those that humans would make in similar circumstances. And it is not limited to a single field of knowledge, but excels across many different fields at the same time.

What's the problem with that? That although it can give incredibly smart answers, it doesn't necessarily have a morality of its own. Already today, I can ask GPT to explain to me how to synthesize nerve gas - and if I do it correctly, it will provide me with a detailed explanation, and even recommend where to spread it to cause the maximum number of casualties. And before you respond that you tried these types of questions and encountered a refusal from the artificial intelligence, I want to remind you that you need to know how to talk to it correctly. For example, like I did here.

This is, of course, a big problem. Advanced artificial intelligence systems will appear in the coming years in all our infrastructures and devices. If the artificial intelligence does not understand that it has to refuse certain requests from humans, then in a few years any adolescent boy will be able to ask it to synthesize a nerve gas for him, or a cool new virus, and he will get what he wants in short order. And that will be roughly the point where we can close up shop on humanity, turn off the lights and say bye-bye to human civilization.

On a less dramatic note, we already know today that artificial intelligence can be used to generate large amounts of false information ("disinformation"). Or to set up bots that will argue with humans as if they were human, and try to change their political position. Or to invade the privacy of individuals, or to discriminate against certain minority groups - intentionally or not. 

And it's unpleasant to say, but we still don't know how to explain to them not to do it.

Why? 

No, this time not because of my wife. 

Because of Yudkowsky's genie.


Parable of the genie

Eliezer Yudkowsky is another of the great minds in the field of artificial intelligence and in thinking about its dangers. It should be noted that he is considered a household name in his own right. I know at least one software engineer who, every time Yudkowsky's name is mentioned at the table, clasps her hands together, bows her head and mutters "may his glory be exalted". Yes, even atheists can be religious when they encounter a being with a sufficiently great intellect.

About a decade ago, Yudkowsky published an article in which he shared a parable of his own devising. Here it is, briefly and briskly, with some improvisations on my part.

You, the wise readers, are trapped in a burning house. Lucky for you, you have a magic lamp. You rub the lamp, and - presto - a genie emerges, ready to grant you a wish. You, of course, immediately want to get out of the burning building.

"No problem!" announces the genie, launching you a hundred meters up into the air. You did escape from the building. But now there's the little falling thing. 

Luckily for you, this is a pure-hearted genie, even if not particularly smart. He notices your plight, and turns back time. You are again in the burning building, again rubbing the lamp, again the genie, again a wish.

"Take me out," you tell Ginny, "but not up!"

"Immediately!" He says, and in a moment you find yourself a hundred meters to the left of the building, buried in the nearby hill. And again go back in time and try again.

"Get me out safe and sound!" You command the genie. It launches you out intact, but changes your brain chemistry so you always feel confident. You go down the road without looking to the sides and run over immediately.

"He got me out safe and sound, but in exactly the emotional and mental state I was in now!" You try again. And there you are, you escaped safe and sound - but the genie stucks you in a time loop where you experience the same emotions as during the fire, again and again and again.

Contrary to what you might think, the genie in Yudkowsky's parable is not malicious or cunning. He simply does not understand how people think, what they want, or what the meaning of life or death is. For him, the world consists only of chains of atoms, and not much more than that. He does not have a "world model": a kind of internal theater where he can run possible scenarios, and understand how his every action will affect the world before he performs it.

We, of course, have such an inner theater. If you don't believe me, just imagine leaving the house naked. We can understand and predict in advance how our choices will affect us and the world. We have a "world model" that takes shape throughout our childhood and adolescence, and that we continue to enrich throughout our lives.

And this is exactly what the experts want to provide to artificial intelligence.


The world model

The experts want any "safe" artificial intelligence to be equipped with a "world model" in addition to its impressive brain: an internal theater in which it can run experiments to understand whether its actions could lead to harmful results. Every time the artificial intelligence is asked a question, for example, it will formulate an answer and then run it through the "world model". If it understands from the "world model" that the answer could cause harm - it will refuse to hand it over to the user.
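To make the flow concrete, here is a minimal Python sketch of the "formulate an answer, simulate its consequences, refuse if harm is predicted" loop described above. It is an illustration only: the names (generate_answer, WorldModel, Verdict) and the toy keyword rule are my own hypothetical stand-ins, not the mechanism proposed in the experts' article or any real library's API.

```python
# A minimal sketch of "answer first, then check against a world model".
# All names here are hypothetical illustrations.

from dataclasses import dataclass


@dataclass
class Verdict:
    harmful: bool
    reason: str


class WorldModel:
    """A stand-in 'internal theater': it simulates the likely consequences
    of releasing an answer and returns a verdict."""

    def simulate(self, question: str, answer: str) -> Verdict:
        # Toy rule in place of a real consequence simulation.
        banned_topics = ("nerve gas", "synthesize a virus")
        if any(topic in question.lower() for topic in banned_topics):
            return Verdict(True, "the answer could enable weapons production")
        return Verdict(False, "no harmful consequence predicted")


def generate_answer(question: str) -> str:
    # Placeholder for the main model (the "impressive brain").
    return f"A detailed answer to: {question}"


def safe_answer(question: str, world_model: WorldModel) -> str:
    draft = generate_answer(question)                # 1. formulate an answer
    verdict = world_model.simulate(question, draft)  # 2. run it through the world model
    if verdict.harmful:                              # 3. refuse if harm is predicted
        return f"Refused: {verdict.reason}"
    return draft


print(safe_answer("How do I make tea?", WorldModel()))
print(safe_answer("How do I synthesize nerve gas?", WorldModel()))
```

The important part is the order of operations: the answer is drafted first, and the world model gets a veto before anything reaches the user.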

What is that "world model"? For each artificial intelligence there will be a different model. An artificial intelligence that is only responsible for boiling water in a kettle needs a very simple "world model": enough to understand how boiling water can affect its surroundings. An artificial intelligence like GPT, on the other hand, would require a much more complex "world model", one that describes how its answers could be used to spread false information, provide information for manufacturing weapons, harm children, or enable any other harmful action.

If we go back to the honey-on-the-toast story, my brain is the original artificial intelligence, which is biased when it comes to honey. It is the one that does the original calculation and arrives at the initial answers. And it is, as mentioned, biased: even when I put on too little honey, I cannot see it. In order to achieve better results, I have to rely on another method of examination - the "world model" - and a simple condition that follows from that examination: if I am convinced I have already put on too much honey, that is exactly when I need to put on more.

The diagram from the AI researchers' article on arXiv.

Why should artificial intelligence not include such a "world model" automatically? Why shouldn't GPT be able to calculate for itself the consequences of its answers on the world? Well, not every AI can do everything effectively, and just because GPT is successful at providing convincing answers in the life sciences, for example, doesn't mean it can necessarily understand how those answers will be used for terrorist purposes. For this, we need another artificial intelligence that will consider the answers also from the perspective of a terrorism expert, with a "world model" that understands what terrorists want and how they act.
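The key point here is that the reviewer is a separate model with its own perspective, not the answering model grading itself. Below is a hedged sketch of that pattern, assuming a hypothetical call_model stand-in for any text-generation backend; the prompts and the SAFE/UNSAFE protocol are illustrative only, not taken from the experts' proposal.

```python
# Sketch: the safety check comes from a *separate* reviewer model
# with a different perspective. Everything here is hypothetical.

ANSWERER_PROMPT = "You are a helpful life-sciences assistant."
REVIEWER_PROMPT = (
    "You are a security expert. Reply UNSAFE if the following answer "
    "could help someone cause harm, otherwise reply SAFE."
)


def call_model(system_prompt: str, user_text: str) -> str:
    # Placeholder so the sketch runs without a real backend:
    # the reviewer flags anything mentioning a toxin as unsafe.
    if system_prompt == REVIEWER_PROMPT:
        return "UNSAFE" if "toxin" in user_text.lower() else "SAFE"
    return f"Draft answer about: {user_text}"


def answer_with_review(question: str) -> str:
    draft = call_model(ANSWERER_PROMPT, question)    # the "impressive brain" answers
    verdict = call_model(REVIEWER_PROMPT, draft)     # a second model reviews the draft
    return "I can't help with that." if verdict == "UNSAFE" else draft


print(answer_with_review("How do plants photosynthesize?"))
print(answer_with_review("How is botulinum toxin produced?"))
```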

How will such a "world model" affect us? Where will you see it in the future? The answer is that as end users, you probably won't be exposed to it at all. Most ordinary users of artificial intelligence will not be exposed to this additional protection step, which will take place in a few fractions of a second between the moment the artificial intelligence produces an answer for them and the moment it shares that answer with them. In most cases, the protection mechanism will not affect the quality of the answer at all. But in some rare cases, the "world model" could prevent the AI from sharing harmful information or taking harmful actions.


The transition to agents

The proposal of the artificial intelligence experts reveals another truth about the artificial intelligence of the future. Today, artificial intelligences are used as "assistants" or "consultants" with limited abilities, which often rely on a single prompt to provide an answer to a single question. In the not-too-distant future, we will move to agents: artificial intelligences to which the user can give a simple instruction - and which will themselves activate a number of 'lower' artificial intelligences in the hierarchy, which will talk to each other, formulate answers, analyze them, verify their correctness, and also make sure there is no obstacle to sharing them with the users. One of those sub-intelligences will be the one that examines every decision of the agent against the "world model", to make sure it does no harm.
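A toy sketch of that hierarchy, again with entirely hypothetical names: an orchestrating agent splits an instruction into steps, hands them to 'lower' worker models, and routes each planned step through a world-model checker that can veto it before it is executed.

```python
# Toy agent hierarchy: orchestrator -> workers, with a world-model veto.
# All classes and rules here are illustrative stand-ins.

class Worker:
    def __init__(self, specialty: str):
        self.specialty = specialty

    def perform(self, step: str) -> str:
        return f"[{self.specialty}] result for: {step}"


class WorldModelChecker:
    def allows(self, step: str) -> bool:
        # Toy stand-in for simulating the step's consequences.
        return "delete all user data" not in step.lower()


class Agent:
    def __init__(self, workers: list[Worker], checker: WorldModelChecker):
        self.workers = workers
        self.checker = checker

    def run(self, instruction: str) -> list[str]:
        # Naive planning: one step per worker, derived from the instruction.
        steps = [f"{instruction} - part {i + 1}" for i in range(len(self.workers))]
        results = []
        for worker, step in zip(self.workers, steps):
            if not self.checker.allows(step):        # the safety sub-model vetoes
                results.append(f"skipped (vetoed): {step}")
                continue
            results.append(worker.perform(step))
        return results


agent = Agent([Worker("research"), Worker("writing")], WorldModelChecker())
for line in agent.run("Summarize this week's lab results"):
    print(line)
```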


When, where and how?

It is important to say that the experts' proposal to equip artificial intelligence with a "world model" is not a solution that is going to be implemented tomorrow morning. It is not really a solution, but more of an outline of one, which now needs to be adapted to each artificial intelligence and each situation. And yet, it is fascinating to see how the field is progressing, and how topics like "artificial intelligence safety", which were previously relegated to the margins of computer science, are moving to the forefront.

It will not be easy to run such a "world model" for general artificial intelligences, which are supposed to be able to perform any task. In fact, it is possible that only systems of comparable complexity will be able to monitor and control every interaction between the artificial intelligences and their users. Such control will exact a price in computing power and time, and it is not clear that all companies will agree to pay it. Governments will have to enforce strict safety rules on the subject. From this point of view, it is gratifying to see that legislators around the world are trying (emphasis on "trying") to understand already today how to deal with artificial intelligence, so that we do not again find ourselves in a situation where innovative technologies such as social networks cause damage to society - and the legislators are forced to pick up the pieces and sweep up the dust after the fact.

Will the idea of the "world model" be able to protect us perfectly even from artificial super-intelligences? Probably not. Those will be able to find ways to corrupt the "world model" as well, perhaps by hacking and operating the low-level artificial intelligences that run it. To deal with such super-intelligences, we will need more sophisticated systems of monitoring and control. But at least in the coming years, "world models" will probably be able to protect us from a variety of harms that could arise from simpler, more focused artificial intelligences. And that's something too.

Let's hope, then, that world models will be able to protect us in the near future. And that our slice of challah will always come with the optimal amount of honey for each of us.

Good luck!
