A new proposal: to give artificial intelligence a license to practice medicine

One of the most respected medical journals in the world, the New England Journal of Medicine, published an article with an unusual proposal: to approve artificial intelligence in a way similar to the approval process that human doctors go through. Or in other words: to grant artificial intelligence a license to practice medicine, once its performance surpasses that of specialist physicians.

Soon, artificial intelligence will be able to practice medicine at a high level. But how will we know whether it is safe? What should the regulator, who is entrusted with our safety, do? One of the most respected medical journals in the world published an article with an unusual proposal: to approve artificial intelligence in a way similar to the approval process that human doctors go through. Or in other words: to grant artificial intelligence a license to practice medicine.

Let's start with an observation: the belief in Silicon Valley right now is that we will reach "general artificial intelligence" within about four years. That is, we will develop an artificial intelligence capable of doing almost everything that humans can do. Even if the people of Silicon Valley are too optimistic - which is certainly possible - no one thinks that the progress of artificial intelligence will stop in the coming years. Exactly the opposite: the general consensus is that it will continue to evolve and improve. It may not be able to do everything at a human level, but certain tasks, and certain professions, will be automated faster than others.

One of them may be the medical profession.

The Western world - and especially the United States - is currently experiencing a dramatic shortage of doctors, and the shortage will only worsen. According to a report by the Association of American Medical Colleges, by 2034 the United States will need almost 125,000 more doctors than it is expected to have.

But the truth is even bleaker, because we need far more doctors than that. We just don't realize it yet, because we have gotten used to living in an environment of scarcity. Medical advice is an extremely expensive commodity, since it requires the time and attention of a person who has studied for more than two decades to acquire a rare skill. Those people sit in clinics or hospitals, and patients make a pilgrimage to them at a significant cost of time and money.

In a world where artificial intelligence can take over the role of the doctor, each of us will be able to receive medical advice easily and quickly. We won't have to spend hours getting to the doctor's office; we will be able to get that advice at home. And we will not have to settle for a single doctor: we will be able to receive advice from entire committees of expert doctors - all silicon-based - who will deliberate together in the virtual world and reach a decision about our medical condition.

All this is well and good, but there is one big problem: the regulator is not going to allow the silicon-based doctors in so easily.

And rightfully so. 


The deadliest doctor

John Shaw drove elderly people to the hospital and back in his small taxi in the 1990s. Over time, he noticed an unusual pattern: more than twenty elderly people who had arrived at the hospital in good health died there suddenly and unexpectedly. After several such unusual deaths, he identified the common factor: Dr. Harold Shipman was the doctor who treated them all.

Shaw reported his suspicions to the police in August 1998, and Shipman was arrested shortly thereafter. After a long investigation and trial, the court determined that Shipman had murdered at least fifteen patients who trusted him, and had forged the will of one of them so that her estate would be left to him. The murders were carried out with drug overdoses, so as not to arouse suspicion. The belief today is that over his long career, Shipman managed to kill some 250 patients.

AI will be able to kill a much larger number.

The main factor that limited Shipman was time. A human doctor can treat only a limited number of patients a day - a few dozen, at most. A single AI system, however, could provide medical advice to millions of people every day, even billions. And if it makes certain mistakes over and over again, or suffers from a 'blind spot' for certain diseases, it can harm a great many people in a very short time.

No wonder, then, that regulators want to make sure that the artificial intelligences of the future - those that can provide medical advice directly to patients - will be extremely safe.

But how do you do it?

According to David Blumenthal, a physician at Harvard University, the solution is simple: the regulator should test the AIs as if they were doctors themselves.


Doctors or tools

Blumenthal published his article in mid-2024 in one of the most respected medical journals in the world, the New England Journal of Medicine - or, more precisely, in that journal's offshoot devoted specifically to artificial intelligence. Together with a colleague from Google, Blumenthal argued that the regulatory rules currently in effect for artificial intelligence in medicine are simply not sufficient to deal with the new kind of artificial intelligence: specifically, large language models like ChatGPT, which can give medical advice at an impressive level - even if they still make mistakes, sometimes serious ones.

Today, the regulator in the United States focuses mainly on several types of artificial intelligence. The most basic are "clinical decision support" systems: computerized systems that warn the doctor about harmful interactions between two drugs, recommend the most appropriate catheter diameter, and so on. Other 'simple' systems provide statistical estimates for very specific questions: for example, estimating a patient's chances of having a heart attack, given a variety of data points about their health.

These systems are developed and programmed by humans, and the algorithms behind them are simple and clear. According to Blumenthal's article, these systems do not even require approval from the regulator, because they are based entirely on the medical literature - they simply let the doctor retrieve the relevant recommendations automatically.
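To get a sense of how simple such rule-based systems are, here is a minimal sketch in Python, assuming a hand-written interaction table. The drug pairs and the wording of the warnings are illustrative only, not clinical guidance; a real system would encode recommendations taken directly from the medical literature.

```python
# A minimal sketch of a rule-based clinical decision support check.
# The interaction table below is illustrative only, not real clinical guidance.
KNOWN_INTERACTIONS = {
    frozenset({"warfarin", "aspirin"}): "Possible increased bleeding risk - review dosing.",
    frozenset({"simvastatin", "clarithromycin"}): "Possible muscle toxicity - consider an alternative.",
}

def check_interactions(prescribed_drugs: list[str]) -> list[str]:
    """Return a warning message for every known interacting pair in the prescription."""
    warnings = []
    drugs = [d.lower() for d in prescribed_drugs]
    for i, first in enumerate(drugs):
        for second in drugs[i + 1:]:
            message = KNOWN_INTERACTIONS.get(frozenset({first, second}))
            if message:
                warnings.append(f"{first} + {second}: {message}")
    return warnings

if __name__ == "__main__":
    for warning in check_interactions(["Warfarin", "Aspirin", "Metformin"]):
        print(warning)
```

The whole "algorithm" is a lookup: every rule was written by a human, which is exactly why such systems are so easy to inspect and so limited in scope.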

A more advanced type of artificial intelligence is based on "machine learning". In this case, it is the machine itself that goes through a huge collection of data items and identifies recurring patterns. Such systems can analyze x-rays, for example, or predict a patient's chance of developing diabetes - but there is not necessarily a clear explanation behind their conclusions. The regulator approves "machine learning" systems, but limits them to the very specific analysis categories in which they have proven themselves. Software trained on x-rays of white men, for example, could be completely wrong when it encounters images of white women.

In this case, the regulator approves the "machine learning"-based artificial intelligence once. But since it is clear that such a system can be improved by feeding it more data, the regulator allows the developers to improve it from time to time, test the improved system, and then "lock" it against further changes until the next round of improvement. It is a long and complicated process, but it has worked well so far.
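For illustration, here is a toy sketch in Python of what that "improve, test, lock" cycle might look like for a machine-learning risk predictor. The data is synthetic and the features, thresholds and file name are invented for the example; a real regulatory submission involves far more than this.

```python
# A toy sketch of the "improve, test, lock" cycle described above.
# Data and features are synthetic; this is an illustration, not a medical model.
import numpy as np
import joblib
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic patient features: age, BMI, fasting glucose (invented distributions).
X = np.column_stack([
    rng.normal(55, 12, 2000),   # age
    rng.normal(27, 5, 2000),    # BMI
    rng.normal(100, 20, 2000),  # fasting glucose
])
# Synthetic label "developed diabetes", loosely driven by glucose and BMI plus noise.
y = (0.03 * X[:, 2] + 0.05 * X[:, 1] + rng.normal(0, 1, 2000) > 5.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Held-out AUC: {auc:.2f}")

# "Lock" the evaluated version: freeze the exact model that was tested,
# and retrain and re-evaluate only when the developers submit the next version.
joblib.dump(model, "diabetes_risk_model_v1.joblib")
```

The key point is that the artifact that gets approved is a frozen, specific model; any later improvement produces a new artifact that has to be tested again.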

Then came the large language models, and suddenly it became clear to everyone that the existing rules are not enough.


The resident who let me down

In April 2024, three Israeli researchers - Eran Cohen, Uriel Katz and Ido Wolf - published a study with particularly embarrassing results for many residents. The researchers ran the most advanced language model available at the time - GPT-4 - on the board exams that residents in pediatrics, general surgery, gynecology, psychiatry and internal medicine must pass. The model's answers were compared to those of 849 residents.

The result, as mentioned, was embarrassing for the residents. As Dr. Cohen explained in an interview with Ynet:

"We realized that not only does the GPT-4 chat manage to pass the test, but in some tests it received higher scores than the interns."

While the residents' scores ranged from 30 to 85, the AI model was consistent and almost never failed. In internal medicine and psychiatry in particular, it scored higher than most of the residents who took the exam.
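This is not the researchers' code, but a minimal sketch of how such an evaluation might be wired up: run one and the same model over multiple-choice questions from several specialties and compute a score per specialty. The question schema, the ask_model helper and the model name are assumptions made for the sake of the example.

```python
# A minimal sketch of scoring a language model on multiple-choice exam questions.
# Not the researchers' code; the question schema and the model name are illustrative.
from collections import defaultdict
from openai import OpenAI  # assumes the OpenAI Python SDK is installed and configured

client = OpenAI()

def ask_model(question: str, options: dict[str, str]) -> str:
    """Ask the model to answer a multiple-choice question with a single letter."""
    prompt = question + "\n" + "\n".join(f"{k}. {v}" for k, v in options.items())
    prompt += "\nAnswer with the letter of the single best option."
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()[0].upper()

def score_by_specialty(exam_questions: list[dict]) -> dict[str, float]:
    """Each item: {'specialty', 'question', 'options', 'answer'} - a hypothetical schema."""
    correct, total = defaultdict(int), defaultdict(int)
    for q in exam_questions:
        total[q["specialty"]] += 1
        if ask_model(q["question"], q["options"]) == q["answer"]:
            correct[q["specialty"]] += 1
    return {s: 100.0 * correct[s] / total[s] for s in total}
```

Comparing the resulting per-specialty scores with the residents' score distribution is then a purely statistical exercise.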

The first point about the study that I find particularly exciting is that the researchers did not use a different language model for each exam. That is, the model was not adapted and trained for each of the specialty exams. Exactly the opposite: the same model achieved these impressive results in all the specialties. The second point is that when the researchers examined GPT-4's predecessor, GPT-3.5, they found that it frequently failed the exams. How much time passed between the release of GPT-3.5 and the appearance of GPT-4? Less than a year.

This is the pace of events today.

Industry experts estimate that the generation time of the large language models - that is, the time needed to achieve a significant improvement in these models - is only about eight months. The regulator does not know how to work and evaluate at such time scales. And even if it could, what exactly would it test the model for? Some narrow sub-sub-capability, when the model can provide answers about... everything? When it already surpasses the residents and leaves them far behind - and soon, perhaps, the specialist physicians themselves?

This is how Blumenthal arrived at his solution: to evaluate new artificial intelligences as if they were real doctors.


Licensing exams for artificial intelligence

In the article he published, Blumenthal suggests that we treat the new artificial intelligences not as medical devices, but as

"A new type of clinical intelligence: that is, to regulate them less as if they were instruments, and more as if they were clinicians."

Blumenthal rightly points out that we have plenty of experience in assessing the abilities of doctors, in medical schools and in hospital residencies. Doctors have to graduate from university, pass licensing exams, complete a residency in the field, continue to train and gain experience and expertise, and agree to have the quality of their care reviewed from time to time.

Blumenthal suggests that the regulator approve the large language models after they have been tested in a number of ways that are suspiciously reminiscent of the training and examination pathway of medical specialists. One of them, for example, would be success in exams based on the professional licensing tests. Another would be a "residency period": a period of time in which the artificial intelligence would operate in clinical settings and provide advice, with senior specialist physicians at its side to supervise and correct it if it errs. And every time the artificial intelligence is upgraded, it would have to pass the theoretical exams and the residency period again.

Last but not least, and most importantly, Blumenthal suggests that the results of all these tests and specializations be available to the public. Just as every doctor hangs his diplomas on the walls of the clinic, so will the artificial intelligence show the patients that "there is someone to trust".

This, then, is Blumenthal's solution to the licensing problem of artificial intelligence in the field of medicine. 

And I hate to say it, but this solution has holes too.


The problems - and the success

Let's start on the positive side: I like Blumenthal's solution because it is forward-looking. The fact that one of the most respected journals in medicine already understands that "what was is not what will be" makes clear the magnitude of the change. Policy makers in the field of medicine are starting to connect the dots and understand that they need to prepare for a very different world from the one we have known until now.

The meaning of such a big change is also that any solution proposed to deal with it will necessarily be bad. That is how these things always go. It is never possible to develop, in advance, a policy that captures the full complexity of a technology before it matures and people start playing with it. The policy solutions will be bad - and may even seem ridiculous and naive to us - at first. But they will get better.

Still, we can make an effort to produce smarter policy solutions right from the start.

The main problem I see with Blumenthal's solution is that the artificial intelligence cannot settle for being "as good as a human doctor". It cannot even settle for being "better than a human doctor". It has to be "much better". Perfection may not be possible, but it has to be truly excellent.

Why? Because the number of consultations it will handle will be huge. From the moment we can consult an artificial intelligence with the ease of pressing a button or making a video call, we will talk to it every day - and the hypochondriacs among us will do so every hour. We will ask for its advice on a wide variety of subjects, and in each and every one of them there is a chance it will fail. And if it fails, it may do so in unexpected ways, since its thought process is not the same as a human's.

Here is also the second problem with Blumenthal's solution: how do you run a "residency period" when the uses of artificial intelligence are going to be different from the medical specialties we have today?

A third problem concerns the time scales. Is it really possible to test an artificial intelligence in a short enough time - say, eight months or less? And if so, who will test it?

But I admit that these questions may come across as nitpicking.

In the end, Blumenthal offers a possible solution, and he surely understands that this is only the beginning of a process of thinking about the future, and that a perfect solution cannot be reached at the outset. His solution is also the first to recognize that large language models are so different from previous artificial intelligences that they need to be treated differently. In fact, our only way to assess their abilities is to apply to them the same exacting requirements that we set for human doctors.

I believe that we need to think of a similar solution in every field: in accounting, law, psychology and everything else. It is not an exaggeration to think that in a few years, we will need to understand in each of these fields whether artificial intelligence is capable of doing work at a level equivalent to or better than that of human experts. And of course, to decide how to make these artificial intelligences accessible to the public.


The optimistic vision for medicine

I will end with optimism: we are moving towards a future of abundance. Blumenthal's article makes it clear that in the future, artificial intelligence in the field of medicine will have the capabilities of human doctors - and will probably surpass them sooner or later. It will be a future where each of us will have the lifestyle that only the richest enjoy today: a personal doctor for every person, at any time. And not just one doctor, but a whole committee of expert doctors. All the time. They will take care of us, detect medical problems years before they would normally be detected, and will be able to make sure that we receive the most successful treatment, as early as possible.

And maybe the way to know that they really know what they are talking about will be exactly the way we examine our medical students today.

So we wish success to the computerized doctors of the future - they are going to help us all. At least once they get their license.
