More accurate uncertainty estimates may help users decide when and how to use machine learning models in the real world

Since machine learning models can give wrong predictions, researchers often equip them with the ability to tell the user how confident they are of a particular decision. This is especially important in high-risk environments, such as when models are used to help identify diseases in medical images or screen job applications.
But model uncertainty estimates are only effective if they are accurate. If a model indicates that it is 49% certain that a medical image shows pleural fluid, then 49% of the time the model should be correct.
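To make the idea of a well-calibrated model concrete, the short Python sketch below checks calibration by grouping predictions into confidence bins and comparing each bin's average confidence to its observed accuracy (a common metric known as expected calibration error). This is an illustrative sketch only, not part of the researchers' method, and the simulated data are hypothetical.

```python
# Illustrative sketch (not from the paper): checking calibration by binning
# predictions by confidence and comparing stated confidence to observed accuracy.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Average gap between stated confidence and actual accuracy, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece

# A well-calibrated model that says "49% certain" should be right about 49% of the time.
rng = np.random.default_rng(0)
conf = rng.uniform(0.4, 1.0, size=10_000)
outcomes = rng.random(10_000) < conf          # simulated well-calibrated predictions
print(expected_calibration_error(conf, outcomes))  # close to 0
```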
Researchers from MIT have introduced a new approach that can improve uncertainty estimates in machine learning models. Their method not only produces more accurate uncertainty estimates than other methods, but does so more efficiently.
Additionally, because the technique is scalable, it can be applied to large deep learning models that are increasingly used in healthcare and safety-critical situations.
This technique may provide end users, many of whom lack machine learning expertise, with better information with which to decide whether to trust the model's predictions or whether to deploy the model for a particular task.
"It's easy to see that these models perform well in scenarios where they excel, and to assume they'll be just as good in other scenarios. This makes work aimed at improving the uncertainty calibration of these models especially important, to make sure they match human perceptions of uncertainty, " says Nathan Ng, the paper's lead author, a graduate student at the University of Toronto who is visiting as a student at MIT.
Ng wrote the paper with Roger Grosse, an associate professor of computer science at the University of Toronto, and with senior author Marzyeh Ghassemi, an associate professor in the Department of Electrical Engineering and Computer Science and a member of the Institute for Medical Engineering and Science and the Laboratory for Information and Decision Systems. The research will be presented at the International Conference on Machine Learning.
Uncertainty Quantification
Uncertainty quantification methods often require complex statistical calculations that are not well suited to machine learning models with millions of parameters. These methods also require users to make assumptions about the model and the data used to train it.
MIT researchers took a different approach. They use what is known as the Minimum Description Length (MDL) principle, which does not require the assumptions that can compromise the accuracy of other methods. MDL is used to better quantify and calibrate uncertainty for test points that the model was asked to label.
The technique the researchers developed, called IF-COMP, makes MDL fast enough to use with the types of large deep learning models deployed in many realistic environments.
MDL involves considering all the possible labels the model could give a test point. If many alternative labels fit that point well, the model's confidence in its chosen label should decrease accordingly.
"One way to understand how confident the model is would be to give it reverse information and see how willing it is to believe you," Ng says.
For example, consider a model that says a medical image shows pleural fluid. If the researchers tell the model that this image shows edema, and it is willing to update its belief, then the model should be less confident in its original decision.
In MDL, if the model is confident when it labels a data point, it should use a very short code to describe that point. If it is uncertain of its decision because the point could plausibly carry many other labels, it uses a longer code to capture those possibilities.
The length of the code used to label a data point is known as stochastic data complexity. If the researchers ask the model how willing it is to update its belief about a data point in the face of contrary evidence, the stochastic data complexity should decrease when the model is confident.
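As a rough illustration of the short-code/long-code intuition, the sketch below uses a label's negative log-probability under the model as a simple codelength. The actual stochastic data complexity used in the paper also accounts for how readily the model can be pushed toward alternative labels, which this toy example does not attempt to reproduce; the probability vectors are made up for illustration.

```python
# Minimal sketch of the short-code intuition, not the paper's actual
# stochastic-data-complexity computation: a confident prediction gets a
# short code (low negative log-probability); a prediction with many
# plausible alternatives gets a longer one.
import numpy as np

def codelength_bits(probs, label):
    """Shannon codelength (in bits) of `label` under the model's predictive distribution."""
    return -np.log2(probs[label])

confident = np.array([0.97, 0.02, 0.01])   # one label clearly dominates
uncertain = np.array([0.40, 0.35, 0.25])   # several labels fit almost as well

print(codelength_bits(confident, 0))   # ~0.04 bits: short code, high confidence
print(codelength_bits(uncertain, 0))   # ~1.32 bits: longer code, lower confidence
```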
But testing every data point with MDL would require an enormous amount of computation.
Speeding up the process
With IF-COMP, the researchers developed an approximation technique that can accurately estimate stochastic data complexity using a special function known as an influence function. They also used a statistical technique called temperature scaling, which improves the calibration of the model's outputs. The combination of influence functions and temperature scaling enables high-quality approximations of stochastic data complexity.
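The influence-function approximation itself is specific to the paper, but temperature scaling is a standard calibration step and can be sketched in a few lines: a single scalar temperature is fitted on held-out data so that rescaled logits give better-calibrated probabilities. The sketch below shows only that piece, under the assumption of a softmax classifier; all names and the synthetic data are illustrative.

```python
# Sketch of the temperature-scaling piece only (a standard calibration step);
# the paper's influence-function approximation is not reproduced here.
import numpy as np
from scipy.optimize import minimize_scalar

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(temperature, logits, labels):
    """Negative log-likelihood of held-out labels after dividing logits by a temperature."""
    probs = softmax(logits / temperature)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels):
    """Find the single temperature that best calibrates the model on validation data."""
    result = minimize_scalar(nll, bounds=(0.05, 10.0), args=(logits, labels), method="bounded")
    return result.x

# Tiny demo on made-up logits; in practice the logits and labels come from a
# held-out validation set, and the fitted temperature is then used to rescale
# test-time logits before reading off confidences.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=500)
logits = 5.0 * np.eye(3)[labels] + rng.normal(size=(500, 3))
print(fit_temperature(logits, labels))
```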
Ultimately, IF-COMP can effectively produce well-calibrated uncertainty estimates that reflect the true confidence of the model. The technique can also determine whether the model has mislabeled certain data points or reveal which data points are outliers.
The researchers tested their system on these tasks and found it to be faster and more accurate than other methods.
"It is very important to have some certainty that the model is well calibrated, and there is a growing need to find out when a certain prediction does not seem correct. Audit tools become more and more necessary in machine learning problems when we use large amounts of untested data to create models that will be applied to problems with humans," she says Jasmee.
IF-COMP is model-agnostic, so it can provide accurate uncertainty estimates for many types of machine learning models. That could allow it to be deployed in a wider range of real-world settings, ultimately helping more practitioners make better decisions.
"People need to understand that these systems are very vulnerable and can make things up on the fly. A model may seem very confident, but there are many things it is willing to believe in the face of contrary evidence," Ng says.
In the future, the researchers are interested in applying their approach to large language models and exploring additional potential uses for the minimum description length principle.