A new technique compares the logic of a machine-learning model to that of a human, so that the user can see patterns in the model’s behavior.
In machine learning, understanding why a model makes certain decisions is often just as important as whether those decisions are correct. For example, a machine-learning model could correctly predict that a skin lesion is cancerous, but it could do so using an unrelated blip in the clinical photo.
While tools exist to help experts make sense of a model’s logic, these methods often provide insight into only one decision at a time, and each must be evaluated manually. Models are commonly trained using millions of data inputs, making it almost impossible for a human to evaluate enough decisions to identify patterns.

MIT researchers have developed a method that helps a user understand the logic of a machine-learning model, and how that logic compares to that of a human. Image credit: Christine Daniloff, MIT
Now, researchers at MIT and IBM Research have created a method that enables a user to aggregate, sort, and rank these individual explanations to rapidly analyze a machine-learning model’s behavior. Their technique, called Shared Interest, incorporates quantifiable metrics that compare how well a model’s logic matches that of a human.
Shared Interest can help a user more easily uncover concerning trends in a model’s decision-making; for example, perhaps the model often becomes confused by distracting, irrelevant features, such as background objects in photos. Aggregating this information can help the user quickly and quantitatively determine whether a model is trustworthy and ready to be deployed in a real-world situation.
“In developing Shared Interest, our goal is to scale up this analysis process so that you can understand on a more global scale what your model’s behavior is,” says lead author Angie Boggust, a graduate student in the Visualization Group of the Computer Science and Artificial Intelligence Laboratory (CSAIL).
Boggust wrote the paper with her advisor, Arvind Satyanarayan, an assistant professor of computer science who leads the Visualization Group, as well as Benjamin Hoover and senior author Hendrik Strobelt, both of IBM Research. The paper will be presented at the Conference on Human Factors in Computing Systems.
Boggust began working on this project during a summer internship at IBM, under Strobelt’s mentorship. After returning to MIT, Boggust and Satyanarayan expanded on the project and continued collaborating with Strobelt and Hoover, who helped deploy the case studies that show how the technique could be used in practice.
Human-AI Alignment
Shared Interest leverages popular techniques that show how a machine-learning model made a specific decision, known as saliency methods. If the model is classifying images, saliency methods highlight the areas of an image that were important to the model when it made its decision. These regions are visualized as a type of heatmap, called a saliency map, which is often overlaid on the original image. If the model classified the image as a dog, and the dog’s head is highlighted, that means those pixels were important to the model when it decided the image contained a dog.
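Saliency methods come in many flavors. The sketch below is not the specific method used in the paper; it is a minimal gradient-based example in PyTorch, where the model choice and the helper name `gradient_saliency` are illustrative assumptions, meant only to show how a per-pixel importance map can be produced.

```python
# Minimal gradient-based saliency sketch. This is an illustrative assumption,
# not the particular saliency method used in the Shared Interest paper.
import torch
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

def gradient_saliency(image, target_class):
    """Return an (H, W) importance map for `target_class`.

    `image` is a normalized tensor of shape (1, 3, H, W).
    """
    image = image.clone().requires_grad_(True)
    score = model(image)[0, target_class]
    score.backward()
    # Use the maximum absolute gradient across the color channels as importance.
    saliency = image.grad.abs().max(dim=1).values.squeeze(0)
    return saliency / (saliency.max() + 1e-8)
```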
Shared Interest works by comparing saliency methods to ground-truth data. In an image dataset, ground-truth data are typically human-generated annotations that surround the relevant parts of each image. In the previous example, the box would surround the entire dog. When evaluating an image-classification model, Shared Interest compares the model-generated saliency data and the human-generated ground-truth data for the same image to see how well they align.
The technique uses quantifiable metrics to measure that alignment (or misalignment) and then sorts a particular decision into one of eight categories. The categories run the gamut from fully human-aligned (the model makes a correct prediction and the highlighted region in the saliency map is identical to the human-generated box) to fully distracted (the model makes an incorrect prediction and does not use any of the image features found in the human-generated box).
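As a rough illustration of the idea, and not the paper’s exact metrics or category definitions, the sketch below compares a binarized saliency mask against a human-annotated mask using two simple coverage scores and sorts the result into a few coarse buckets; the real technique defines eight categories.

```python
import numpy as np

def alignment_scores(saliency_mask, truth_mask):
    """Compare a binarized saliency mask with a human-annotated mask.

    Both arguments are boolean arrays of shape (H, W). The two coverage
    scores below are simplified stand-ins for the paper's metrics.
    """
    saliency = np.asarray(saliency_mask, dtype=bool)
    truth = np.asarray(truth_mask, dtype=bool)
    overlap = np.logical_and(saliency, truth).sum()
    truth_coverage = overlap / max(truth.sum(), 1)          # fraction of the human box that is highlighted
    saliency_precision = overlap / max(saliency.sum(), 1)   # fraction of the highlighting inside the box
    return truth_coverage, saliency_precision

def coarse_category(prediction_correct, truth_coverage, saliency_precision, thresh=0.5):
    """Sort one decision into a coarse bucket (the paper uses eight categories)."""
    if prediction_correct and truth_coverage >= thresh and saliency_precision >= thresh:
        return "human-aligned"
    if not prediction_correct and saliency_precision < thresh:
        return "fully distracted"
    return "partially aligned"
```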
“At one end of the spectrum, your model made the decision for exactly the same reasons a human did, and at the other end of the spectrum, your model and the human are making the decision for entirely different reasons. By quantifying that for all the images in your dataset, you can use that quantification to sort through them,” Boggust explains.
The technique works similarly with text-based data, where keywords are highlighted instead of image areas.
Rapid Analysis
The researchers used three case studies to show how Shared Interest could be useful to both nonexperts and machine-learning researchers.
In the first case study, they used Shared Interest to help a dermatologist determine whether he should trust a machine-learning model designed to help diagnose cancer from photographs of skin lesions. Shared Interest enabled the dermatologist to quickly see examples of the model’s correct and incorrect predictions. Ultimately, the dermatologist decided he could not trust the model because it made too many predictions based on image artifacts rather than actual lesions.
“The value here is that using Shared Interest, we are able to see these patterns emerge in our model’s behavior. In about half an hour, the dermatologist was able to make a confident decision about whether or not to trust the model and whether or not to deploy it,” Boggust says.
In the second case study, they worked with a machine-learning researcher to show how Shared Interest can evaluate a particular saliency method by revealing previously unknown pitfalls in the model. Their technique enabled the researcher to analyze thousands of correct and incorrect decisions in a fraction of the time required by typical manual methods.
In the third case study, they used Shared Interest to dive deeper into a specific image-classification example. By manipulating the ground-truth region of the image, they were able to perform a what-if analysis to see which image features were most important for particular predictions.
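One way to picture that what-if analysis is sketched below; it is a hypothetical probe that reuses `alignment_scores` from the earlier example, and the shift amount and helper name are assumptions rather than details from the paper.

```python
import numpy as np

def whatif_shift(saliency_mask, truth_mask, shift=(0, 10)):
    """Shift the ground-truth region and recompute alignment (illustrative only)."""
    shifted_truth = np.roll(np.asarray(truth_mask, dtype=bool), shift=shift, axis=(0, 1))
    before = alignment_scores(saliency_mask, truth_mask)
    after = alignment_scores(saliency_mask, shifted_truth)
    return before, after
```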
The researchers were impressed by how well Shared Interest fared in these case studies, but Boggust cautions that the technique is only as good as the saliency methods it is built on. If those techniques contain bias or are inaccurate, then Shared Interest will inherit those limitations.
In the future, the researchers want to apply Shared Interest to different types of data, particularly tabular data used in medical records. They also want to use Shared Interest to help improve current saliency techniques. Boggust hopes this research inspires more work that seeks to quantify machine-learning model behavior in ways that make sense to humans.
Written by Adam Zewe
Source: Massachusetts Institute of Technology